Today: AI Generated Clinical Summaries Require More than Accuracy

Worried about AI-generated clinical summaries? You probably should be, for now.

Transcript

Today in Health IT, AI-generated clinical summaries. So we're going to take a look and see where we're at. All right. My name is Bill Russell. I'm a former CIO for a 16-hospital system and creator of This Week Health, a set of channels and events dedicated to transforming healthcare, one connection at a time. We want to thank our show sponsors who are investing in developing the next generation of health leaders.

SureTest, Artisight, Enterprise Health, Parlance, Certified Health, Notable, ServiceNow, and Panda Health. Check them out at thisweekhealth.com/today. All right. Let's see. Oh, how can I forget: share this podcast with a friend or colleague. Use it as a foundation for daily or weekly discussions on the topics that are relevant to you and the industry. Use it as a foundation for mentoring. They can subscribe wherever you listen to podcasts. All right.

th,: 2024

LLMs are advancing rapidly. In the long term, LLMs may revolutionize much of clinical medicine, from patient diagnosis to treatment. In the short term, however, it is the everyday clinical tasks that LLMs will change most quickly, and with the least scrutiny. Specifically, LLMs that summarize clinical notes, medications, and other forms of patient data are in advanced development and could soon reach patients without US Food and Drug Administration (FDA) oversight. Summarization, though, is not as simple as it seems, and variation in LLM-generated summaries could exert important and unpredictable effects on clinician decision-making.

The other article I'm taking a look at is massive. It's a big old article from a PhD AI expert: doctors relying on generative AI to summarize medical notes might unknowingly be taking big risks. All right. So these are the two things. I think the use of this falls into two categories, but it needs to go through the framework that we've talked about many times on the show, which is the copilot framework: it is an assistant that is generating results that get put in front of an expert, a clinician, a physician, a specialist of some kind who's reviewing that.

And then approves that information. So there are risks associated with it. We understand that LLMs require programming. It's just like any other programming, except the programming language is natural language, and it's prompts. And so, in much the same way that we don't want doctors creating their own lists and doing drug interaction checks with Excel spreadsheets, we also don't want them to start taking these things and doing their own prompt engineering, especially on clinical summaries and those kinds of things.

And by the way, I'm hearing this over and over again in interviews. I'm hearing a reticence toward the summaries. And when you hear the argument, it makes sense. It's essentially: look, if it changes one or two words in that summary, it changes the whole meaning. And we've all experienced at least one hallucination on these things.

We've also experienced some inaccuracies. I will give you my anecdotal, silly story: I had it try to respond to some of my emails. It doesn't respond the way I do. It doesn't respond in my voice. Were the emails accurate in some respects? Yes, but it didn't pick up nuance.

It didn't pick up some of the things. It didn't pick up the history that's not in the email. It only has that email on which to function. So it can't function on information it doesn't have, or tone, or those kinds of things. So it's acting on incomplete information.

There's all sorts of things that can go wrong if physicians take this into their own hands. Now, I don't think that's how this is going to end up in the clinical setting. How I believe this is going to end up in the clinical setting is through programmers and through researchers and others: UI, UX, you name it, all the things that we put into development. I think they are going to generate things that are going to be tested significantly and then dropped into the EHR. I think that's going to be the approach that happens.

Let me tell you how this thing is going to start accelerating. Machine learning is a term that's thrown around a lot and very little understood, but it is essentially, I think, going to be the foundation for training these models, these large language models that we're used to, much quicker and more comprehensively. And I think about how they did it with computer vision training. And we've talked about this on the show as well.

It's really interesting to me what they used to do. They used to take hundreds, thousands, millions of photos or pictures, and they would send them to these massive farms of people who would say, this is a ball, this is an alligator, this is a rooftop, and so on. They would categorize all these pictures, and that's how they would train. So fast forward a little bit, and these models have become much more sophisticated.

These machine learning models have become much more sophisticated, and they're self-taught. Let me give you an example. I was talking with the guys at Artisight, who do computer vision in the patient rooms and, quite frankly, everywhere, but the patient room is what we were specifically talking about. And I'm like, okay, how do you train it on all these different things?

And how do you train it so fast? Because that used to be the thing: it used to take forever to train it. And they were talking about the fact that we can take a picture of the room, and then you can essentially take pieces of that picture out and have the computer guess what's actually in the room. It fills in the missing pieces, but it has the original photo.

So it can go, hey, here's my guess at it. It'll look at the original photo and say, oh, I was wrong, and it'll keep doing that. And it can do that millions of times until it figures out, oh, that's the room, or that's a patient sitting up, and that's a potential fall risk.

Or that's a patient that's been sitting in that location, and they need to be turned for bed sores, and those kinds of things. So we have the ability to create models where machines teach machines much quicker than humans teaching machines. We still have the need for human oversight of that training, and fine-tuning of that training, for sure.
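To make the hide-a-piece-and-guess loop described above concrete, here is a minimal sketch. It is a toy under stated assumptions, not how Artisight or any vendor actually trains its models: the "image" is a one-dimensional ramp of numbers, the "model" is just two numbers (w and b), and the names make_image and train_masked_model are hypothetical.

```python
import random


def make_image(rng, width=16):
    """Build a toy 1-D 'image': a straight ramp with a random start and slope."""
    start = rng.uniform(0, 1)
    slope = rng.uniform(-0.05, 0.05)
    return [start + slope * i for i in range(width)]


def train_masked_model(steps=5000, lr=0.05, seed=0):
    """Hide one pixel, guess it from its neighbors, then check the original.

    No human labels anything: the original image is the answer key.
    The 'model' is assumed to be guess = w * (left + right) + b.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        image = make_image(rng)
        i = rng.randrange(1, len(image) - 1)       # the piece we take out
        neighbors = image[i - 1] + image[i + 1]    # what the model still sees
        guess = w * neighbors + b
        error = guess - image[i]                   # compare with the original
        losses.append(error ** 2)
        # Gradient step on the squared error: nudge w and b toward a better guess.
        w -= lr * 2 * error * neighbors
        b -= lr * 2 * error
    return w, b, losses


if __name__ == "__main__":
    w, b, losses = train_masked_model()
    print("early error:", sum(losses[:100]) / 100)
    print("late error: ", sum(losses[-100:]) / 100)
    print("learned w, b:", w, b)
```

Under these assumptions the guessing error shrinks as the loop repeats, and w settles near 0.5, which is the right answer for a ramp because each hidden pixel is the average of its two neighbors.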

But it's not farms of people who are working on it. And so I say that to say, I think what's going to happen in medical notes and summarizations is you're going to have this kind of machine learning that is going to be unleashed on these, in conjunction with these LLMs. You're going to have the correct summary, and you're going to have the summary that ChatGPT comes up with, or whatever it happens to be.

And by the way, I think there's going to be a lot of specialty large language models in healthcare. I don't think ChatGPT in and of itself is going to be the big winner. I think there will be specifically trained models. There are going to be big models and small models that get brought together and orchestrated together to come up with the right summaries.

So you have the summary that the large language model came up with. You have the correct summary. And it'll compare and say, oh, what'd I get right? What'd I get wrong? Oh, okay. Then it learns, and it can do it again. And it can do it with the next patient and the next patient. And because we have a picture of what it could look like or should look like when a human does it, not that humans don't make mistakes, I think we can then create machine learning models that can train these large language models faster. And I think that's already happening. I don't have the research on it, but I think that's already happening.

And so when people say, hey, it's going to change one or two words and that's going to change the entire meaning of the summary, I think we have ways, and not only that way. Again, I don't know the model name, but there's this antagonist model, where essentially one model comes up with the summary, and then there's another model that's the antagonist. It's looking at the summary and questioning the summary and making it go back to the original and adjust or fine-tune it based on the antagonist's questions. But again, the antagonist isn't a human. The antagonist is another computer, or another system.

I think you will have systems that check systems, systems that antagonize systems, specialist systems that look at things and say, you know what, based on these drugs and these things, this summary doesn't make sense. And that orchestration of these various systems working in tandem, even if one of them is an antagonist, they're working in tandem together. They will become learning systems.
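The antagonist idea above can be sketched in a few lines. This is an illustration only, under heavy assumptions: draft_summary is a hypothetical stand-in for an LLM (it deliberately drops one medication and invents another on its first try), critique is a hypothetical rule-based antagonist that only checks medication names against the original note, and no real model or product is being described.

```python
def draft_summary(note, feedback):
    """Hypothetical stand-in for an LLM summarizer.

    The first draft is deliberately flawed: it keeps only the first
    medication and invents 'aspirin'. Later drafts apply the feedback.
    """
    meds = list(note["medications"][:1]) + ["aspirin"]
    for kind, med in feedback:
        if kind == "missing" and med not in meds:
            meds.append(med)
        if kind == "unsupported" and med in meds:
            meds.remove(med)
    return {"medications": sorted(meds)}


def critique(note, summary):
    """Hypothetical antagonist: question the summary against the original note."""
    source = set(note["medications"])
    claimed = set(summary["medications"])
    issues = [("missing", med) for med in sorted(source - claimed)]
    issues += [("unsupported", med) for med in sorted(claimed - source)]
    return issues


def summarize_with_critic(note, max_rounds=3):
    """Draft, critique, revise; stop when the antagonist has no objections."""
    feedback = []
    for round_number in range(1, max_rounds + 1):
        summary = draft_summary(note, feedback)
        issues = critique(note, summary)
        if not issues:
            return summary, round_number
        feedback = feedback + issues
    return None, max_rounds  # never converged: hand the raw note to a human


if __name__ == "__main__":
    note = {"medications": ["warfarin", "metformin"]}
    print(summarize_with_critic(note))
```

With the example note, the first draft is rejected for a missing and an unsupported medication, and the second draft passes. In the copilot framing, a summary that passes the machine check would still go in front of a clinician for approval.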

This is one of my big arguments with AI. Artificial intelligence, to me, is a system that learns. It learns and it gets better with time. So when people say, oh, we don't even have a good definition of AI, I think AI models are models that get better with use. With each use, it gets better. It learns something.

It does something differently, and hopefully in the direction of better instead of the direction of worse, which we've seen in AI models, right? If you constantly reinforce it negatively, it can turn into a racist, a bigot, and those kinds of things. Some of the really early models we saw got stuck there. And you see the adjustments that have been made to these models, so that it's much harder to do that negative reinforcement. After a while, it'll just cut you off. Just like a human, if you negatively reinforce it long enough, it's going to say, you know what, I'm going to go find some other reinforcement. This doesn't work.

This is how I'm thinking about it as I'm reading these stories and the concerns around medical notes. By the way, I share the concerns around medical notes. The challenge is that when you read the responses, they lull you into a sense of, hey, this is working. This is close enough.

Close enough isn't good enough in healthcare. It has to be perfect. And while humans can make mistakes on the chart, a computer cannot make a mistake on the chart summary. And so we can't let our guard down. We have to reinforce with clinicians who are potentially using these tools off on their own that, hey, those shortcuts could be dangerous. Let us help, and not because we want to be the creators and the controllers of it, but because at this point we know how computers and how machines function.

We know what they're good at. We know what they're not good at. We know how to program them. We know how to adjust that programming and fine-tune the configurations on that. Even if it's prompt engineering, we should understand prompt engineering better than the average user, because we've been utilizing these systems for decades. We've been programming them and making them do things. Those are just some of the things that are on my mind as I read these articles, and I am concerned about summaries.

I think we will make significant progress in the next year. I don't even think it's five years. I think it's in the next year. We'll just continually see progress. I think there are organizations, I think there are companies that want to sell you things, that are working on this right now. And I think there is lots and lots of money and brainpower chasing this specific problem as we speak.

All right. I don't know what you can do with that, other than keep an eye out. When you're going across these floors, or somebody's coming in to talk to you about AI, talk to them about how we validate the summaries. How do we validate the information that's coming across, and is it possible to get machines to validate that information? Give it more iterations, more cycles to get the validation right before we put it in front of a human and they validate it. Because it would be great if that success rate went from 98% to 99% to 99.99%, to five nines, if we can get there. All right.
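To put rough numbers on those success rates, here is a small arithmetic sketch. The volume of 100,000 summaries is a made-up figure chosen only for illustration, and expected_errors is a hypothetical helper name.

```python
from decimal import Decimal


def expected_errors(success_rate: str, summaries: int) -> int:
    """Expected number of flawed summaries at a given success rate.

    The rate is passed as a string so Decimal keeps the arithmetic exact.
    """
    return int((1 - Decimal(success_rate)) * summaries)


if __name__ == "__main__":
    volume = 100_000  # hypothetical number of summaries, for illustration only
    for rate in ["0.98", "0.99", "0.9999", "0.99999"]:
        print(rate, expected_errors(rate, volume))
```

At that assumed volume, 98% leaves 2,000 flawed summaries, 99% leaves 1,000, 99.99% leaves 10, and five nines leaves 1.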

That's all for today. Don't forget, share this podcast with a friend or colleague. Mentor someone. We want to thank our channel sponsors who are investing in our mission to develop the next generation of health leaders: SureTest, Artisight, Enterprise Health, Parlance, Certified Health, Notable, ServiceNow, and Panda Health. Check them out at thisweekhealth.com/today. Thanks for listening. That's all for now.

