October 25, 2024: What happens when AI starts making medical decisions? A recent UCSF study reveals startling insights into ChatGPT’s performance in emergency departments, showing that AI overprescribes treatments and tests. Are we on the verge of a healthcare breakthrough or creating more problems? Join Kate Gamble and Sarah Richardson as they explore the risks and potential of AI in emergency care, and what it means for the future of patient outcomes. Can ChatGPT be trusted with life-or-death decisions? Tune in to find out.
Donate: Alex’s Lemonade Stand Foundation for Childhood Cancer
This transcription is provided by artificial intelligence. We believe in technology but understand that even the smartest robots can sometimes get speech recognition wrong.
📍 Today in Health IT, we're discussing a study which found that ChatGPT overprescribes treatments in EDs. This episode is brought to you by SureTest. Automate your health system's application testing and reclaim thousands of hours with SureTest. Visit thisweekhealth.com/suretest to learn more.
My name is Kate Gamble, editor at This Week Health, where we host a set of channels and events dedicated to transforming healthcare, one connection at a time. I've spent the last 12 years interviewing CXOs, and I'm excited to bring that knowledge into this community of leaders. Today we're discussing the study which found that ChatGPT overprescribes treatments in emergency departments.
And I am joined by Sarah Richardson, President of This Week Health's 229 Executive Development Community. Sarah, welcome to the show. Happy Friday, Kate. Glad to be here. Thanks, I am glad to have you. And it's funny, we are at that very interesting time of year, that glorious time of year where we have baseball playoffs, you have football, you have college football, even hockey is starting.
So for sports fans like us, this is a good time, right? It's a really good time, especially when you have the problem of having to watch two different leagues on two different TVs, or have a laptop open and the TV open so you can catch both games at the same time. Definitely problems that I enjoy solving during this time of year.
And even though my New York Giants appear hopeless, it's not actually hopeless yet. We're still in that zone. So yay for that. I'm a 49er fan, and I'll tell you, we tend to do relatively well, yet we have not won a Super Bowl since before Brock Purdy was born. It's just those moments where you realize it's been a hot minute.
It has, but you never know what's going to happen. And with that in mind, we're going to talk today about a UCSF study revealing that ChatGPT tends to overprescribe in emergency care, recommending unnecessary tests and treatments like X-rays and antibiotics. ChatGPT-4 was 8 percent less accurate than resident physicians, while ChatGPT-3.5 lagged by 24 percent. This overprescribing is said to be due to ChatGPT's general medical training, and it suggests that while AI has potential in clinical settings, which we've heard for quite some time now, it requires further fine-tuning to prevent unnecessary interventions in emergency care.
First off, just your thoughts on those results. Are you surprised by this? Those are some pretty significant numbers. It is, but I'll tell you, in an emergency room setting it's not necessarily surprising to me yet. You think about a hybrid system of human judgment and, quote unquote, ChatGPT accuracy, or large language model accuracy, being tested.
It's a place to continue to refine some of those capabilities, but I'll tell you, as a CIO with all clinical caution in mind, AI tools, while promising, have to have further refinement before being widely trusted in a critical healthcare environment like emergency care. It is not where I am going to start with AI in my organization, but it's definitely a place where I will be curious about how I apply learnings, and the constant, rigorous evaluation of integrating AI into a healthcare setting is going to be top of mind for me.
And as we've seen through a lot of our interviews, many organizations have chosen to start using AI in areas like inbox management, or for things like appointment reminders and appointment rescheduling, almost easing in, which really makes sense to me. It sounds like a good approach, especially considering that ambient listening and these things are still fairly new.
They are new. And what's super fascinating, as you allow the models to be trained to become more accurate while keeping the whole perspective of, again, the human in the loop, is you realize: would I recommend these tests? No, they're unnecessary. Would I prescribe these medications?
No, I don't want to overprescribe. Is this the right time to admit the patient? Perhaps yes. Do I put them in a transfer hold, or do I move them over to an inpatient setting? Really thinking about what that right window is. And there are really important tactics to think about if you're looking for accuracy in the AI models that exist in healthcare today.
We know we need diverse data training. If you're training models using diverse and representative data sets to minimize bias and ensure accurate recommendations across various populations, that's going to be key. An ED in a smaller town or a more condensed city is going to have a very different basis to draw information from than one in a highly transient area or an area heavily visited by tourists, as an example.
Going back to the human in the loop, that constant human oversight, where medical professionals are validating AI, is going to be important, especially before implementation. Always audit AI models for bias by evaluating outputs, and that can be bias in, again, admissions, treatment, et cetera. And then, what is that continuous learning going to look like?
If we know we are dabbling in the safer spaces, chart prep and inbox management, then how are we looking to flag errors and refine the model in an emergency setting? And then, hey, let's be transparent about all these algorithms, allowing for review and understanding of how decisions are made. Perhaps we find out that we're going to implement all these strategies.
And we can leverage AI for so much of what occurs in our organization. And we heard recently, hey, if you don't automate 80 percent of the tasks that can be automated in the next five years, your organization is at risk of going out of business. But maybe the emergency department, where accurate and equitable patient care is on the line, is not where we implement some of these models.
We use it in a much smaller format, or we don't use it in the ED, because it's such a highly charged and constantly changing environment that it can't be as predictable as some of the other workflows we see every day in our hospitals. Yeah, really a lot to chew on there. Yes. But one of the things that came to mind for me was that when we first started to hear about the da Vinci robot, there was that fear, understandably.
But as you can see, they have not replaced actual human surgeons. I feel like there's room for some kind of hybrid thinking, maybe more than we're used to, but this isn't a ChatGPT-or-no-ChatGPT scenario, or at least I don't think that's the solution.
I agree as well. Go low risk. And honestly, how many decisions are you making in your career, in your life, in your healthcare where you don't start with the low-risk opportunity first? Administrative tasks, patient reminders, non-critical diagnostics. If you're using it for something like dermatology imaging or radiology, and you're saving the complex settings like emergency care and surgical decision making for more rigorous testing and human oversight, due to the potential harm that could occur, you have an opportunity here to train and refine and gradually expand into higher-stakes environments.
The important thing to think about here is that you always have the choice of whether or not you want to use it. And that's just going to be all about that governance process within your own organization, and the importance that you place on the accuracy of that data in decision making across the continuum of your healthcare organization.
And because it's Friday and we have to have a little fun, in a far less serious scenario, couldn't we use things like this in baseball? How many times have we seen strike calls that are so far outside that it makes you angry? In my case, yes. You and I have had our teams playing against one another in the playoffs, and there are those moments where you leap off the sofa and say, that was not a strike.
And it was a very philosophical conversation for us in our household, realizing that, okay, if it's just between a strike and a ball, then yes, the automated ball-strike system is going to be more consistent. But here's where it starts to really matter.
Human umpires excel at interpreting nuanced rules and managing the flow of the game, where robots lack adaptability. And this is what you heard me say earlier: any hybrid system that combines human judgment and robotic accuracy needs to be thoughtful and needs to be tested. So you may find that there is the machine to back up the decision.
And in those critical moments, the human is still going to prevail. Absolutely agree. And it may seem silly, maybe, but there are parallels in life that just do come up, and AI, ChatGPT, these things are finding their way into so many aspects of our lives. So it's only natural, I think, to think about it in these terms.
And think about it. You have 12-year-old twins. If you're going to talk to them about AI and robotics and accuracy in a way that is interesting and approachable to them, would they want to talk about emergency department management with you tonight? Or would they want to talk about baseball with you tonight?
Yeah, that's not even close. So if you create spaces where the information being shared and learned is taught in the setting where it's best received, then what better opportunity to think about the criticality of trusting AI than during the playoffs? As a lifelong baseball fan, I could not agree more.
And I love the fact that we're talking about this in terms of emergency room management, because that's often the front door to your hospital; it's where people go when they have a critical need. If you're going to be overprescribing, or having too many tests run, or determining whether someone should be admitted, those all have serious implications for both the organization and the patient.
And so making sure that both parties can trust how and why decisions are made, that's something that's not going to go away. That's where the patient is still going to look to the judgment and expertise of their physician above all else. And a shout out to UCSF for doing this study. I think it's really great to see this information. I think we're going to start to see more, and the more the better, so we can start to compare these numbers and see what we're dealing with, especially as this becomes more pervasive.
100 percent. Academic medical centers doing this type of research shows the intentionality with which they're making all of the decisions when it comes to patient care. And you're grateful that they may trailblaze and pioneer some of these conversations, so that others who don't have the reach, breadth, or depth of ability to do some of this can learn from those academic medical centers, and that will trickle throughout healthcare.
All right. Thank you once again, Sarah, for joining me and giving us your thoughts. I always love our Friday dialogue, for sure. Me too. 📍 Don't forget to share this podcast with a friend or colleague. Use it as a foundation for daily or weekly discussions on the topics that are relevant to you and the industry.
They can subscribe wherever you listen to podcasts. Thank you. And happy Friday.