Research Spotlight: Head-to-Head Comparisons of Generative Artificial Intelligence and Internal Medicine Physicians
Mass General Brigham
Contributed by: Drex DeFord
Summary
Daniel Restrepo, MD, a physician at Massachusetts General Hospital, conducted two studies comparing the diagnostic reasoning abilities of large language models (LLMs) with those of human physicians. Published in *JAMA Internal Medicine* and the *Journal of Hospital Medicine*, the research explored whether AI could reduce the clinical reasoning errors that often lead to misdiagnoses. The first study compared an LLM and a human doctor in diagnosing a real medical case and found that the AI was slower to integrate new data points but arrived at the correct diagnosis. The second study assessed GPT-4 against resident and attending physicians, noting the AI's comparable reasoning abilities but also its tendency toward verbosity and more frequent instances of incorrect reasoning. The findings suggest LLMs could augment, but not replace, human clinicians, and highlight the need for further study and careful implementation to address biases and data safety.