This Week Health

Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying

May 6, 2024
NEJM AI
Summary
The study, by Ali Soroush et al., benchmarks several large language models (LLMs), including GPT-3.5, GPT-4, Gemini Pro, and Llama2-70b Chat, on medical code querying. It finds that these models are generally poor at generating the correct medical billing codes (ICD-9-CM, ICD-10-CM, and CPT) from code descriptions; even the best-performing model, GPT-4, fell short of high accuracy. To explain the performance differences, the authors analyzed factors such as code frequency, the brevity of code descriptions, and exactness of match. The findings suggest that current LLMs are unreliable for medical coding: they often produce imprecise or entirely fabricated codes, which could undermine medical billing and record-keeping if the models were used in clinical settings without further dedicated research and refinement.
