September 24, 2023
Reproducibility of GPT4 in Healthcare - An Experiment
Summary
Bill Russell examines how reliable generative AI is in healthcare. Using OpenAI's function calling in a web app (gptexperiments.patient.dev), he tested triage nurse blurbs, asking the model for the top five likely conditions for each. Given the same input, the model returned different answers from run to run. Of the diagnoses returned, 98% related to gallbladder disease. The Jaccard similarity between runs was 3.41%, indicating moderate variation in the generated diagnoses. Russell sees potential in function calling for healthcare apps but is concerned about the variation, though he believes imperfect results can still be useful. Repeating the experiment with the temperature set to zero produced less variation.
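For readers unfamiliar with the metric, Jaccard similarity between two runs is the size of the intersection of their diagnosis sets divided by the size of their union. The sketch below shows one plausible way to score repeated runs, by averaging the similarity over every pair of runs. The summary does not say how the experiment aggregated its runs, so the averaging step, the function names, and the diagnosis lists are all illustrative assumptions, not data or code from the experiment.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical (a convention, not a rule).
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity over every unordered pair of runs.

    Assumption: pairwise averaging is one common way to summarize
    run-to-run agreement; the original experiment may have used another.
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-five lists from three runs on the same triage blurb.
runs = [
    ["cholecystitis", "cholelithiasis", "pancreatitis", "peptic ulcer", "hepatitis"],
    ["cholecystitis", "cholelithiasis", "pancreatitis", "gastritis", "appendicitis"],
    ["cholecystitis", "biliary colic", "pancreatitis", "peptic ulcer", "hepatitis"],
]

score = mean_pairwise_jaccard(runs)
print(f"mean pairwise Jaccard similarity: {score:.2%}")
```

A score of 100% would mean every run returned the same five conditions, and 0% would mean no two runs shared a single condition. Exact string matching is strict: "cholelithiasis" and "gallstones" count as different diagnoses, which can push the score down even when the runs agree clinically.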