This Week Health

This Week Health is a series of IT podcasts dedicated to healthcare transformation powered by the community

In the News

Beyond Hype: Getting the Most Out of Generative AI in Healthcare Today

October 6, 2023

While Covid-19 may no longer be dominating the global news cycle, healthcare providers and payers are still feeling its reverberations. More than half of US hospitals ended 2022 with a negative margin, marking the most difficult financial year since the start of the pandemic.

CEOs and CFOs remember the challenges all too well: The Omicron surge halted nonurgent procedures in the first half of the year, government support tapered off, and labor expenses ballooned amid staffing shortages. There was also the record-high inflation that continues to intensify margin pressures today. According to a recent Bain survey of health system executives, 60% cite rising costs as their greatest concern.

Payers and providers are now on the hunt for margin improvements. In our experience, the most successful companies won’t merely reduce costs, but also ramp up productivity. When done right, modest technology investments can accomplish both.

Artificial intelligence (AI) may hold part of the answer. With the costs to train a system down 1,000-fold since 2017, AI provides an arsenal of new productivity-enhancing tools at a low investment.

Many executives recognize the growing opportunity, especially with the recent rise of generative AI, which uses sophisticated large language models (LLMs) to create original text, images, and other content. It’s inspiring an explosion of ideas around use cases, from reviewing medical records for accuracy to making diagnoses and treatment recommendations.

Our survey reveals that 75% of health system executives believe generative AI has reached a turning point in its ability to reshape the industry. However, only 6% have an established generative AI strategy.

It’s time to play offense—or be forced to play defense later. But choosing from the laundry list of generative AI applications is daunting. Companies are at high risk of overinvesting in the wrong opportunities and underinvesting in the right ones, undermining future profitability, growth, and value creation. A wait-and-see approach is a tempting prospect.

However, we believe the next generation of leading healthcare companies will start today, with highly focused, low-risk use cases that boost productivity and cost efficiency. Over the next three to nine months, these companies will improve margins and learn how to implement a generative AI strategy, building up the funds and experience needed to invest in a more transformative vision.

Endless potential—and high hurdles 

The excitement around generative AI may feel akin to the hype around other recent digital and technology developments that never quite rose to their promised potential. Well-intentioned, well-informed individuals are debating how much change will truly materialize in the next few years. While developments over the past six months have been a testament to the breakneck speed of change, nobody can accurately predict what the next six months, year, or decade will look like. Will new players emerge? Will we rely on different LLMs for different use cases, or will one dominate the landscape?

Despite the uncertainty, generative AI already has the power to alleviate some of providers’ biggest woes, which include rising costs and high inflation, clinician shortages, and physician burnout. Quick relief is critical, considering that the heightened risk of a recession will only compound margin pressures, and the US could be short 40,800 to 104,900 physicians by 2030, according to the Association of American Medical Colleges.

Many health systems are eyeing imminent opportunities to reduce administrative burdens and enhance operational efficiency. They rank improving clinical documentation, structuring and analyzing patient data, and optimizing workflows as their top three priorities (see Figure 1).

Some generative AI applications are already streamlining administrative tasks and allowing thinly stretched physicians to spend more time with patients. For instance, Doximity is rolling out a ChatGPT tool that can draft preauthorization and appeal letters. HCA Healthcare partnered with Parlance, a conversational AI-based switchboard, to improve its call center experience while reducing operators’ workload. And there are new announcements seemingly every week: Consider how healthcare software company Epic Systems is incorporating ChatGPT with electronic health records (EHRs) to draft response messages to patients, or how Google Cloud is launching an AI-enabled Claims Acceleration Suite for prior authorization processing. 

These applications only scratch the surface of potential. In the future, generative AI could profoundly transform care delivery and patient outcomes. Looking ahead two to five years, executives are most interested in predictive analytics, clinical decision support, and treatment recommendations (see Figure 2).

It’s hard not to catch AI “fever.” But there are real challenges ahead. Some are already tackling the biggest questions: Organizations such as Duke Health, Stanford Medicine, Google, and Microsoft have formed the Coalition for Health AI to create guidelines for responsible AI systems. Even so, solutions to the greatest hurdles aren’t yet keeping up with the rapid technology development. Resource and cost constraints, a lack of expertise, and regulatory and legal considerations are the largest barriers to implementing generative AI, according to executives (see Figure 3).

Even when organizations can overcome these hurdles, one major challenge remains: focus and prioritization. In many boardrooms, executives are debating overwhelming lists of potential generative AI investments, only to deem them incomplete or outdated given the dizzying pace of innovation. These protracted debates are a waste of precious organizational energy—and time. 

Starting small to win big 

Setting the bar too high is setting up for failure. It’s easy to get caught up, betting big on what seems like the greatest opportunity in the moment. But 12 months later, leaders often find themselves frustrated that they haven’t seen results or feeling as if they’ve made a misplaced bet. Momentum and investments slow, further hindering progress. 

Leading companies are forming a more pragmatic strategy that considers current capabilities, regulations, and barriers to adoption. Their CEOs and CFOs work together to enforce four guiding principles: 

  • Pilot low-risk applications with a narrow focus first. Tomorrow’s leaders are making no-regret moves to deliver savings and productivity enhancements in short order—at a time when they need it most. Gaining experience with currently available technology, they are testing and learning their way to minimum viable products in low-risk, repeatable use cases. These quick wins are typically in areas where they already have the right data, can create tight guardrails, and see a strong potential return on investment. Some, like call center and chatbot support, can improve the patient experience. However, given the current challenges around regulation and compliance, the most successful early initiatives are likely to be internally focused, such as billing or scheduling. Most importantly, executives prioritize initiatives by potential savings, value, and cost.
  • Decide to buy, partner, or build. CEOs will need to think about how to invest in different use cases based on availability of third-party technology and importance of the initiative.
  • Funnel cost savings and experience into bigger bets. As the technology matures and the value becomes clear, companies that generate savings, accumulate experience, and build organizational buy-in today will be best positioned for the next wave of more sophisticated, transformative use cases. These include higher-risk clinical activities with a greater need for accuracy due to ethical and regulatory considerations, such as clinical decision support, as well as administrative activities that require third-party integration, such as prior authorization.
  • Remember generative AI isn’t a strategy unto itself. To build a true competitive advantage, top CEOs and CFOs are selective and discerning, ensuring that every generative AI initiative reinforces and enables their overarching goals.

Some health systems are already seeing powerful results from relatively small, more practical investments. For instance, recognizing that clinicians were spending an extra 130 minutes per day outside of working hours on administrative tasks, the University of Kansas Health System partnered with Abridge, a generative AI platform, to reduce documentation burden. By summarizing the most important points from provider-patient conversations, Abridge is improving the quality and consistency of documentation, getting more patients in the door, and cutting down on pervasive physician burnout.

Although it will require some upfront investment, in the long run it will be more costly to underestimate the level and speed at which generative AI will transform healthcare. The next generation of leaders will start testing, learning, and saving today, putting them on a path to eventually revolutionize their businesses.


8 Practical Predictions For The Near Future Of Healthcare

October 5, 2023


Explore the 8 transformative predictions for healthcare’s near future, driven by technology and evolving knowledge. Dive into how patient-centric care, global health networks, remote care, patient design, tech giants’ entry, cultural shifts, personalized medicine, and AI-integrated medical teams are reshaping healthcare landscapes.

Andrea Koncz | 11 min

3 October 2023

Key Takeaways

The healthcare sector is evolving towards a patient-centric model, leveraging digital technologies to shift care from traditional hospital settings to patients’ surroundings.

The entry of tech giants and the integration of Artificial Intelligence (AI) are set to significantly enhance diagnostic capabilities and personalize healthcare services.

A cultural transformation is underway, changing traditional patient-physician relationships into collaborative partnerships, further enriched by the emergence of AI in medical teams.

The healthcare sector is at a juncture of significant transformation, fueled by technological advancements and evolving medical knowledge. Over the past few years, we’ve pinpointed eight high-level trends that are set to reshape healthcare delivery profoundly. While we have explored these trends individually in previous posts, we’ve yet to summarize them cohesively in one comprehensive article.

This article aims to bridge that gap, providing a condensed overview beneficial for healthcare professionals and users alike.

1) Patients will become the point of care

In the pursuit of more streamlined and patient-centric healthcare, traditional hospital frameworks are under reassessment. The stereotypical scenes of long waiting lines, overwhelming paperwork, and sterile, uninviting corridors symbolize a dated workflow. The evolution towards modernity beckons a shift from this conventional setup, ushering in an era where patients, armed with digital health tools, become the focal point of care, reducing the dependency on traditional hospital confines.

This transformation is in part driven by the advent and adoption of digital health technologies, notably wearables, telemedicine, and solutions like at-home lab tests, that allow us to carry out procedures at home that formerly were impossible without visiting a healthcare institution.

Patients can now monitor their vital signs regardless of their location, sharing this data with healthcare providers remotely. Although this doesn’t render hospitals obsolete (we won’t have MRI machines at home), it repositions them as health centers focused on disease prevention, acute care, and specific medical procedures requiring sophisticated equipment.

Although we started our list with this trend, it will actually be the end result of the paradigm shift we are witnessing.

2) Healthcare becomes globalized

Companies such as Atlas Biomed in the UK, Dante Labs in Italy, and AliveCor in the US exemplify the erosion of geographical barriers, offering patients worldwide access to quality digital health services, albeit with some shipping restrictions. The digital health sector not only democratizes healthcare by making patients the point of care with direct-to-consumer services but also augments the doctor-patient relationship with a shared decision-making paradigm.

It’s worth mentioning that digital health can only assist users who have the means to afford it. As many studies have shown, wearables are currently not equally available to everyone around the globe, and in most cases, these devices are out of reach for users with no or limited access to quality healthcare. In other words, digital health is typically unavailable to those who would benefit the most from it. Health equity, however, is not a technological question, but a profound social issue.

The ultimate goal is to achieve affordable and universal access to healthcare, leveraging digital technologies to transcend traditional geographical and systemic barriers, making healthcare truly global. But this is a task that puts more burden on politicians and policymakers than on tech companies.

3) Remote care is the new norm

The evolution of remote care is setting a new standard in the healthcare landscape. Asynchronous telemedicine, a facet of remote care, allows for a more efficient exchange between patients and healthcare providers without the necessity for real-time communication. This form of telemedicine comes with a host of benefits including flexible schedules for healthcare practices, efficient task stacking, accommodating younger patients’ communication preferences, addressing language barriers, and transmitting extensive health data seamlessly. While concerns about impersonal communication and potential delays in response exist, the overarching narrative is about leveraging asynchronous telemedicine to mitigate physician shortages and burnout, which are pressing issues globally.

The COVID-19 pandemic significantly accelerated the adoption of telemedicine and at-home diagnostic kits, showcasing the convenience and accessibility digital health technologies offer. In the linked article, we also envisioned a futuristic scenario with a seamless global network of healthcare providers and the immense potential for patient-centered care. However, this global convergence necessitates updated training for physicians to adeptly handle international data and a new regulatory framework to ensure these technologies’ safe and effective use.

The palpable shift towards remote care isn’t just a technological leap but also a cultural transformation, propelled significantly by the pandemic. The newfound convenience, cost-effectiveness, and time efficiency associated with telemedicine are staying with us in the post-pandemic era.

Additionally, remote care intersects with broader goals like decarbonizing patient pathways. The shift to digital prescriptions, promoting healthy eating, organizing resources with AI, and especially the core of remote monitoring and care, play parts in a larger narrative of fostering a more sustainable and environmentally friendly healthcare model. As remote care integrates further into the healthcare system, the anticipation is not only towards more efficient patient-doctor interactions but also towards a significant reduction in the carbon footprint associated with traditional healthcare practices.

4) Patient design will rule what hospitals look like

The shift from a patient-centric to a patient-designed approach in healthcare is a much-needed evolution, aiming to materially involve patients in decision-making processes. Patient centricity, while a laudable idea, often remained a tokenistic gesture, where decisions were still heavily dominated by medical professionals without genuinely incorporating the patient’s perspective. On the contrary, patient design brings patients into the core of decision-making, reflecting a co-design approach where they are seen as active participants and stakeholders. This approach recognizes that the dated hierarchy, placing medical professionals as the sole custodians of relevant knowledge, no longer holds in the face of today’s information accessibility and technology-driven empowerment.

Patient design is manifested in various facets of healthcare, from product development to research and clinical design. Inspirational stories of individuals like Sarah Olson and Tal Golesworthy underscore the profound impact of patient-driven innovations, which arise from a dire need for practical solutions, often spurred by personal or familial suffering. Unlike traditional research teams focused on broad theories, these patient innovators are driven by a pressing need for immediate solutions. This approach leads to pragmatic, real-world applications that may not only alleviate the suffering of individuals and their loved ones but also significantly benefit others facing similar health challenges.

Furthermore, the integration of patient design extends to research priorities and clinical environments. For instance, when research is reshaped to address patients’ urgent needs, it bridges the existing gap between scientific advancements and public health outcomes.

Additionally, involving patients in clinical design, as the Netherlands’ Radboud University Medical Centre Nijmegen did, can redefine the healing environment to foster better patient-physician relationships and potentially expedite recovery. Patient design, therefore, is not merely a conceptual shift but a pragmatic approach to infusing healthcare with real-life perspectives and needs, making the field more responsive, humane, and effective.

5) Tech giants will join the healthcare ecosystem

The tech giants like Google, Amazon, Apple, Microsoft, NVIDIA, and IBM are rigorously exploring healthcare as a new frontier for innovation and business expansion. With a long-term vision, they are integrating technology to solve complex health challenges and improve healthcare delivery.

The core of this convergence lies in the integration of Artificial Intelligence (AI), machine learning, cloud computing, and other cutting-edge technologies. These are being employed across a myriad of applications such as diagnostic imaging, patient monitoring, and disease prediction, aiming to bolster patient care while mitigating costs.

This merger of tech and healthcare is also marked by a consumer-centric approach, as witnessed through the advent of wearables and health applications. These innovations are rendering healthcare more accessible and personalised, empowering individuals to monitor and manage their health seamlessly.

Tech companies entering the sector will lower the threshold for accessible diagnostics. While many can’t afford to spend $150-200 on a dermatologist consultation to check a new birthmark, they may spend a few dollars in an app to have it checked – and thus have a better chance to catch a malignant lesion in time.

However, this convergence is not without its challenges. Navigating through the complex labyrinth of healthcare regulations remains a significant hurdle. The paramount concern revolves around the privacy and security of health data, necessitating stringent compliance with healthcare standards and laws.

6) The cultural transformation changes the roles

The paradigm shift we witness is not just technological but also profoundly cultural, morphing traditional patient-physician relationships. The healthcare industry is moving away from a paternalistic, hierarchical model towards a more partnership-driven paradigm. In this new landscape, physicians transition from being the sole custodians of medical knowledge to guides navigating patients through a complex jungle of data, decisions, and information. This transformation underscores a democratization of healthcare knowledge, enabled by the vast swathes of information and digital tools now available at the fingertips of both medical practitioners and the populace.

On the flip side, patients evolve from being passive stakeholders, awaiting symptoms to manifest before seeking medical intervention, to proactive, empowered individuals. The modern patient is more informed, more engaged with their health, and desires a collaborative relationship with their healthcare providers. Wearable tech, health apps, and online platforms provide real-time data and a plethora of information, allowing individuals to monitor their health metrics continually, anticipate potential issues, and seek timely advice.

The infusion of technology acts as a significant catalyst in this cultural transformation, changing the way healthcare is perceived, accessed, and delivered.

7) Access to data makes healthcare personalised

Personalised medicine is tailored to the individual characteristics, needs, and preferences of patients. This form of medicine leverages advancements in technology, genomics, and data analytics to offer more precise, predictable, and preventive healthcare.

For instance, genetic testing, a cornerstone of personalized medicine, can unveil an individual’s susceptibility to certain diseases, their likely response to various treatments, and the potential risk of adverse reactions to specific drugs. This level of insight empowers clinicians to design treatment regimens that are aligned with each individual’s unique genetic makeup, significantly enhancing the efficacy and safety of medical interventions.

As we venture deeper into the realm of personalized medicine, the concept of digital twins emerges. Digital twins are virtual replicas of an individual’s physiological and genetic profile. These present a platform to simulate the effects of various treatments in a risk-free, virtual environment.

These digital replicas can allow for a meticulous analysis of treatment outcomes. For instance, in the foreseeable future, a digital twin could simulate the response of a cancer patient to different drug combinations, offering invaluable insights into the most effective and least toxic treatment strategy for that individual. This form of virtual trial-and-error can significantly accelerate the identification of optimal treatment protocols, minimizing the physical, emotional, and financial burdens traditionally associated with the trial-and-error nature of conventional medicine.

The marriage of genetic insights with innovative technologies fundamentally transforms the healthcare landscape. By embracing the tenets of personalised medicine, healthcare providers can offer more targeted, effective, and safe medical care, where the right treatment is delivered to the right patient at the right time.

8) New medical teams arise

The paradigm shift in healthcare towards a more inclusive and collaborative approach redefines the traditional medical team’s construct. Gone are the days when the medical professional was the sole custodian of a patient’s health journey. In the contemporary scenario, the medical team is a confluence of healthcare professionals, the patient, and increasingly, artificial intelligence. The rationale is simple yet profound: who could be more invested in a patient’s well-being than the patient themselves? Empowering individuals to take the reins of their health not only democratizes healthcare but also cultivates a richer, more informed dialogue.

Initially, the notion of patients morphing into active participants in their healthcare may evoke apprehensions among both medical professionals and patients. However, the continuously expanding corpus of medical knowledge renders it virtually impossible for any physician, regardless of their dedication, to single-handedly stay abreast of every medical advancement. Contrastingly, an engaged patient or their kin can delve into the depths of specific medical domains, unearthing contemporary studies, technologies, or emerging treatment protocols relevant to their health conditions. Though not replacements for medical professionals, these informed individuals can enhance the decision-making process.

As we tread further along this trajectory, artificial intelligence (AI) emerges as the newest entrant in this expanding medical team. With an innate capacity to sift through vast swaths of data, discern patterns, and provide insights, AI serves as a force multiplier in the medical domain. Algorithms designed to assist in diagnostics, treatment planning, and monitoring are making headway, and we can rest assured that various AI models will become safer and more efficient in assisting medical work.


Creation and Adoption of Large Language Models in Medicine

October 3, 2023


Figure.  An Overview of the Key Issues in Shaping the Creation and Adoption of Large Language Models (LLMs) in Medicine

  

EHR indicates electronic health record. A, During training, LLMs learn generally useful patterns in large amounts of unlabeled data via self-supervision. For example, a commonly used form of self-supervision is to predict the next word in a sequence conditioned on prior words, as seen in large public datasets. Once an LLM is trained, users can interact with it via submitting an instruction (or prompt), to which the LLM responds with a sequence of words that are its valid completions. B, As is, LLMs are not good at following instructions. The LLMs are adapted for specific tasks by providing examples of instructions (blue) and the expected responses (yellow). This adaptation process is called tuning. In responding to medical instructions, such as “summarize the past specialist visits of a patient,” LLMs require tuning using a set of relevant instructions and their expected responses.
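The two steps described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `next_word_examples` and `format_tuning_example`, the sample sentence, and the "Instruction:/Response:" template are hypothetical and are not taken from the article or from any particular LLM toolkit.

```python
def next_word_examples(text):
    """Self-supervision: turn raw text into (prior words, next word) pairs.

    No human labels are needed; the text itself supplies the targets.
    """
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


def format_tuning_example(instruction, response):
    """Tuning: pair an instruction with its expected response.

    The template below is a hypothetical layout chosen for illustration.
    """
    return f"Instruction: {instruction}\nResponse: {response}"


# Hypothetical sample sentence standing in for unlabeled training text.
pairs = next_word_examples("the patient was seen in clinic")
for context, target in pairs:
    print(" ".join(context), "->", target)

# The instruction is the one quoted in the caption; the response is a placeholder.
tuning_example = format_tuning_example(
    "Summarize the past specialist visits of a patient.",
    "<expected summary supplied by a clinician>",
)
print(tuning_example)
```

A real training pipeline would operate on tokens rather than whitespace-separated words and on far larger corpora; the sketch only shows the shape of the data each step consumes.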

References

1. Li R, Kumar A, Chen JH. How chatbots and large language model artificial intelligence systems will reshape modern medicine: fountain of creativity or Pandora’s box? JAMA Intern Med. 2023;183(6):596-597. doi:10.1001/jamainternmed.2023.1835

2. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems: NIPS ’17. Curran Associates Inc; 2017:6000-6010.

3. Zhao WX, Zhou K, Li J, et al. A survey of large language models. arXiv. Preprint posted online March 31, 2023. https://arxiv.org/abs/2303.18223v10

4. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388(13):1233-1239. doi:10.1056/NEJMsr2214184

5. Wikipedia contributors. GPT-3. Published May 8, 2023. Accessed July 25, 2023. https://en.wikipedia.org/w/index.php?title=GPT-3&oldid=1153892380

6. OpenAI. Aligning language models to follow instructions. Published January 27, 2022. Accessed May 22, 2023. https://openai.com/research/instruction-following

7. Chui M, Hazan E, Roberts R, et al. The economic potential of generative AI: the next productivity frontier. Published June 14, 2023. Accessed June 16, 2023. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier

8. Wornow M, Xu Y, Thapa R, et al. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digital Med. 2023;135(6). doi:10.1038/s41746-023-00879-8

9. Taori R, Gulrajani I, Zhang T, Dubois Y, Li X. Stanford Alpaca: code and documentation to train Stanford’s Alpaca models, and generate the data. Accessed June 16, 2023. https://github.com/tatsu-lab/stanford_alpaca

10. Steinberg E, Jung K, Fries JA, Corbin CK, Pfohl SR, Shah NH. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform. 2021;113:103637. doi:10.1016/j.jbi.2020.103637

11. Mello MM, Guha N. ChatGPT and physicians’ malpractice risk. JAMA Health Forum. 2023;4(5):e231938. doi:10.1001/jamahealthforum.2023.1938

12. Brynjolfsson E. The turing trap: the promise and peril of human-like artificial intelligence. In: Augmented Education in the Global Age. Routledge; 2023:103-116.


Special Communication | AI in Medicine

August 7, 2023

Nigam H. Shah, MBBS, PhD; David Entwistle, BS, MHSA; Michael A. Pfeffer, MD

JAMA. 2023;330(9):866-869. doi:10.1001/jama.2023.14217


Abstract

Importance

There is increased interest in and potential benefits from using large language models (LLMs) in medicine. However, by simply wondering how the LLMs and the applications powered by them will reshape medicine instead of getting actively involved, the agency in shaping how these tools can be used in medicine is lost.

Observations

Applications powered by LLMs are increasingly used to perform medical tasks without the underlying language model being trained on medical records and without verifying their purported benefit in performing those tasks.

Conclusions and Relevance

The creation and use of LLMs in medicine need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.


Introduction

Large language models (LLMs) and the applications built using them, such as ChatGPT, have become popular. Within 2 months of the November 2022 release, ChatGPT surpassed 100 million users. The medical community has been pursuing off-the-shelf LLMs provided by technology companies. New users have been asking how the LLMs and the chatbots powered by them will reshape medicine. 1 Perhaps the reverse question should be asked: How can the intended medical use shape the training of the LLMs and the chatbots or the other applications they power?

Language models learn the probabilities of occurrence for sequences of words from the corpus of text. For example, if the corpus had the 2 questions of “where are we going” and “where are we at,” the probability is 0.5 for seeing the word “going” after seeing the 3 words “where are we.” An LLM is essentially learning such probabilities on a massive scale, such that the resulting model has billions of parameters (a glossary appears in the Box). In 2017, Vaswani et al 2 demonstrated that a certain kind of deep neural network, called a transformer, could learn LLMs that later performed amazingly well at language translation tasks. Their insight led to the creation of hundreds of language models that were reviewed by Zhao et al. 3
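The counting idea in this paragraph can be sketched in a few lines of Python. This is a minimal illustration on the two-question corpus from the text, not how an LLM is actually trained; the function name `next_word_probabilities` and the fixed 3-word context are assumptions made for the example.

```python
from collections import Counter, defaultdict


def next_word_probabilities(corpus, context_size=3):
    """Estimate P(next word | previous `context_size` words) by simple counting."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            counts[context][words[i + context_size]] += 1
    # Normalize the raw counts for each context into probabilities.
    return {
        context: {word: n / sum(followers.values()) for word, n in followers.items()}
        for context, followers in counts.items()
    }


corpus = ["where are we going", "where are we at"]
probs = next_word_probabilities(corpus)
print(probs[("where", "are", "we")])  # {'going': 0.5, 'at': 0.5}
```

A neural language model replaces this lookup table with learned parameters, which is what lets it assign probabilities to word sequences it has never seen verbatim.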

Box. Glossary

Chatbot

A computer program designed to simulate conversation with human users, especially over the internet.

Deep neural network

A setup for machine learning inspired by biological neural networks in which computational units referred to as neurons are arranged in a network that is composed of multiple layers of interconnected neurons, allowing it to learn complex patterns in the data presented to it.

Large language model (LLM)

A language model that learns the probabilities of occurrence of sequences of words from textual corpora with trillions of words, such that the resulting model has billions of parameters.

Self-supervision

A learning approach in which a machine learning model learns without relying on explicitly labeled data as examples. Instead, the model generates its own training objective from the input data without the need for human-annotated data, which can be time-consuming and expensive to produce. A common type of self-supervision is in the form of an autoregressive training objective, in which the model is trained to predict the next word or token in a sequence, given the previous words or tokens. The training objective is to maximize the likelihood of the correct word given the context. Training in this manner is often the first stage in training LLMs (generative pretrained transformer) and helps the model learn language structure, grammar, and semantics. Learning to predict the next medical code in a patient’s longitudinal medical record does not require a human to label a code as the “next code”; that information is available in the data directly by looking at the sequence of appearance of the code in the medical record.

Transformer

A deep neural network architecture that is designed to be efficient at capturing relationships and dependencies between elements in a sequence, such as words in a sentence.

Instruction tuning

Refers to a kind of tuning in which an existing LLM is adapted (via tuning) to respond accurately and effectively to natural language instructions. This process involves continuing to train the model on a dataset containing pairs of instructions and corresponding desired outputs or responses. Doing so allows the model to be more useful in real-world applications, such as providing relevant information, answering questions, or following specific commands provided by users in a natural language.

Tuning

Refers to the process of adapting a pretrained LLM to perform well on a specific task or domain. This process involves training the model on a smaller labeled dataset that is specific to the target task, such as sentiment analysis, machine translation, or answering questions. For example, in a medical setting, a model could be tuned for tasks such as summarizing the available past medical records of a patient or the course of their current admission. During tuning, the model’s weights and parameters are updated using pertinent examples to optimize its performance on the target task. This allows the model to build on the general language understanding it gained via self-supervised learning, while adapting to the nuances and specific requirements of the task at hand.

Although language models are trained to predict the next word in a sentence (basically an advanced autocomplete), new capabilities (such as the ability to summarize text and answer questions posed in natural language) become possible without explicitly training for them. These capabilities allow the model to perform tasks such as passing medical licensing examinations, simplifying radiology reports, extracting drug names from a physician’s note, replying to patient questions, summarizing medical dialogues, and writing histories and physical assessments. 4 ChatGPT, perhaps the most popular application, uses an LLM called a generative pretrained transformer (GPT; version 3.5 or 4.0) underneath to ingest text and output text in response.

The creation of language models capable of such diverse tasks hinges on 2 things. First is the ability to learn generally useful patterns in large amounts of unlabeled data via self-supervision (training and interacting with an LLM in the Figure). For example, a commonly used form of self-supervision is to predict the next word in a sequence conditioned on prior words, which later identifies the words that go together in general. The GPT-3 model was trained on 45 terabytes of text data comprising roughly 500 billion tokens (1 token is approximately 4 characters or three-fourths of a word for English text) at a cost of approximately $4.6 million. 5
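To make the self-supervision step concrete, the sketch below shows how one unlabeled sentence yields several training examples with no human annotation. It is an illustration under simplifying assumptions: words stand in for tokens (real systems use subword tokenizers), and the helper name `next_token_examples` and the sample sentence are invented for the example.

```python
def next_token_examples(tokens):
    """Turn one unlabeled sequence into (context, target) training pairs.

    The "label" for each pair is simply the token that actually comes next,
    so no human annotation is required.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Illustrative sentence; whitespace splitting is a stand-in for a real tokenizer.
sentence = "the patient was admitted overnight".split()
for context, target in next_token_examples(sentence):
    print(" ".join(context), "->", target)
```

A sequence of n tokens produces n - 1 such pairs, which is why large unlabeled corpora translate directly into large amounts of training signal.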

Second is the subsequent tuning of the LLM to generate responses aligned with human expectations via instruction tuning. For example, in response to the request, “explain the moon landing to a 6-year-old in a few sentences,” the GPT-3 model did not answer but instead suggested possible completions such as “explain the theory of gravity to a 6-year-old” and “explain the big bang theory to a 6-year-old” (instruction tuning an LLM in the Figure). Users helped train GPT-3 by providing the instructions (also called prompts) for which the labelers (hired by OpenAI, the company that built GPT-3) provided demonstrations of the desired output and ranked the outputs from the model. OpenAI used these pairs of instructions and their desired outputs to instruction tune GPT-3. 6
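One plausible way to store such instruction and demonstration pairs is as one JSON record per line. The sketch below is a hypothetical layout, not OpenAI's actual format: the class name `InstructionExample`, its field names, and the placeholder demonstration text are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class InstructionExample:
    instruction: str    # the prompt supplied by a user
    demonstration: str  # the desired output written by a human labeler


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(example)) for example in examples)


examples = [
    InstructionExample(
        instruction="Explain the moon landing to a 6-year-old in a few sentences.",
        # Placeholder only: a real dataset would hold the labeler's actual text.
        demonstration="<labeler-written demonstration goes here>",
    )
]
print(to_jsonl(examples))
```

Keeping each pair as a self-contained record makes it straightforward to pool examples contributed by different groups into a single tuning dataset.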

Although general-purpose LLMs can perform many medically relevant tasks, they have not been exposed to medical records during self-supervised training and they are not specifically instruction tuned for any medical task. By not asking how the intended medical use can shape the training of LLMs and the chatbots or other applications they power, technology companies are deciding what is right for medicine. The medical profession has made a mistake in not shaping the creation, design, and adoption of most information technology systems in health care. Given the profound disruption that is possible for such diverse activities as clinical documentation, decision support, information technology operations, medical coding, and patient-physician communication with the use of LLMs (estimated in a McKinsey report to be as high as 1.8%-3.2% of total health care revenues 7), the same mistake cannot be repeated. At a minimum, the medical profession should be asking the following questions.

Are the LLMs Being Trained With the Relevant Data and the Right Kind of Self-Supervision?

Medical records can be viewed as consisting of sequences of time-stamped clinical events represented by medical codes and textual documents, which can be the training data for a language model. Wornow et al 8 reviewed the training data and the kind of self-supervision used by more than 80 medical language models and found 2 categories.

First, there are medical LLMs that are trained on documents. The self-supervision is via learning to predict the next word in a textual document, such as a progress note or a PubMed abstract, and conditioned on prior words seen. Therefore, these models are similar in their anatomy to general purpose LLMs (eg, GPT-3), but are trained on clinical or biomedical text. These models can be used for language manipulation tasks such as summarization, translation, and answering questions. Given the increased training and use costs of LLMs, it is necessary to investigate whether smaller language models trained on relevant data may achieve the desired performance at a lower cost. For example, researchers at the Center for Research on Foundation Models at Stanford University created a model called Alpaca with 4% as many parameters as OpenAI’s text-davinci-003, matching its performance at a cost of $600 to create. 9

Second, there are medical LLMs that are trained on the sequence of medical codes in a patient’s entire record that take time into account. Here, the self-supervision is in the form of learning the probability of the next day’s set of codes, or learning how much time elapses until a certain code is seen. As a result, the sequence and timing of medical events in a patient’s entire record is considered. As a concrete example, given the code for “hypertension,” these models learn when a code for a stroke, myocardial infarction, or kidney failure is likely to occur. When provided with a patient’s medical record as input, such models will not output text but instead a machine understandable “representation” of that patient, referred to as an “embedding,” which is a fixed-length, high-dimensional vector representing the patient’s medical record. Such embeddings can be used in building models for predicting 30-day readmissions, long hospital lengths of stay, and in-patient mortality using less training data (as few as 100 examples). 10
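The second kind of self-supervision, learning how much time elapses until a certain code is seen, also needs no human labeling, because the target can be read directly from the record. The sketch below derives such targets from a time-stamped event list. It is an illustration only: the record is synthetic, the code names are plain-text stand-ins rather than entries from a real coding system, and the function name `days_until_code` is an assumption.

```python
from datetime import date


def days_until_code(events, anchor_code, target_code):
    """For each occurrence of anchor_code, return the number of days until the
    next occurrence of target_code (None if it never appears afterward)."""
    events = sorted(events)
    labels = []
    for i, (when, code) in enumerate(events):
        if code != anchor_code:
            continue
        later = [d for d, c in events[i + 1:] if c == target_code]
        labels.append((when, (later[0] - when).days if later else None))
    return labels


# Synthetic record for illustration; not real patient data.
record = [
    (date(2020, 1, 10), "hypertension"),
    (date(2020, 6, 1), "office_visit"),
    (date(2021, 1, 10), "stroke"),
]
print(days_until_code(record, "hypertension", "stroke"))
```

A model trained on many such (event, elapsed time) targets is what allows the resulting patient embedding to carry information about the sequence and timing of events.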

The medical community needs to actively shape the creation of LLMs in medicine. For example, given the importance of instruction tuning, the medical community should be discussing how to create shared instruction tuning datasets with examples of prompts to be fulfilled, such as “summarize the past specialist visits of a patient” with its corresponding valid completion (Figure). Perhaps instead of using GPT-4 at the cost of $0.06 to $0.12 per 1000 tokens (about 750 words), health care systems should be training shared, open-source models using their own data. The technology companies should be asked whether the models being offered have seen any medical data during training and whether the nature of self-supervision used is relevant for the final use of the model.
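The per-token pricing quoted here can be turned into a rough cost estimate for a given amount of text. The sketch below combines the quoted price range with the rule of thumb given earlier in the article that 1 token is about three-fourths of an English word; the function name `estimate_cost_range` and the 750-word input are assumptions chosen for illustration, and actual prices change over time.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text


def estimate_cost_range(word_count, low_per_1k=0.06, high_per_1k=0.12):
    """Return (tokens, low_cost, high_cost) in dollars for a text of word_count words."""
    tokens = word_count / WORDS_PER_TOKEN
    return tokens, tokens / 1000 * low_per_1k, tokens / 1000 * high_per_1k


tokens, low, high = estimate_cost_range(750)
print(f"{tokens:.0f} tokens: ${low:.2f} to ${high:.2f}")  # 1000 tokens: $0.06 to $0.12
```

Multiplying such an estimate by the number of notes or messages processed per day gives a first-order figure to weigh against the cost of training and hosting a shared model.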

Are the Purported Value Propositions of Using LLMs in Medicine Being Verified?

Current evaluations of LLMs also do not quantify the benefits of novel collaboration between humans and artificial intelligence that is at the core of using these models in clinical settings. The methods for evaluating LLMs in the real world remain unclear. Concerns with current evaluations range from training dataset contamination (such as when the evaluation data are included in the training dataset) to the inappropriateness of using standardized examinations designed for humans to evaluate the models. Consider the analogy of evaluating a person for a driver’s license. The person takes a multiple-choice, knowledge-based test. The car, meanwhile, undergoes safety tests during manufacturing, some of which are regulated by the government. Then the person gets in the car for a road test to certify them for a license. The car does not take a multiple-choice test at the department of motor vehicles or get certified for driving, but that is the absurdity tolerated for LLMs when it is declared that they are certified to give medical advice because they passed the US medical licensing examination.
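Training dataset contamination, one of the concerns named above, can be screened for with a simple overlap check. The sketch below flags evaluation items that share any run of 5 consecutive words with the training text. It is a crude heuristic under stated assumptions, not an established standard: the function names, the 5-word window, and the toy strings are all invented for the example, and paraphrased contamination would go undetected.

```python
def ngrams(text, n=5):
    """Return the set of n-word sequences in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(eval_items, training_docs, n=5):
    """Fraction of evaluation items sharing at least one n-gram with the training text."""
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)
    flagged = [item for item in eval_items if ngrams(item, n) & train_ngrams]
    return len(flagged) / len(eval_items)


# Toy strings for illustration only.
training_docs = ["a 45 year old man presents with chest pain radiating to the left arm"]
eval_items = [
    "A 45 year old man presents with chest pain radiating to the left arm. What is the next step?",
    "Which laboratory value should be checked before starting this medication?",
]
print(contamination_rate(eval_items, training_docs))  # 0.5
```

A nonzero rate does not prove that a model memorized the answers, but it signals that examination-style scores on those items should be interpreted with caution.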

The purported benefits need to be defined and evaluations conducted to verify such benefits. 8 Only after these evaluations are completed should statements be allowed such as: an LLM was used for a defined task in a specific workflow, a metric was measured, and an improvement (or deterioration) in a prespecified outcome was observed. Such evaluations also are necessary to clarify the medicolegal risks that might occur with the use of LLMs to guide medical care, 11 and to identify mitigation strategies for the models’ tendency to generate factually incorrect outputs that are probabilistically plausible (called hallucinations).

Conclusion

The building of relevant medical LLMs needs to be balanced with verifying the presumed value propositions via testing in real-world deployments akin to road driving tests. If the goal in using such models is to augment human judgment, and not replace it, adopting this driving test mindset is critically important. Otherwise, there is a risk of falling into the trap of automating tasks that individuals already know how to do, and failing to ask the question of what a person plus such models could do together that may yield better medical care. 12

Given the highly disruptive potential of these technologies, clinicians cannot afford to be on the sidelines. The adoption of LLMs in medicine needs to be shaped by the medical profession that can identify the right training (and instruction tuning) data and perform the evaluations that verify the purported benefits of using LLMs in medicine.


Article Information

Accepted for Publication: July 11, 2023.

Published Online: August 7, 2023. doi: 10.1001/jama.2023.14217

Corresponding Author: Nigam H. Shah, MBBS, PhD, Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr, Palo Alto, CA 94305 (nigam@stanford.edu).

Author Contributions: Dr Shah had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: All authors.

Drafting of the manuscript: Shah, Pfeffer.

Critical review of the manuscript for important intellectual content: All authors.

Administrative, technical, or material support: Entwistle, Pfeffer.

Supervision: Pfeffer.

Conflict of Interest Disclosures: Dr Shah reported being a co-founder of Prealize Health (a predictive analytics company) and Atropos Health (an on-demand evidence generation company). No other disclosures were reported.

Additional Contributions: We acknowledge the members of the data science team at Stanford Health Care for helpful discussions to refine the arguments made in this article. We acknowledge Jason Fries, PhD, and Alison Callahan, PhD (both with Stanford University), for help in creating the first draft of the Figure; they were not compensated for their contributions.

References

1. Li R, Kumar A, Chen JH. How chatbots and large language model artificial intelligence systems will reshape modern medicine: fountain of creativity or Pandora’s box? JAMA Intern Med. 2023;183(6):596-597. doi:10.1001/jamainternmed.2023.1835

2. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems: NIPS ’17. Curran Associates Inc; 2017:6000-6010.

3. Zhao WX, Zhou K, Li J, et al. A survey of large language models. arXiv. Preprint posted online March 31, 2023. https://arxiv.org/abs/2303.18223v10

4. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388(13):1233-1239. doi:10.1056/NEJMsr2214184

5. Wikipedia contributors. GPT-3. Published May 8, 2023. Accessed July 25, 2023. https://en.wikipedia.org/w/index.php?title=GPT-3&oldid=1153892380

6. OpenAI. Aligning language models to follow instructions. Published January 27, 2022. Accessed May 22, 2023. https://openai.com/research/instruction-following

7. Chui M, Hazan E, Roberts R, et al. The economic potential of generative AI: the next productivity frontier. Published June 14, 2023. Accessed June 16, 2023. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier

8. Wornow M, Xu Y, Thapa R, et al. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digital Med. 2023;135(6). doi:10.1038/s41746-023-00879-8

9. Taori R, Gulrajani I, Zhang T, Dubois Y, Li X. Stanford Alpaca: code and documentation to train Stanford’s Alpaca models, and generate the data. Accessed June 16, 2023. https://github.com/tatsu-lab/stanford_alpaca

10. Steinberg E, Jung K, Fries JA, Corbin CK, Pfohl SR, Shah NH. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform. 2021;113:103637. doi:10.1016/j.jbi.2020.103637

11. Mello MM, Guha N. ChatGPT and physicians’ malpractice risk. JAMA Health Forum. 2023;4(5):e231938. doi:10.1001/jamahealthforum.2023.1938

12. Brynjolfsson E. The Turing trap: the promise and peril of human-like artificial intelligence. In: Augmented Education in the Global Age. Routledge; 2023:103-116.

© 2023 American Medical Association. All Rights Reserved.


Where health systems are boosting tech spending

October 3, 2023

CIOs told Becker's that health systems are increasing their spending in revenue cycle management, clinical workflow optimization and patient engagement.

A recent report from KLAS said 80 percent of healthcare organizations increased their technology spending in the past year and that the top three areas where money was spent were those cited by the CIOs. 

Will Landry, senior vice president and CIO of Baton Rouge, La.-based Franciscan Missionaries of Our Lady Health System, said the report was accurate. 

"These three items are core to our IS strategic vision for the foreseeable future," he told Becker's.

Michael Williams, vice president and CIO of LMH Health, based in Lawrence, Kan., also agreed with the assessment on where hospitals and health systems are increasing IT spending.

This comes as KLAS indicated that the increase in IT spending is expected to continue to grow as health systems look to keep up with emerging technologies, ease labor shortages and help reduce cost pressures.


Beyond Hype: Getting the Most Out of Generative AI in Healthcare Today

October 6, 2023

While Covid-19 may no longer be dominating the global news cycle, healthcare providers and payers are still feeling its reverberations. More than half of US hospitals ended 2022 with a negative margin, marking the most difficult financial year since the start of the pandemic.

CEOs and CFOs remember the challenges all too well: The Omicron surge halted nonurgent procedures in the first half of the year, government support tapered off, and labor expenses ballooned amid staffing shortages. There was also the record-high inflation that continues to intensify margin pressures today. According to a recent Bain survey of health system executives, 60% cite rising costs as their greatest concern.

Payers and providers are now on the hunt for margin improvements. In our experience, the most successful companies won’t merely reduce costs, but also ramp up productivity. When done right, modest technology investments can accomplish both.

Artificial intelligence (AI) may hold part of the answer. With the costs to train a system down 1,000-fold since 2017, AI provides an arsenal of new productivity-enhancing tools at a low investment.

Many executives recognize the growing opportunity, especially with the recent rise of generative AI, which uses sophisticated large language models (LLMs) to create original text, images, and other content. It’s inspiring an explosion of ideas around use cases, from reviewing medical records for accuracy to making diagnoses and treatment recommendations.

Our survey reveals that 75% of health system executives believe generative AI has reached a turning point in its ability to reshape the industry. However, only 6% have an established generative AI strategy.

It’s time to play offense—or be forced to play defense later. But choosing from the laundry list of generative AI applications is daunting. Companies are at high risk of overinvesting in the wrong opportunities and underinvesting in the right ones, undermining future profitability, growth, and value creation. A wait-and-see approach is a tempting prospect.

However, we believe the next generation of leading healthcare companies will start today, with highly focused, low-risk use cases that boost productivity and cost efficiency. Over the next three to nine months, these companies will improve margins and learn how to implement a generative AI strategy, building up the funds and experience needed to invest in a more transformative vision.

Endless potential—and high hurdles 

The excitement around generative AI may feel akin to the hype around other recent digital and technology developments that never quite rose to their promised potential. Well-intentioned, well-informed individuals are debating how much change will truly materialize in the next few years. While developments over the past six months have been a testament to the breakneck speed of change, nobody can accurately predict what the next six months, year, or decade will look like. Will new players emerge? Will we rely on different LLMs for different use cases, or will one dominate the landscape?

Despite the uncertainty, generative AI already has the power to alleviate some of providers’ biggest woes, which include rising costs and high inflation, clinician shortages, and physician burnout. Quick relief is critical, considering that the heightened risk of a recession will only compound margin pressures, and the US could be short 40,800 to 104,900 physicians by 2030, according to the Association of American Medical Colleges.

Many health systems are eyeing imminent opportunities to reduce administrative burdens and enhance operational efficiency. They rank improving clinical documentation, structuring and analyzing patient data, and optimizing workflows as their top three priorities (see Figure 1).

Some generative AI applications are already streamlining administrative tasks and allowing thinly stretched physicians to spend more time with patients. For instance, Doximity is rolling out a ChatGPT tool that can draft preauthorization and appeal letters. HCA Healthcare partnered with Parlance, a conversational AI-based switchboard, to improve its call center experience while reducing operators’ workload. And there are new announcements seemingly every week: Consider how healthcare software company Epic Systems is incorporating ChatGPT with electronic health records (EHRs) to draft response messages to patients, or how Google Cloud is launching an AI-enabled Claims Acceleration Suite for prior authorization processing. 

These applications only scratch the surface of potential. In the future, generative AI could profoundly transform care delivery and patient outcomes. Looking ahead two to five years, executives are most interested in predictive analytics, clinical decision support, and treatment recommendations (see Figure 2).

It’s hard not to catch AI “fever.” But there are real challenges ahead. Some are already tackling the biggest questions: Organizations such as Duke Health, Stanford Medicine, Google, and Microsoft have formed the Coalition for Health AI to create guidelines for responsible AI systems. Even so, solutions to the greatest hurdles aren’t yet keeping up with the rapid technology development. Resource and cost constraints, a lack of expertise, and regulatory and legal considerations are the largest barriers to implementing generative AI, according to executives (see Figure 3).

Even when organizations can overcome these hurdles, one major challenge remains: focus and prioritization. In many boardrooms, executives are debating overwhelming lists of potential generative AI investments, only to deem them incomplete or outdated given the dizzying pace of innovation. These protracted debates are a waste of precious organizational energy—and time. 

Starting small to win big 

Setting the bar too high is setting up for failure. It’s easy to get caught up, betting big on what seems like the greatest opportunity in the moment. But 12 months later, leaders often find themselves frustrated that they haven’t seen results or feeling as if they’ve made a misplaced bet. Momentum and investments slow, further hindering progress. 

Leading companies are forming a more pragmatic strategy that considers current capabilities, regulations, and barriers to adoption. Their CEOs and CFOs work together to enforce four guiding principles: 

  • Pilot low-risk applications with a narrow focus first. Tomorrow’s leaders are making no-regret moves to deliver savings and productivity enhancements in short order—at a time when they need it most. Gaining experience with currently available technology, they are testing and learning their way to minimum viable products in low-risk, repeatable use cases. These quick wins are typically in areas where they already have the right data, can create tight guardrails, and see a strong potential return on investment. Some, like call center and chatbot support, can improve the patient experience. However, given the current challenges around regulation and compliance, the most successful early initiatives are likely to be internally focused, such as billing or scheduling. Most importantly, executives prioritize initiatives by potential savings, value, and cost.
  • Decide to buy, partner, or build. CEOs will need to think about how to invest in different use cases based on availability of third-party technology and importance of the initiative.
  • Funnel cost savings and experience into bigger bets. As the technology matures and the value becomes clear, companies that generate savings, accumulate experience, and build organizational buy-in today will be best positioned for the next wave of more sophisticated, transformative use cases. These include higher-risk clinical activities with a greater need for accuracy due to ethical and regulatory considerations, such as clinical decision support, as well as administrative activities that require third-party integration, such as prior authorization.
  • Remember generative AI isn’t a strategy unto itself. To build a true competitive advantage, top CEOs and CFOs are selective and discerning, ensuring that every generative AI initiative reinforces and enables their overarching goals.

Some health systems are already seeing powerful results from relatively small, more practical investments. For instance, recognizing that clinicians were spending an extra 130 minutes per day outside of working hours on administrative tasks, the University of Kansas Health System partnered with Abridge, a generative AI platform, to reduce documentation burden. By summarizing the most important points from provider-patient conversations, Abridge is improving the quality and consistency of documentation, getting more patients in the door, and cutting down on pervasive physician burnout.

Although it will require some upfront investment, in the long run it will be more costly to underestimate the level and speed at which generative AI will transform healthcare. The next generation of leaders will start testing, learning, and saving today, putting them on a path to eventually revolutionize their businesses.


8 Practical Predictions For The Near Future Of Healthcare

October 5, 2023


Explore the 8 transformative predictions for healthcare’s near future, driven by technology and evolving knowledge. Dive into how patient-centric care, global health networks, remote care, patient design, tech giants’ entry, cultural shifts, personalized medicine, and AI-integrated medical teams are reshaping healthcare landscapes.

Andrea Koncz 11 min |

3 October 2023

Key Takeaways

  • The healthcare sector is evolving towards a patient-centric model, leveraging digital technologies to shift care from traditional hospital settings to patients’ surroundings.
  • The entry of tech giants and the integration of Artificial Intelligence (AI) are set to significantly enhance diagnostic capabilities and personalize healthcare services.
  • A cultural transformation is underway, changing traditional patient-physician relationships into collaborative partnerships, further enriched by the emergence of AI in medical teams.

The healthcare sector is at a juncture of significant transformation, fueled by technological advancements and evolving medical knowledge. Over the past few years, we’ve pinpointed eight high-level trends that are set to reshape healthcare delivery profoundly. While we have explored these trends individually in previous posts, we’ve yet to summarize them cohesively in one comprehensive article.

This article aims to bridge that gap, providing a condensed overview beneficial for healthcare professionals and users alike.

1) Patients will become the point of care

In the pursuit of more streamlined and patient-centric healthcare, traditional hospital frameworks are under reassessment. The stereotypical scenes of long waiting lines, overwhelming paperwork, and sterile, uninviting corridors symbolize a dated workflow. The evolution towards modernity beckons a shift from this conventional setup, ushering in an era where patients, armed with digital health tools, become the focal point of care, reducing the dependency on traditional hospital confines.

This transformation is in part driven by the advent and adoption of digital health technologies, notably wearables, telemedicine, and solutions like at-home lab tests, that allow us to carry out procedures at home that formerly were impossible without visiting a healthcare institution.

Patients can now monitor their vital signs regardless of their location, sharing this data with healthcare providers remotely. Although this doesn’t render hospitals obsolete (we won’t have MRI machines at home), it repositions them as health centers focused on disease prevention, acute care, and specific medical procedures requiring sophisticated equipment.

Although we started our list with this trend, this actually will be the end result of the paradigm shift we witness.

2) Healthcare becomes globalized

Companies such as Atlas Biomed in the UK, Dante Labs in Italy, and AliveCor in the US exemplify the erosion of geographical barriers, offering patients worldwide access to quality digital health services, albeit with some shipping restrictions. The digital health sector not only democratizes healthcare by making patients the point of care with direct-to-consumer services but also augments the doctor-patient relationship with a shared decision-making paradigm.

It’s worth mentioning that digital health can only assist users who have the means to afford it. As many studies have shown, wearables are currently not equally available to everyone around the globe, and in most cases, these devices are out of reach for users with no or limited access to quality healthcare. In other words, digital health is typically unavailable to those who would benefit the most from it. Health equity, however, is not a technological question, but a profound social issue.

The ultimate goal is to achieve affordable and universal access to healthcare, leveraging digital technologies to transcend traditional geographical and systemic barriers, making healthcare truly global. But this is a task that puts more burden on politicians and policymakers than on tech companies.

3) Remote care is the new norm

The evolution of remote care is setting a new standard in the healthcare landscape. Asynchronous telemedicine, a facet of remote care, allows for a more efficient exchange between patients and healthcare providers without the necessity for real-time communication. This form of telemedicine comes with a host of benefits including flexible schedules for healthcare practices, efficient task stacking, accommodating younger patients’ communication preferences, addressing language barriers, and transmitting extensive health data seamlessly. While concerns about impersonal communication and potential delays in response exist, the overarching narrative is about leveraging asynchronous telemedicine to mitigate physician shortages and burnout, which are pressing issues globally.

The COVID-19 pandemic significantly accelerated the adoption of telemedicine and at-home diagnostic kits, showcasing the convenience and accessibility digital health technologies offer. In the linked article, we also envisioned a futuristic scenario with a seamless global network of healthcare providers and the immense potential for patient-centered care. However, this global convergence necessitates updated training for physicians to adeptly handle international data and a new regulatory framework to ensure these technologies’ safe and effective use.

The palpable shift towards remote care isn’t just a technological leap but also a cultural transformation, propelled significantly by the pandemic. The newfound convenience, cost-effectiveness, and time efficiency associated with telemedicine stay with us in the post-pandemic era.

Additionally, remote care also intersects with broader goals like decarbonizing patient pathways. The shift to digital prescriptions, promoting healthy eating, organizing resources with AI, and especially the core of remote monitoring and care, play parts in a larger narrative of fostering a more sustainable and environmentally friendly healthcare model. As remote care integrates further into the healthcare system, the anticipation is not only towards more efficient patient-doctor interactions but also towards a significant reduction in the carbon footprint associated with traditional healthcare practices.

4) Patient design will rule what hospitals look like

The shift from a patient-centric to a patient-designed approach in healthcare is a much-needed evolution, aiming to materially involve patients in decision-making processes. Patient centricity, while a laudable idea, often remained a tokenistic gesture, where decisions were still heavily dominated by medical professionals without genuinely incorporating the patient’s perspective. On the contrary, patient design brings patients into the core of decision-making, reflecting a co-design approach where they are seen as active participants and stakeholders. This approach recognizes that the dated hierarchy, placing medical professionals as the sole custodians of relevant knowledge, no longer holds in the face of today’s information accessibility and technology-driven empowerment.

Patient design is manifested in various facets of healthcare, from product development to research and clinical design. Inspirational stories of individuals like Sarah Olson and Tal Golesworthy underscore the profound impact of patient-driven innovations, which arise from a dire need for practical solutions, often spurred by personal or familial suffering. Unlike traditional research teams focused on broad theories, these patient innovators are driven by a pressing need for immediate solutions. This approach leads to pragmatic, real-world applications that may not only alleviate the suffering of individuals and their loved ones but also significantly benefit others facing similar health challenges.

Furthermore, the integration of patient design extends to research priorities and clinical environments. For instance, when research is reshaped to address patients’ urgent needs, it bridges the existing gap between scientific advancements and public health outcomes.

Additionally, involving patients in clinical design, as the Netherlands’ Radboud University Medical Centre Nijmegen did, can redefine the healing environment to foster better patient-physician relationships and potentially expedite recovery. Patient design, therefore, is not merely a conceptual shift but a pragmatic approach to infusing healthcare with real-life perspectives and needs, making the field more responsive, humane, and effective.

5) Tech giants will join the healthcare ecosystem

The tech giants like Google, Amazon, Apple, Microsoft, NVIDIA, and IBM are rigorously exploring healthcare as a new frontier for innovation and business expansion. With a long-term vision, they are integrating technology to solve complex health challenges and improve healthcare delivery.

The core of this convergence lies in the integration of Artificial Intelligence (AI), machine learning, cloud computing, and other cutting-edge technologies. These are being employed across a myriad of applications such as diagnostic imaging, patient monitoring, and disease prediction, aiming to bolster patient care while mitigating costs.

This merger of tech and healthcare is also marked by a consumer-centric approach, as witnessed through the advent of wearables and health applications. These innovations are rendering healthcare more accessible and personalised, empowering individuals to monitor and manage their health seamlessly.

Tech companies entering the sector will lower the threshold for accessible diagnostics. While many can’t afford to spend $150-200 on a dermatologist consultation to check a new birthmark, they may spend a few dollars in an app to have it checked – and thus have a better chance to catch a malignant lesion in time.

However, this convergence is not without its challenges. Navigating through the complex labyrinth of healthcare regulations remains a significant hurdle. The paramount concern revolves around the privacy and security of health data, necessitating stringent compliance with healthcare standards and laws.

6) The cultural transformation changes the roles

The paradigm shift we witness is not just technological but also profoundly cultural, morphing traditional patient-physician relationships. The healthcare industry is moving away from a paternalistic, hierarchical model towards a more partnership-driven paradigm. In this new landscape, physicians transition from being the sole custodians of medical knowledge to guides navigating patients through a complex jungle of data, decisions, and information. This transformation underscores a democratization of healthcare knowledge, enabled by the vast swathes of information and digital tools now available at the fingertips of both medical practitioners and the populace.

On the flip side, patients evolve from being passive stakeholders, awaiting symptoms to manifest before seeking medical intervention, to proactive, empowered individuals. The modern patient is more informed, more engaged with their health, and desires a collaborative relationship with their healthcare providers. Wearable tech, health apps, and online platforms provide real-time data and a plethora of information, allowing individuals to monitor their health metrics continually, anticipate potential issues, and seek timely advice.

The infusion of technology acts as a significant catalyst in this cultural transformation, changing the way healthcare is perceived, accessed, and delivered.

7) Access to data makes healthcare personalised

Personalised medicine is tailored to the individual characteristics, needs, and preferences of patients. This form of medicine leverages advancements in technology, genomics, and data analytics to offer more precise, predictable, and preventive healthcare.

For instance, genetic testing, a cornerstone of personalized medicine, can unveil an individual’s susceptibility to certain diseases, their likely response to various treatments, and the potential risk of adverse reactions to specific drugs. This level of insight empowers clinicians to design treatment regimens that are aligned with each individual’s unique genetic makeup, significantly enhancing the efficacy and safety of medical interventions.

As we venture deeper into the realm of personalized medicine, the concept of digital twins emerges. Digital twins are virtual replicas of an individual’s physiological and genetic profile. These present a platform to simulate the effects of various treatments in a risk-free, virtual environment.

These digital replicas can allow for a meticulous analysis of treatment outcomes. For instance, in the foreseeable future, a digital twin could simulate the response of a cancer patient to different drug combinations, offering invaluable insights into the most effective and least toxic treatment strategy for that individual. This form of virtual trial-and-error can significantly accelerate the identification of optimal treatment protocols, minimizing the physical, emotional, and financial burdens traditionally associated with the trial-and-error nature of conventional medicine.

The marriage of genetic insights with innovative technologies fundamentally transforms the healthcare landscape. By embracing the tenets of personalised medicine, healthcare providers can offer more targeted, effective, and safe medical care, where the right treatment is delivered to the right patient at the right time.

8) New medical teams arise

The paradigm shift in healthcare towards a more inclusive and collaborative approach redefines the traditional medical team’s construct. Gone are the days when the medical professional was the sole custodian of a patient’s health journey. In the contemporary scenario, the medical team is a confluence of healthcare professionals, the patient, and increasingly, artificial intelligence. The rationale is simple yet profound: who could be more invested in a patient’s well-being than the patient themselves? Empowering individuals to take the reins of their health not only democratizes healthcare but also cultivates a richer, more informed dialogue.

Initially, the notion of patients morphing into active participants in their healthcare may evoke apprehensions among both medical professionals and patients. However, the continuously expanding corpus of medical knowledge renders it virtually impossible for any physician, regardless of their dedication, to single-handedly stay abreast of every medical advancement. Contrastingly, an engaged patient or their kin can delve into the depths of specific medical domains, unearthing contemporary studies, technologies, or emerging treatment protocols relevant to their health conditions. Though not replacements for medical professionals, these informed individuals can enhance the decision-making process.

As we tread further along this trajectory, artificial intelligence (AI) emerges as the newest entrant in this expanding medical team. With an innate capacity to sift through vast swaths of data, discern patterns, and provide insights, AI serves as a force multiplier in the medical domain. Algorithms, designed to assist in diagnostics, treatment planning, and monitoring are making headway, and we can rest assured that various AI models will become safer and more efficient in assisting medical work.


Creation and Adoption of Large Language Models in Medicine

October 3, 2023


Figure.  An Overview of the Key Issues in Shaping the Creation and Adoption of Large Language Models (LLMs) in Medicine

  

EHR indicates electronic health record. A, During training, LLMs learn generally useful patterns in large amounts of unlabeled data via self-supervision. For example, a commonly used form of self-supervision is to predict the next word in a sequence conditioned on prior words, as seen in large public datasets. Once an LLM is trained, users can interact with it via submitting an instruction (or prompt), to which the LLM responds with a sequence of words that are its valid completions. B, As is, LLMs are not good at following instructions. The LLMs are adapted for specific tasks by providing examples of instructions (blue) and the expected responses (yellow). This adaptation process is called tuning. In responding to medical instructions, such as “summarize the past specialist visits of a patient,” LLMs require tuning using a set of relevant instructions and their expected responses.

References

1. Li R, Kumar A, Chen JH. How chatbots and large language model artificial intelligence systems will reshape modern medicine: fountain of creativity or Pandora’s box? JAMA Intern Med. 2023;183(6):596-597. doi:10.1001/jamainternmed.2023.1835

2. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems: NIPS ’17. Curran Associates Inc; 2017:6000-6010.

3. Zhao WX, Zhou K, Li J, et al. A survey of large language models. arXiv. Preprint posted online March 31, 2023. https://arxiv.org/abs/2303.18223v10

4. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388(13):1233-1239. doi:10.1056/NEJMsr2214184

5. Wikipedia contributors. GPT-3. Published May 8, 2023. Accessed July 25, 2023. https://en.wikipedia.org/w/index.php?title=GPT-3&oldid=1153892380

6. OpenAI. Aligning language models to follow instructions. Published January 27, 2022. Accessed May 22, 2023. https://openai.com/research/instruction-following

7. Chui M, Hazan E, Roberts R, et al. The economic potential of generative AI: the next productivity frontier. Published June 14, 2023. Accessed June 16, 2023. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier

8. Wornow M, Xu Y, Thapa R, et al. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digital Med. 2023;135(6). doi:10.1038/s41746-023-00879-8

9. Taori R, Gulrajani I, Zhang T, Dubois Y, Li X. Stanford Alpaca: code and documentation to train Stanford’s Alpaca models, and generate the data. Accessed June 16, 2023. https://github.com/tatsu-lab/stanford_alpaca

10. Steinberg E, Jung K, Fries JA, Corbin CK, Pfohl SR, Shah NH. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform. 2021;113:103637. doi:10.1016/j.jbi.2020.103637

11. Mello MM, Guha N. ChatGPT and physicians’ malpractice risk. JAMA Health Forum. 2023;4(5):e231938. doi:10.1001/jamahealthforum.2023.1938

12. Brynjolfsson E. The turing trap: the promise and peril of human-like artificial intelligence. In: Augmented Education in the Global Age. Routledge; 2023:103-116.


Special Communication | AI in Medicine

August 7, 2023

Creation and Adoption of Large Language Models in Medicine

Nigam H. Shah, MBBS, PhD; David Entwistle, BS, MHSA; Michael A. Pfeffer, MD


JAMA. 2023;330(9):866-869. doi:10.1001/jama.2023.14217


Abstract

Importance

There is increased interest in and potential benefits from using large language models (LLMs) in medicine. However, by simply wondering how the LLMs and the applications powered by them will reshape medicine instead of getting actively involved, the agency in shaping how these tools can be used in medicine is lost.

Observations

Applications powered by LLMs are increasingly used to perform medical tasks without the underlying language model being trained on medical records and without verifying their purported benefit in performing those tasks.

Conclusions and Relevance

The creation and use of LLMs in medicine need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.


Introduction

Large language models (LLMs) and the applications built using them, such as ChatGPT, have become popular. Within 2 months of the November 2022 release, ChatGPT surpassed 100 million users. The medical community has been pursuing off-the-shelf LLMs provided by technology companies. New users have been asking how the LLMs and the chatbots powered by them will reshape medicine. 1 Perhaps the reverse question should be asked: How can the intended medical use shape the training of the LLMs and the chatbots or the other applications they power?

Language models learn the probabilities of occurrence for sequences of words from the corpus of text. For example, if the corpus had the 2 questions of “where are we going” and “where are we at,” the probability is 0.5 for seeing the word “going” after seeing the 3 words “where are we.” An LLM is essentially learning such probabilities on a massive scale, such that the resulting model has billions of parameters (a glossary appears in the Box). In 2017, Vaswani et al 2 demonstrated that a certain kind of deep neural network, called a transformer, could learn LLMs that later performed amazingly well at language translation tasks. Their insight led to the creation of hundreds of language models that were reviewed by Zhao et al. 3
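The two-question example can be reproduced by simple counting. The sketch below is only an illustration of the idea, not how an LLM is actually trained; the function name `next_word_probabilities` and the fixed 3-word context are assumptions made for this example.

```python
from collections import Counter, defaultdict


def next_word_probabilities(corpus, context_size=3):
    """Estimate P(next word | previous `context_size` words) by counting."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Slide a window over the sentence: context words, then the word that follows.
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            counts[context][words[i + context_size]] += 1
    # Turn raw counts into conditional probabilities for each context.
    return {
        context: {word: n / sum(followers.values()) for word, n in followers.items()}
        for context, followers in counts.items()
    }


corpus = ["where are we going", "where are we at"]
probs = next_word_probabilities(corpus)
print(probs[("where", "are", "we")])  # {'going': 0.5, 'at': 0.5}
```

An LLM replaces this lookup table with a neural network that generalizes to contexts it has never seen, which is where the billions of parameters come from.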

Box. Glossary

Chatbot

A computer program designed to simulate conversation with human users, especially over the internet.

Deep neural network

A setup for machine learning inspired by biological neural networks in which computational units referred to as neurons are arranged in a network that is composed of multiple layers of interconnected neurons, allowing it to learn complex patterns in the data presented to it.

Large language model (LLM)

A language model that learns the probabilities of occurrence of sequences of words from textual corpora containing trillions of words, such that the resulting model has billions of parameters.

Self-supervision

A learning approach in which a machine learning model learns without relying on explicitly labeled data as examples. Instead, the model generates its own training objective from the input data without the need for human-annotated data, which can be time-consuming and expensive to produce. A common type of self-supervision is in the form of an autoregressive training objective, in which the model is trained to predict the next word or token in a sequence, given the previous words or tokens. The training objective is to maximize the likelihood of the correct word given the context. Training in this manner is often the first stage in training LLMs (generative pretrained transformer) and helps the model learn language structure, grammar, and semantics. Learning to predict the next medical code in a patient’s longitudinal medical record does not require a human to label a code as the “next code”; that information is available in the data directly by looking at the sequence of appearance of the code in the medical record.

Transformer

A deep neural network architecture that is designed to be efficient at capturing relationships and dependencies between elements in a sequence, such as words in a sentence.

Instruction tuning

Refers to a kind of tuning in which an existing LLM is adapted (via tuning) to respond accurately and effectively to natural language instructions. This process involves continuing to train the model on a dataset containing pairs of instructions and corresponding desired outputs or responses. Doing so allows the model to be more useful in real-world applications, such as providing relevant information, answering questions, or following specific commands provided by users in a natural language.

Tuning

Refers to the process of adapting a pretrained LLM to perform well on a specific task or domain. This process involves training the model on a smaller labeled dataset that is specific to the target task, such as sentiment analysis, machine translation, or answering questions. For example, in a medical setting, a model could be tuned for tasks such as summarizing the available past medical records of a patient or the course of their current admission. During tuning, the model’s weights and parameters are updated using pertinent examples to optimize its performance on the target task. This allows the model to build on the general language understanding it gained via self-supervised learning, while adapting to the nuances and specific requirements of the task at hand.

Although language models are trained to predict the next word in a sentence (basically an advanced autocomplete), new capabilities (such as the ability to summarize text and answer questions posed in natural language) become possible without explicitly training for them. These capabilities allow the model to perform tasks such as passing medical licensing examinations, simplifying radiology reports, extracting drug names from a physician’s note, replying to patient questions, summarizing medical dialogues, and writing histories and physical assessments. 4 ChatGPT, perhaps the most popular application, uses an LLM called a generative pretrained transformer (GPT; version 3.5 or 4.0) underneath to ingest text and output text in response.

The creation of language models capable of such diverse tasks hinges on 2 things. First is the ability to learn generally useful patterns in large amounts of unlabeled data via self-supervision (training and interacting with an LLM in the Figure). For example, a commonly used form of self-supervision is to predict the next word in a sequence conditioned on prior words, which later identifies the words that go together in general. The GPT-3 model was trained on 45 terabytes of text data comprising roughly 500 billion tokens (1 token is approximately 4 characters or three-fourths of a word for English text) at a cost of approximately $4.6 million. 5
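To put the token count in more familiar units, the rules of thumb quoted in the text (about 4 characters or three-fourths of a word per token) can be applied directly. This is back-of-the-envelope arithmetic on the article's figures, not an independent estimate; the constant names are made up for the example.

```python
# Figures quoted in the text for GPT-3 (approximate).
TRAINING_TOKENS = 500_000_000_000   # roughly 500 billion tokens
TRAINING_COST_USD = 4_600_000       # approximately $4.6 million

# Rules of thumb for English text, as stated in the text.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 3 / 4

approx_words = TRAINING_TOKENS * WORDS_PER_TOKEN
approx_chars = TRAINING_TOKENS * CHARS_PER_TOKEN
cost_per_billion_tokens = TRAINING_COST_USD / (TRAINING_TOKENS / 1e9)

print(f"about {approx_words / 1e9:.0f} billion words")          # about 375 billion words
print(f"about {approx_chars / 1e12:.0f} trillion characters")    # about 2 trillion characters
print(f"about ${cost_per_billion_tokens:,.0f} per billion tokens")  # about $9,200 per billion tokens
```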

Second is the subsequent tuning of the LLM to generate responses aligned with human expectations via instruction tuning. For example, in response to the request, “explain the moon landing to a 6-year-old in a few sentences,” the GPT-3 model suggested possible completions as “explain the theory of gravity to a 6-year-old” and “explain the big bang theory to a 6-year-old” (instruction tuning an LLM in the Figure). Users helped train GPT-3 by providing the instructions (also called prompts) for which the labelers (hired by OpenAI, the company that built GPT-3) provided demonstrations of the desired output and ranked the outputs from the model. OpenAI used these pairs of instructions and their desired outputs to instruction tune GPT-3. 6
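An instruction-tuning dataset is, at its simplest, a list of instruction and desired-response pairs that are rendered into text the model continues training on. The sketch below shows one possible shape for such data. The class name, the field names, the prompt template, and the placeholder response text are all assumptions for illustration; they are not the format used by OpenAI or any other vendor.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    instruction: str  # the prompt a user might submit
    response: str     # a demonstration of the desired output, written by a labeler


def to_training_text(example: InstructionExample) -> str:
    """Render one pair as a single training string (hypothetical template)."""
    return (
        f"### Instruction:\n{example.instruction}\n\n"
        f"### Response:\n{example.response}"
    )


# Placeholder responses; in practice these would be written by clinicians or labelers.
dataset = [
    InstructionExample(
        instruction="Explain the moon landing to a 6-year-old in a few sentences.",
        response="<labeler-written explanation goes here>",
    ),
    InstructionExample(
        instruction="Summarize the past specialist visits of a patient.",
        response="<clinician-written summary goes here>",
    ),
]

training_texts = [to_training_text(example) for example in dataset]
print(training_texts[1])
```

Tuning then continues next-word training on these strings, so the model learns to answer an instruction rather than merely continue it with similar-looking instructions.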

Although general-purpose LLMs can perform many medically relevant tasks, they have not been exposed to medical records during self-supervised training and they are not specifically instruction tuned for any medical task. By not asking how the intended medical use can shape the training of LLMs and the chatbots or other applications they power, technology companies are deciding what is right for medicine. The medical profession has made a mistake in not shaping the creation, design, and adoption of most information technology systems in health care. Given the profound disruption that is possible for such diverse activities as clinical documentation, decision support, information technology operations, medical coding, and patient-physician communication with the use of LLMs (estimated in a McKinsey report to be as high as 1.8%-3.2% of total health care revenues 7), the same mistake cannot be repeated. At a minimum, the medical profession should be asking the following questions.

Are the LLMs Being Trained With the Relevant Data and the Right Kind of Self-Supervision?

Medical records can be viewed as consisting of sequences of time-stamped clinical events represented by medical codes and textual documents, which can be the training data for a language model. Wornow et al 8 reviewed the training data and the kind of self-supervision used by more than 80 medical language models and found 2 categories.

First, there are medical LLMs that are trained on documents. The self-supervision is via learning to predict the next word in a textual document, such as a progress note or a PubMed abstract, and conditioned on prior words seen. Therefore, these models are similar in their anatomy to general purpose LLMs (eg, GPT-3), but are trained on clinical or biomedical text. These models can be used for language manipulation tasks such as summarization, translation, and answering questions. Given the increased training and use costs of LLMs, it is necessary to investigate whether smaller language models trained on relevant data may achieve the desired performance at a lower cost. For example, researchers at the Center for Research on Foundation Models at Stanford University created a model called Alpaca with 4% as many parameters as OpenAI’s text-davinci-003, matching its performance at a cost of $600 to create. 9

Second, there are medical LLMs that are trained on the sequence of medical codes in a patient’s entire record that take time into account. Here, the self-supervision is in the form of learning the probability of the next day’s set of codes, or learning how much time elapses until a certain code is seen. As a result, the sequence and timing of medical events in a patient’s entire record is considered. As a concrete example, given the code for “hypertension,” these models learn when a code for a stroke, myocardial infarction, or kidney failure is likely to occur. When provided with a patient’s medical record as input, such models will not output text but instead a machine understandable “representation” of that patient, referred to as an “embedding,” which is a fixed-length, high-dimensional vector representing the patient’s medical record. Such embeddings can be used in building models for predicting 30-day readmissions, long hospital lengths of stay, and in-patient mortality using less training data (as few as 100 examples). 10
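The two self-supervision targets described here (the next day's set of codes, and the time until a given code appears) can both be derived from a time-stamped record without any human labeling. The sketch below illustrates that point on a made-up record; the code names, day offsets, and function names are hypothetical, and real models of this kind use far richer inputs.

```python
from itertools import groupby


def next_day_pairs(events):
    """Build (codes seen so far, next day's code set) training pairs.

    `events` is a list of (day, code) tuples for one patient.
    """
    by_day = [
        (day, {code for _, code in group})
        for day, group in groupby(sorted(events), key=lambda e: e[0])
    ]
    pairs, history = [], []
    for (_, codes), (_, next_codes) in zip(by_day, by_day[1:]):
        history = history + sorted(codes)
        pairs.append((tuple(history), frozenset(next_codes)))
    return pairs


def days_until(events, start_code, target_code):
    """Days from the first `start_code` to the first later `target_code`, or None."""
    starts = [day for day, code in events if code == start_code]
    if not starts:
        return None
    later = [day for day, code in events if code == target_code and day >= min(starts)]
    return min(later) - min(starts) if later else None


# A made-up record: (days since first visit, code).
record = [
    (0, "hypertension"),
    (0, "hyperlipidemia"),
    (30, "chest_pain"),
    (400, "stroke"),
]

pairs = next_day_pairs(record)
print(pairs[0])
print(days_until(record, "hypertension", "stroke"))  # 400
```

Both targets come from the ordering and timing already present in the record, which is what makes this training self-supervised.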

The medical community needs to actively shape the creation of LLMs in medicine. For example, given the importance of instruction tuning, the medical community should be discussing how to create shared instruction tuning datasets with examples of prompts to be fulfilled, such as “summarize the past specialist visits of a patient” with its corresponding valid completion (Figure). Perhaps instead of using GPT-4 at the cost of $0.06 to $0.12 per 1000 tokens (about 750 words), health care systems should be training shared, open-source models using their own data. The technology companies should be asked whether the models being offered have seen any medical data during training and whether the nature of self-supervision used is relevant for the final use of the model.
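The per-token prices quoted here can be turned into a rough cost for a given workload. The sketch below uses the article's price range together with its rule of thumb of three-fourths of a word per token; the 3,000-word record and the function name are hypothetical, and actual vendor pricing changes over time.

```python
# Price range quoted in the text, in US dollars per 1000 tokens.
PRICE_LOW_PER_1K = 0.06
PRICE_HIGH_PER_1K = 0.12
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text


def cost_range_usd(word_count):
    """Return (tokens, low cost, high cost) for processing `word_count` words."""
    tokens = word_count / WORDS_PER_TOKEN
    low = tokens / 1000 * PRICE_LOW_PER_1K
    high = tokens / 1000 * PRICE_HIGH_PER_1K
    return tokens, round(low, 2), round(high, 2)


# Hypothetical workload: one 3,000-word patient record.
tokens, low, high = cost_range_usd(3000)
print(f"{tokens:.0f} tokens: ${low:.2f} to ${high:.2f}")  # 4000 tokens: $0.24 to $0.48
```

Multiplying a per-record figure like this by a health system's annual document volume gives a first-order number to weigh against the cost of training and hosting a shared model.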

Are the Purported Value Propositions of Using LLMs in Medicine Being Verified?

Current evaluations of LLMs also do not quantify the benefits of novel collaboration between humans and artificial intelligence that is at the core of using these models in clinical settings. The methods for evaluating LLMs in the real world remain unclear. Concerns with current evaluations range from training dataset contamination (such as when the evaluation data are included in the training dataset) to the inappropriateness of using standardized examinations designed for humans to evaluate the models. Consider the analogy of evaluating a person for a driver’s license. The person takes a multiple-choice, knowledge-based test. The car, meanwhile, undergoes safety tests during manufacturing, some of which are regulated by the government. Then the person gets in the car for a road test to certify them for a license. The car does not take a multiple-choice test at the department of motor vehicles or get certified for driving, but that is the absurdity tolerated for LLMs when it is declared that they are certified to give medical advice because they passed the US medical licensing examination.
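Training dataset contamination can be screened for, at least crudely, by looking for long word sequences that an evaluation item shares with the training text. The sketch below is one simple heuristic under stated assumptions: the 8-word window, the function names, and the sample texts are all invented for illustration, and a clean result from a check like this does not prove the absence of contamination.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def ngrams(words, n):
    """All runs of n consecutive words, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contaminated_items(eval_items, training_docs, n=8):
    """Return evaluation items sharing at least one n-word run with the training text."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(normalize(doc), n)
    return [item for item in eval_items if ngrams(normalize(item), n) & train_grams]


# Made-up texts for illustration only.
training_docs = [
    "Forum post: A 54-year-old man presents with crushing chest pain "
    "radiating to the left arm. What is the most likely diagnosis?"
]
eval_items = [
    "A 54-year-old man presents with crushing chest pain radiating to the left arm. "
    "What is the most likely diagnosis?",
    "Which vitamin deficiency is associated with scurvy?",
]

flagged = contaminated_items(eval_items, training_docs)
print(len(flagged))  # 1
```

Items shorter than the window are never flagged by this heuristic, and paraphrased copies slip through, which is part of why evaluating on held-out, real-world tasks matters.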

The purported benefits need to be defined and evaluations conducted to verify such benefits. 8 Only after these evaluations are completed should statements be allowed such as “an LLM was used for a defined task in this specific workflow, a metric was measured, and an improvement (or deterioration) in a prespecified outcome was observed.” Such evaluations also are necessary to clarify the medicolegal risks that might occur with the use of LLMs to guide medical care, 11 and to identify mitigation strategies for the models’ tendency to generate factually incorrect outputs that are probabilistically plausible (called hallucinations).

Conclusion

The building of relevant medical LLMs needs to be balanced with verifying the presumed value propositions via testing in real-world deployments akin to road driving tests. If the goal in using such models is to augment human judgment, and not replace it, adopting this driving test mindset is critically important. Otherwise, there is a risk of falling into the trap of automating tasks that individuals already know how to do, and failing to ask the question of what a person plus such models could do together that may yield better medical care. 12

Given the highly disruptive potential of these technologies, clinicians cannot afford to be on the sidelines. The adoption of LLMs in medicine needs to be shaped by the medical profession that can identify the right training (and instruction tuning) data and perform the evaluations that verify the purported benefits of using LLMs in medicine.

Article Information

Accepted for Publication: July 11, 2023.

Published Online: August 7, 2023. doi: 10.1001/jama.2023.14217

Corresponding Author: Nigam H. Shah, MBBS, PhD, Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr, Palo Alto, CA 94305 (nigam@stanford.edu).

Author Contributions: Dr Shah had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: All authors.

Drafting of the manuscript: Shah, Pfeffer.

Critical review of the manuscript for important intellectual content: All authors.

Administrative, technical, or material support: Entwistle, Pfeffer.

Supervision: Pfeffer.

Conflict of Interest Disclosures: Dr Shah reported being a co-founder of Prealize Health (a predictive analytics company) and Atropos Health (an on-demand evidence generation company). No other disclosures were reported.

Additional Contributions: We acknowledge the members of the data science team at Stanford Health Care for helpful discussions to refine the arguments made in this article. We acknowledge Jason Fries, PhD, and Alison Callahan, PhD (both with Stanford University), for help in creating the first draft of the Figure; they were not compensated for their contributions.

References

1. Li R, Kumar A, Chen JH. How chatbots and large language model artificial intelligence systems will reshape modern medicine: fountain of creativity or Pandora’s box? JAMA Intern Med. 2023;183(6):596-597. doi: 10.1001/jamainternmed.2023.1835

2. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems: NIPS ’17. Curran Associates Inc; 2017:6000-6010.

3. Zhao WX, Zhou K, Li J, et al. A survey of large language models. arXiv. Preprint posted online March 31, 2023. https://arxiv.org/abs/2303.18223v10

4. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388(13):1233-1239. doi: 10.1056/NEJMsr2214184

5. Wikipedia contributors. GPT-3. Published May 8, 2023. Accessed July 25, 2023. https://en.wikipedia.org/w/index.php?title=GPT-3&oldid=1153892380

6. OpenAI. Aligning language models to follow instructions. Published January 27, 2022. Accessed May 22, 2023. https://openai.com/research/instruction-following

7. Chui M, Hazan E, Roberts R, et al. The economic potential of generative AI: the next productivity frontier. Published June 14, 2023. Accessed June 16, 2023. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier

8. Wornow M, Xu Y, Thapa R, et al. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digital Med. 2023;135(6). doi: 10.1038/s41746-023-00879-8

9. Taori R, Gulrajani I, Zhang T, Dubois Y, Li X. Stanford Alpaca: code and documentation to train Stanford’s Alpaca models, and generate the data. Accessed June 16, 2023. https://github.com/tatsu-lab/stanford_alpaca

10. Steinberg E, Jung K, Fries JA, Corbin CK, Pfohl SR, Shah NH. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform. 2021;113:103637. doi: 10.1016/j.jbi.2020.103637

11. Mello MM, Guha N. ChatGPT and physicians’ malpractice risk. JAMA Health Forum. 2023;4(5):e231938. doi: 10.1001/jamahealthforum.2023.1938

12. Brynjolfsson E. The turing trap: the promise and peril of human-like artificial intelligence. In: Augmented Education in the Global Age. Routledge; 2023:103-116.

© 2023 American Medical Association. All Rights Reserved.

Where health systems are boosting tech spending

October 3, 2023

CIOs told Becker's that health systems are increasing their spending in revenue cycle management, clinical workflow optimization and patient engagement.

A recent report from KLAS said 80 percent of healthcare organizations increased their technology spending in the past year and that the top three areas where money was spent were those cited by the CIOs. 

Will Landry, senior vice president and CIO of Baton Rouge, La.-based Franciscan Missionaries of Our Lady Health System, said the report was accurate.

"These three items are core to our IS strategic vision for the foreseeable future," he told Becker's.

Michael Williams, vice president and CIO of LMH Health, based in Lawrence, Kan., also agreed with the assessment on where hospitals and health systems are increasing IT spending.

This comes as KLAS indicated that IT spending is expected to continue to grow as health systems look to keep up with emerging technologies, ease labor shortages and help reduce cost pressures.

This Week Health
Healthcare Transformation Powered by Community

Questions about the Podcast?

Contact us with any questions, requests, or comments about the show. We love hearing your feedback.

Hello@ThisWeekHealth.com

Looking to connect or attend events? Visit our sister organization, 229 Project.

© Copyright 2024 Health Lyrics All rights reserved