Hallucinating AI Perfection in Healthcare: Navigating the Challenge of Hallucinations

by Joshua Tamayo-Sarver, MD, PhD, FACEP, FAMIA

Inflect Health
Oct 28, 2024

As a physician working on the front lines of healthcare for over 15 years, I’ve come to accept human fallibility. We make mistakes, misinterpret things, and sometimes, our brains play tricks on us. I recall one instance when I was driving down the highway, exceeding the speed limit, of course. Up ahead, I spotted a car with something on its roof and my heart skipped a beat. Highway patrol! my brain screamed. I slammed on the brakes, only to realize it was a regular sedan with a ski rack. Yet another reminder to get my regular eye exam.

This real-life example of a hallucination — a misinterpretation of patterns based on incomplete information — serves as a perfect analogy for the challenges we face with artificial intelligence in healthcare today. Just as our brains can misinterpret visual cues, AI algorithms, particularly Large Language Models (LLMs), can also misinterpret patterns in data, leading to inaccurate conclusions. We call these AI “hallucinations.”

Understanding AI Hallucinations

Hallucinations occur when an LLM produces demonstrably incorrect or misleading output that appears confident and plausible despite being factually flawed (Hatem et al., 2023). In healthcare, these hallucinations can have serious implications. Imagine an LLM misinterpreting patient data, leading to unnecessary interventions or delayed treatment. Worse still, consider an LLM-driven system recommending incorrect drug dosages, potentially compromising patient safety (Gondode et al., 2024).

The root causes of AI hallucinations are multifaceted. They can stem from insufficient or biased training data, limitations in a model’s architecture, or the inherent probabilistic nature of these models, which recognize statistical patterns without any underlying conceptual framework. As Bender et al. (2021) note: “Large language models operate on statistical patterns in text, without any true understanding of meaning or facts. This fundamental characteristic leads to a tendency for these models to generate plausible-sounding but potentially false or nonsensical information — a phenomenon often referred to as hallucination.” For instance, a model trained predominantly on data from adult patients of average weight might struggle when faced with pediatric or obese patients, potentially “hallucinating” inappropriate dosage recommendations (Gondode et al., 2024).

The Impact on Healthcare

The potential impact of LLM hallucinations in healthcare is significant and varied:

  1. Misdiagnosis and mistreatment: LLM systems might misinterpret patient data, leading to incorrect diagnoses or treatment plans.
  2. Medication errors: Incorrect drug dosage recommendations could have severe consequences for patient safety.
  3. Research distortion: hallucinations could skew LLM-driven analyses of medical data, leading to misleading research conclusions.
  4. Legal and ethical concerns: Questions of liability and informed consent become complex when AI systems are involved in decision-making processes.

Recent research suggests these concerns are far from theoretical. A study by the University of Massachusetts Amherst and Mendel found hallucinations in “almost all” of the medical summaries generated by state-of-the-art language models, with the most frequent hallucinations related to symptoms, diagnosis, and medicinal instructions (Adams, 2024). These findings underscore the real-world risks of relying on AI in healthcare without sufficient safeguards, because such inaccuracies can have serious implications for patient care.

Strategies for Mitigating LLM Hallucinations

Despite these challenges, there are several strategies we can employ to mitigate the risk of LLM hallucinations in healthcare:

  1. High-quality, diverse training data: Utilizing comprehensive and diverse datasets can significantly improve AI model accuracy and reduce hallucination risks.
  2. Explainable AI: Developing transparent AI models aids in identifying and rectifying hallucinations. This approach allows us to trace back erroneous predictions to specific data points, enabling targeted adjustments.
  3. Human oversight: Maintaining human expertise in the loop is crucial. While LLMs can process vast amounts of data, the nuanced understanding and context provided by healthcare professionals remain irreplaceable.
  4. Continuous monitoring and improvement: Regular evaluation and refinement of LLM models based on real-world performance is essential (a simple sketch of how this and human oversight might fit together follows this list).
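
To make the human-oversight and monitoring ideas a little more concrete, here is a minimal sketch in Python of what such a gate might look like: anything a model produces that has not been automatically verified, or that falls below a confidence threshold, is routed to a clinician queue instead of the user, and every routing decision is logged so hallucination and override rates can be tracked over time. The 0.9 threshold, the field names, and the route_output helper are illustrative assumptions, not a description of any particular product.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_oversight")

@dataclass
class AIOutput:
    text: str          # what the model produced
    confidence: float  # score in [0, 1] from the model or a separate verifier
    verified: bool     # whether an automated grounding check passed

def route_output(output: AIOutput, threshold: float = 0.9) -> str:
    """Auto-release an output only when it is both verified and high-confidence;
    otherwise queue it for clinician review. Every decision is logged so
    hallucination and override rates can be monitored over time."""
    if output.verified and output.confidence >= threshold:
        decision = "auto_release"
    else:
        decision = "clinician_review"
    log.info("decision=%s confidence=%.2f verified=%s",
             decision, output.confidence, output.verified)
    return decision

# A plausible-sounding but unverified dosage suggestion goes to human review.
print(route_output(AIOutput("Give amoxicillin 500 mg three times daily", 0.97, False)))
```

The point of the sketch is not the specific threshold but the shape of the workflow: the model never gets the last word, and the log becomes the raw material for the continuous evaluation described in item 4.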

The Bottom Line: Understand the Math Behind the LLM, and Incorporate Non-LLM Approaches

As we navigate this new era of medical innovation, it’s crucial to recognize that the journey with AI in healthcare is just beginning. We have all been impressed by the incredible power of LLMs to do things none of us expected to see in our lifetimes. Yet an LLM is still bound by the mathematics underlying it, and that math is not suited to self-detecting hallucinations with the reliability that medical applications demand. So what are we to do to safely unlock the awesome power of LLMs in healthcare?

On our teams, we find that a diversity of perspectives leads to the best decisions. We are learning that AI tools are remarkably similar in this respect. A collection of LLMs that all “think” the same way is not nearly as helpful as an ensemble of different model types, each best at its specific function. In the AI tools we have developed, and the ones we are building now, we incorporate these diverse approaches to ensure hallucinations are caught before they are surfaced to the user. One approach for LLM summarization is to establish a ground truth, such as the source document, and then have good old-fashioned code verify that whatever the LLM offers as a solution truly exists in that source of truth.
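
To illustrate that last point, here is a minimal sketch, assuming a simple word-overlap check, of how “good old-fashioned code” can compare an LLM-generated summary against the source of truth: each sentence of the summary is surfaced only if enough of its content words actually appear in the source note, and everything else is held back for review. The sentence splitting, the 0.8 coverage threshold, and the grounded_sentences helper are hypothetical choices for illustration; a production system would use more robust matching.

```python
import re

def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, ignoring very short words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}

def grounded_sentences(summary: str, source: str, threshold: float = 0.8):
    """Split an LLM summary into sentences and keep only those whose content
    words are sufficiently covered by the source document. Returns (kept,
    flagged) so that flagged sentences can be routed to human review."""
    source_words = content_words(source)
    kept, flagged = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = content_words(sentence)
        if not words:
            continue
        coverage = len(words & source_words) / len(words)
        (kept if coverage >= threshold else flagged).append(sentence)
    return kept, flagged

# Toy example: a dosage detail that never appears in the source gets flagged.
source_note = "Patient reports chest pain for two days. ECG normal. Aspirin given in the ED."
llm_summary = "Patient had chest pain for two days. ECG was normal. Started on warfarin 5 mg daily."
kept, flagged = grounded_sentences(llm_summary, source_note)
print("Surface to user:", kept)
print("Hold for review:", flagged)
```

In this toy example, the invented warfarin instruction never appears in the source note, so it is held for review rather than shown to the user, which is exactly the kind of plausible-sounding fabrication this sort of check is meant to catch.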

Building effective AI solutions for healthcare requires more than technical prowess; it demands a deep understanding of both the healthcare landscape and the mathematical models on which these technologies are built. It takes a team of clinicians, data scientists, security experts, compliance officers, and engineers working together to ensure that AI solutions are trained on robust, clinically relevant data, minimizing the risk of hallucinations and maximizing real-world impact.

In the end, just as I learned to double-check before slamming on my brakes, we must approach AI in healthcare with a balance of enthusiasm and caution. Because when it comes to healthcare, there’s no room for hallucinations — whether they’re human or artificial. The stakes are simply too high.

References

Adams, K. (2024, August 11). How Often Do LLMs Hallucinate When Producing Medical Summaries? MedCity News. https://medcitynews.com/2024/08/ai-healthcare-llm/

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

Gondode, P., Duggal, S., & Mahor, V. (2024). Artificial intelligence hallucinations in anaesthesia: Causes, consequences and countermeasures. Indian Journal of Anaesthesia, 68(7), 658. https://doi.org/10.4103/ija.ija_203_24

Hatem, R., Simmons, B., & Thornton, J. E. (2023). A Call to Address AI “Hallucinations” and How Healthcare Professionals Can Mitigate Their Risks. Cureus, 15(9), e44720. https://doi.org/10.7759/cureus.44720
