The Price of Progress: Scaling AI Solutions Effectively in Healthcare
Joshua Tamayo-Sarver, MD, PhD, FACEP, FAMIA
“The AI tool is taking off with adoption like we’ve never seen before.” This came from one of the most gifted IT leaders I have had the pleasure of knowing, someone who had helped guide one of our AI products from ideation to scale at record speed. We were standing outside the conference room while I waited to speak on a panel about AI. “It’s amazing; we have really got this thing figured out,” I foolishly responded. “I could tell we hit a nerve,” he unfortunately continued, “because I saw we paid $180,000 more this month for the tokens processed.” Um, huh? Last month we were under $1,000 total. Houston, we have a problem. I clearly know nothing about how to make this work.
As AI continues to revolutionize healthcare, the journey from innovative concept to widespread implementation is fraught with challenges. One of the most significant hurdles we face is the often-overlooked cost of scaling these solutions. While the initial stages of AI development can feel like an exhilarating playground for innovation, the transition to real-world deployment introduces a sobering reality: every decision has a price tag.
In the early days of AI development in healthcare, we were captivated by the potential of large language models (LLMs) and complex prompts. The ability to process vast amounts of medical data and generate insights seemed to put limitless possibility at our fingertips. However, as we moved from proof-of-concept to production-level systems, we quickly realized that every token (roughly a word or word fragment fed into the LLM) came with a cost.
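To make that arithmetic concrete, here is a back-of-envelope sketch in Python. The per-token prices, request volumes, and prompt sizes are purely illustrative assumptions, not figures from our deployment; the point is simply that spend scales with the product of request volume and context size.

```python
# Back-of-envelope token cost estimate. All prices and volumes below are
# hypothetical assumptions for illustration only.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # assumed USD per 1,000 input tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.03  # assumed USD per 1,000 output tokens

def monthly_cost(requests_per_day, input_tokens, output_tokens, days=30):
    """Estimate monthly LLM spend for a single workflow."""
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS
    return per_request * requests_per_day * days

# A small pilot with short prompts rounds to pocket change...
print(monthly_cost(50, input_tokens=2_000, output_tokens=500))        # ~$52.50
# ...the same design at production volume with full-chart context does not.
print(monthly_cost(5_000, input_tokens=30_000, output_tokens=1_000))  # ~$49,500
```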
This realization forced us to reevaluate our approach. We discovered that by strategically refining our context windows and identifying only the most essential information to feed into the LLM, we could dramatically reduce costs without compromising performance. This process of distillation not only improved efficiency but also enhanced the quality of the AI’s output by focusing on the most relevant data.
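As a rough sketch of what that distillation can look like in practice, the snippet below passes only a handful of essential fields to the model instead of the full chart. The field names and the call_llm helper are hypothetical placeholders, not our actual schema or tooling.

```python
# Minimal sketch of context distillation: send only the fields the task needs
# rather than the entire record. `call_llm` and the field names are
# illustrative assumptions.

ESSENTIAL_FIELDS = ["chief_complaint", "vital_signs", "active_medications", "assessment"]

def distill_context(full_chart: dict) -> str:
    """Keep only the fields relevant to the downstream task."""
    kept = {k: full_chart[k] for k in ESSENTIAL_FIELDS if k in full_chart}
    return "\n".join(f"{k}: {v}" for k, v in kept.items())

def summarize_visit(full_chart: dict, call_llm) -> str:
    prompt = (
        "Summarize this encounter for the discharge note:\n"
        + distill_context(full_chart)   # a few hundred tokens,
                                        # not the tens of thousands in the full chart
    )
    return call_llm(prompt)
```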
Balancing Efficiency and Effectiveness
One of the key lessons learned in scaling AI solutions is the importance of using the right tool for the job. While LLMs excel at pattern recognition and probabilistic inference, they are not always the most efficient tool for every task in healthcare.
For example, mapping medical diagnoses to specific billing codes is a critical but relatively straightforward task in healthcare administration. When we tested this, we found that using an LLM for the mapping was not only less reliable but also significantly more expensive than a simple lookup table. This realization led us to adopt a more nuanced approach: reserving the power of AI for tasks that truly benefit from its capabilities while using more traditional, cost-effective methods for simpler operations.
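A minimal sketch of that “right tool for the job” principle might look like the following: a deterministic lookup table handles the common mappings, and the model is called only as a fallback. The specific mappings and the call_llm helper are illustrative assumptions, not our production code.

```python
# Deterministic diagnosis-to-billing-code lookup, with the LLM reserved for
# the rare unmapped case. Mappings and `call_llm` are illustrative only.

DIAGNOSIS_TO_CODE = {
    "acute appendicitis": "K35.80",
    "type 2 diabetes mellitus": "E11.9",
    "essential hypertension": "I10",
}

def map_diagnosis(diagnosis: str, call_llm=None) -> str:
    key = diagnosis.strip().lower()
    if key in DIAGNOSIS_TO_CODE:
        return DIAGNOSIS_TO_CODE[key]   # free, instant, and deterministic
    if call_llm is not None:
        # Only pay for model inference when the table has no answer.
        return call_llm(f"Return the ICD-10 code for: {diagnosis}")
    raise KeyError(f"No mapping found for {diagnosis!r}")
```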
The Multi-Agent Approach
Another innovative strategy we’ve employed is the multi-agent approach. By breaking down complex tasks and distributing them among multiple specialized AI agents, we’ve been able to improve performance significantly. However, this approach initially led to increased costs as each agent required its own context window.
To mitigate this, we implemented a preliminary agent responsible for distilling the large initial context into its key elements. That refined summary is then passed to the subsequent agents, so the expensive full context is processed once rather than once per agent, yielding dramatic cost savings while preserving the benefits of the multi-agent system.
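A simplified sketch of the pattern, assuming a hypothetical call_llm helper and illustrative agent roles: the distiller reads the large context once, and every downstream agent works from its short summary.

```python
# Multi-agent pattern with a preliminary "distiller" agent. Agent roles,
# prompts, and `call_llm` are hypothetical placeholders.

def run_pipeline(full_context: str, call_llm) -> dict:
    # Step 1: one pass over the large context to extract the key elements.
    distilled = call_llm(
        "Extract the clinically relevant facts from this record as a short "
        "bullet list:\n" + full_context
    )

    # Step 2: each specialized agent receives only the condensed summary,
    # so the big context is paid for once rather than once per agent.
    agents = {
        "coding": "Suggest billing codes supported by these facts:\n",
        "follow_up": "Draft follow-up instructions based on these facts:\n",
        "quality": "Flag any documentation gaps in these facts:\n",
    }
    return {name: call_llm(prompt + distilled) for name, prompt in agents.items()}
```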
Investing in the Future of Healthcare AI
The journey of scaling AI solutions in healthcare has taught us valuable lessons about balancing innovation with practicality. It’s not just about creating the most advanced AI; it’s about developing solutions that are effective, financially sustainable, and, above all, scalable.
By embracing smart design principles, streamlined workflows, and a deep understanding of cost drivers, we can make AI-powered healthcare a reality for a broader population. This approach not only ensures the viability of AI solutions but also aligns with the broader goal of improving patient care and outcomes.
Using these approaches, we were able to decrease our compute costs by more than 95% while improving speed and quality. But I can still vividly remember the shock of his offhand statement about our gigantic cost outside that conference room. Although he is a good friend, I may try to save our future conversations for after I speak to an audience.