GPT-4 and LLMs like it hold promise for healthcare, but caution is warranted

Photo: Augmedix
Regulators want to get their hands on artificial intelligence like ChatGPT, but workers love it.
Healthcare sits at the intersection of that push and pull. And as at other health tech conferences, there will be a lot of GPT pull at HIMSS23 next week.
Manny Krakaris, CEO of Augmedix, says large language models are in the automated medical documentation and data services company's DNA. Augmedix has focused on the problem of note bloat and, as a result, operates refined LLMs in its Ambient Automation Platform, which is available to healthcare systems and providers.
We spoke to Krakaris this week about healthcare's need for AI to help physicians streamline processes and the fascination with OpenAI's GPT models.
Everyone is trying out ChatGPT because OpenAI has made it easy to work with, he said.
"Every company now is going to claim that they use LLMs. And it's really easy to use them. You get a license, you get the API and you can integrate it into whatever process you use. So they're really easy to use, but what you do with it will vary greatly by company."
The push and pull of natural language processing
Put simply, having natural language processing capabilities at their fingertips – right in the software programs and platforms innovators, processors and other workers use every day – can speed things up, potentially a lot.
While ChatGPT may have passed the U.S. Medical Licensing Exam, one emergency room doctor found its results in diagnosing patients alarming. The ER doctor said he feared that people are using ChatGPT to diagnose themselves instead of seeing a doctor.
The issue is that GPT does not offer pitch-perfect accuracy.
"Large language models have a tendency to hallucinate," said Krakaris.
Or they can miss something completely, because a model knows only the information fed to it. In ChatGPT's case, that is data through 2021.
The good and the bad
AI like GPT has the potential to help physicians consume and summarize publicized, scientifically validated data.
"What they're really good at is providing color and context to medical issues," Krakaris said.
"They're very accurate in responding to specific prompts or questions. They're also broadly applicable so they can cover a wide range of subjects."
But the LLM behind GPT, like other models of its kind, is not a panacea of information for healthcare, he says.
"The weaknesses – the key weakness of LLMs is that they have a heavy dependency on the transcript. But a good medical note requires input not just from the transcript, but from the electronic health record and from presets," said Krakaris.
For example, "at one point in a conversation, a patient might say, 'I was on my regular medication,' and then later on they might say something totally contradictory. 'I was not on my regular medication,' but they might be talking about two different symptoms or complaints that aren't picked up."
The output of LLMs can also contain information that isn't germane to medical notes, which adds to the bloat.
"You have a lot of irrelevant information that appears in the finished product. That really doesn't help the physician when they're reviewing the patient's record for future visits or other physicians doing follow-up visits with that particular patient," he explained.
The future of GPT
From a compliance perspective, the use of GPT integrations for medical notes raises data security questions, such as, where is the data being stored?
A few weeks ago, The Verge reported that a bug temporarily exposed AI chat histories to other users.
But it's the hallucinations of newer LLMs that really concern Krakaris.
"You don't want the model to extrapolate. Because that could lead to really bad outcomes in terms of what appears and becomes a permanent element within that electronic health record of the patient. So, you have to put guardrails against hallucinations, if you will. And the larger the aperture of prompts – things that you ask it – the more general those questions are, and the more likely you're going to have inaccuracies or hallucinations appear," he said.
When you're using LLMs, the output depends on using the right prompts or questions, in the right sequence, he explained.
"The tighter your prompts or queries, the narrower the aperture of the LLM. So, it doesn't have to search or compile as much information, and therefore, the output from the LLM is going to be a lot more accurate and pertinent to the specific prompt you've given it," he said.
Augmedix, he said, has developed its AI predicated on structured data and with hundreds of models that organize data based on a complaint or condition. These models serve as a road map of prompts to generate the appropriate answers for compiling medical notes.
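To make the "road map of prompts" idea concrete, here is a minimal Python sketch of how narrowly scoped prompts might be organized by complaint. It is an illustration only, not Augmedix's implementation: the `PROMPT_ROAD_MAP` table, the `build_prompts` function, and the example complaints and questions are all hypothetical, and no LLM is actually called.

```python
# Hypothetical road map: complaint -> ordered, narrowly scoped questions.
# The complaints and questions are illustrative assumptions, not Augmedix's.
PROMPT_ROAD_MAP = {
    "chest pain": [
        "When did the chest pain start?",
        "Is the pain described as sharp, dull, or pressure-like?",
        "Was the patient on their regular medication when the pain occurred?",
    ],
    "headache": [
        "How often do the headaches occur?",
        "Was the patient on their regular medication when the headaches occurred?",
    ],
}


def build_prompts(complaint: str, transcript: str) -> list[str]:
    """Return one tightly scoped prompt per road-map question."""
    questions = PROMPT_ROAD_MAP.get(complaint.lower().strip())
    if questions is None:
        # No road map for this complaint: return nothing rather than
        # fall back to a broad, open-ended prompt.
        return []
    return [
        f"Using only the transcript below, answer this question about the "
        f"patient's {complaint}: {question} "
        f"If the transcript does not say, answer 'not stated'.\n\n"
        f"Transcript:\n{transcript}"
        for question in questions
    ]
```

Each prompt asks one question about one complaint and tells the model to answer "not stated" rather than extrapolate, which is one simple way to narrow the aperture Krakaris describes. Tying the medication question to a specific complaint also reflects his example of a patient who gives seemingly contradictory answers about two different symptoms.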
After that, content validation is needed because AI technology is not perfect, Krakaris said.
"You have to understand those limitations and incorporate that into your process, whatever it happens to be, to deliver something that is useful and a value to your constituents."
Augmedix is in Booth 8531 at HIMSS23.
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.
Ty Vachon will offer more detail in the HIMSS23 session "ML and AI Forum: 2023 AI in Healthcare: The Good, The Bad and The Hopeful." It is scheduled for Monday, April 17, from 3 p.m. to 4 p.m. CT in the South Building, Level 1, room S100 B.