STAT+: Hospitals struggle to validate AI-generated clinical summaries. ‘It’s a bit chaotic’
AI-powered summarization tools can save time and angst for hospital staff, but they can also introduce mistakes, or make things up.
Rob Bart remembers what it felt like, the moment of discovery. As an intern at Duke University Medical Center in the ’90s, he’d sometimes be tasked with poring through a patient’s medical history to uncover the cause of their latest hospitalization. Back then, the stacks of paper records could tower 18 inches tall.
“I can remember that needle in the haystack feeling,” said Bart, now chief medical information officer at the University of Pittsburgh Medical Center, “when you found that one thing in the medical record that helps us figure this out.”
Today, he replays those memories when he hears his colleagues talking about the promise of large language models to summarize medical records. Electronic health records have created even larger haystacks that stymie needle-hunting clinicians. But despite the capabilities of models like OpenAI’s GPT-4, it’s so far unclear whether they’re ready for the high stakes of clinical summarization, where a single missing word could make the difference in a diagnosis.