Why the early tests of ChatGPT in medicine miss the mark
Stanford researchers asked GPT-4 questions about real-life medical situations. The AI didn't fare as well as it did on the medical boards.
ChatGPT has rocketed into health care like a medical prodigy. The artificial intelligence tool correctly answered more than 80% of board exam questions, showing an impressive depth of knowledge in a field that takes even elite students years to master.
But in the hype-heavy days that followed, experts at Stanford University began asking the AI questions drawn from real situations in medicine, and they got very different results. Almost 60% of its answers either disagreed with human specialists or provided information that wasn’t clearly relevant.