
Can artificial intelligence contribute to improved clinical reasoning?
Key Takeaways
- GPT-4 did not significantly improve physicians' diagnostic reasoning compared to conventional resources like UpToDate and Google.
- The study involved 50 U.S.-licensed physicians across family, internal, and emergency medicine.
A recent study compared the diagnostic performance of physicians with access to an AI tool against that of physicians limited to conventional resources.
The study enrolled 50 U.S.-licensed physicians across family, internal, and emergency medicine. The median diagnostic reasoning score per case was 76% for the group with access to the AI tool (GPT-4) and 74% for the group limited to conventional resources. The AI group spent an average of 519 seconds per case, compared with 565 seconds for the conventional resources group.
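For readers wondering how a two-point difference in median scores can be judged "not significant," the minimal sketch below compares two sets of per-case scores with a standard nonparametric test. Both the invented score values and the choice of test are illustrative assumptions, not the study's published data or methodology.

```python
# Illustrative only: hypothetical per-case diagnostic reasoning scores
# (percent correct); these are NOT the study's data.
from scipy.stats import mannwhitneyu

ai_group = [76, 82, 70, 79, 74, 81, 68, 77]            # physicians with GPT-4 access
conventional_group = [74, 71, 78, 69, 75, 72, 76, 70]  # conventional resources only

# Nonparametric comparison of the two score distributions; by convention,
# the difference is called statistically significant only if p < 0.05.
stat, p_value = mannwhitneyu(ai_group, conventional_group, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```

With small samples and overlapping score distributions like these, the p-value typically stays well above 0.05, which is the sense in which a modest median difference can fail to reach significance.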
The researchers concluded that access to GPT-4 did not significantly improve physicians' diagnostic reasoning. Notably, the LLM on its own outperformed both the physicians using conventional diagnostic resources and the physicians assisted by the program. These findings point to the need for further research and training to realize the full potential of physician-AI collaboration in clinical practice. To that end, the four institutions behind the study announced a collaboration on a bi-coastal AI evaluation network, ARiSE, designed to further evaluate generative AI outputs in healthcare.