You’ve likely heard about ChatGPT by now. The conversational AI can do it all, even power Microsoft’s Bing. Well, according to a study in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth, the artificial intelligence can even (just about) pass the United States Medical Licensing Exam (USMLE) with a score near 60%.
For those who don’t know, ChatGPT is a new AI system, known as a large language model, designed to converse with us in a human-like style. The system can’t search the internet; instead, it generates text by predicting likely word sequences based on relationships learned during training.
The research team tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. The test, taken by medical students and physicians-in-training, assesses knowledge spanning most medical disciplines, ranging from biochemistry to diagnostic reasoning to bioethics.
The team removed image-based questions so the AI could participate, and excluded “indeterminate responses” from scoring. With those adjustments, the artificial intelligence scored between 52.4% and 75% across the three exams, against a passing threshold of roughly 60%. It also demonstrated concordance across its responses and generated at least one significant insight (something new, non-obvious, and clinically valid) in 88.9% of its answers.
The research team believes that their findings provide a glimpse of ChatGPT’s potential to enhance medical education and, eventually, clinical practice. Clinicians at AnsibleHealth already use ChatGPT to rewrite jargon-heavy reports for easier patient comprehension.
“Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation,” say the authors.
The team also used ChatGPT to help write their paper, interacting with the AI “much like a colleague, asking it to synthesize, simplify, and offer counterpoints to drafts in progress.”
You can find their paper here: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000198