%0 Journal Article
%T Performance of Artificial Intelligence Chatbots on Standardized Medical Examination Questions in Obstetrics & Gynecology
%A Angelo Cadiente
%A Natalia DaFonte
%A Jonathan D. Baum
%J Open Journal of Obstetrics and Gynecology
%P 1-9
%@ 2160-8806
%D 2025
%I Scientific Research Publishing
%R 10.4236/ojog.2025.151001
%X Objective: This study assesses the quality of artificial intelligence chatbot responses to standardized obstetrics and gynecology questions. Methods: ChatGPT-3.5, ChatGPT-4.0, Bard, and Claude were each given 20 standardized multiple-choice questions on October 7, 2023, and their responses and correctness were recorded. A logistic regression model assessed the relationship between question character count and accuracy. An independent error analysis was undertaken for each incorrectly answered question. Results: ChatGPT-4.0 scored 100% on both the obstetrics and the gynecology questions. ChatGPT-3.5 scored 95% overall: 85.7% in obstetrics and 100% in gynecology. Claude scored 90% overall: 100% in obstetrics and 84.6% in gynecology. Bard scored 77.8% overall (83.3% in obstetrics, 75% in gynecology) and would not respond to two questions. No statistically significant relationship was found between question character count and accuracy. Conclusions: ChatGPT-3.5 and ChatGPT-4.0 excelled in both obstetrics and gynecology, while Claude performed well in obstetrics but showed minor weaknesses in gynecology. Bard performed the worst and had the most limitations, leading us to favor the other artificial intelligence chatbots as preferred study tools. Our findings support the use of chatbots as a supplement to, not a substitute for, clinician-based learning and historically successful educational tools.
%K Large Language Models
%K ChatGPT
%K Bard
%K Claude
%K Medical Education
%U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=138736
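
For readers who want to see the shape of the character-count analysis mentioned in the abstract, a minimal sketch in Python with statsmodels follows. The character counts and correctness flags are illustrative placeholders, not the study's data, and the variable names are ours.

import statsmodels.api as sm

# Illustrative placeholder data (NOT the study's values): one entry per question.
char_counts = [210, 340, 415, 515, 620, 180, 290, 480, 560, 700,
               250, 330, 400, 450, 530, 610, 270, 360, 440, 590]
correct = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
           1, 1, 0, 1, 1, 1, 1, 1, 1, 1]   # 1 = chatbot answered correctly

X = sm.add_constant(char_counts)           # intercept plus character-count predictor
result = sm.Logit(correct, X).fit(disp=0)  # logistic regression of correctness on length
print(result.summary())                    # the x1 p-value tests count vs. accuracy

A non-significant coefficient on the character-count term, as the study reports, would indicate that question length did not predict chatbot accuracy.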