
AI in Medical Education: Assessing the Vulnerability of Digital Health Exams
Evaluating AI in Medical Education: Strengths and Limitations
The rapid integration of AI into medical education presents both opportunities and significant challenges for academic integrity. As generative tools like ChatGPT become ubiquitous, educators in Digital Health and Health Information Management (DIGHIM) need to know which assessment formats are most vulnerable to AI-generated content. A recent quasi-experimental pilot study (Wani et al., JMIR Med Educ) evaluated ChatGPT’s performance across several task types to provide data-driven recommendations for curriculum design.
How ChatGPT Performs Across Assessment Types
The study found that ChatGPT excels at objective, rule-based tasks: it achieved a mean score of 88% on health classification quizzes involving multiple-choice items. It also produced coherent, well-structured responses for reflective assessments. However, these reflective outputs often lacked the deep personalization and nuanced industry context required for professional practice. The AI can simulate a logical structure but frequently misses the domain-specific insights that human students provide, and markers found that its work lacked the expected professional depth.
Critical Gaps in Technical and Scenario-Based Tasks
Technical and scenario-based tasks exposed the most significant limitations of current generative models. In SQL health database programming, ChatGPT averaged only 42% because of persistent schema errors and incomplete queries. Its performance in clinical coding using ICD-10-AM conventions was weaker still, at just 7%. These results indicate that ChatGPT, as tested, lacks the precision needed for complex medical classification and data interpretation. Educators should therefore prioritize these high-complexity task types when designing assessments, so that marks reflect authentic student work. Using AI as a critique tool rather than as a primary author may also improve learning outcomes.
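To make "schema errors" concrete, here is a minimal Python sketch of the general failure mode: a query that looks reasonable but refers to a column the database does not have. The table, column names, and rows are hypothetical and invented for illustration. They are not taken from the study, and the sketch does not reproduce any actual ChatGPT output.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration (not from the study).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions ("
    "admission_id INTEGER PRIMARY KEY, "
    "patient_id INTEGER, "
    "principal_diagnosis TEXT)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [(1, 101, "J18.9"), (2, 102, "I50.0"), (3, 103, "J18.9")],
)


def run_query(sql):
    """Return (rows, None) on success or (None, error message) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


# Plausible-looking query that assumes a column name the schema does not define.
bad_rows, bad_error = run_query(
    "SELECT diagnosis_code, COUNT(*) FROM admissions GROUP BY diagnosis_code"
)

# Same intent, written against the column that actually exists.
good_rows, good_error = run_query(
    "SELECT principal_diagnosis, COUNT(*) FROM admissions "
    "GROUP BY principal_diagnosis ORDER BY principal_diagnosis"
)

print("schema error:", bad_error)
print("valid result:", good_rows)
```

The first query fails outright because the database rejects the unknown column. A marker can spot such errors only by running the query against the real schema, which is one reason schema-bound tasks are harder to complete with generated text alone.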
In the Indian context, the National Medical Commission (NMC) has recently emphasized that AI should support rather than replace clinical judgment. Accordingly, medical colleges are moving toward "AI-ready" classrooms while maintaining strict ethical standards and academic integrity. The study suggests that while AI can assist in content refinement, it cannot substitute for the critical reasoning required in clinical practice.
Frequently Asked Questions
Which assessment types are most susceptible to AI cheating?
Objective tasks such as multiple-choice quizzes, along with well-structured reflective essays, are highly susceptible. AI performs best when following clear rules or generating standard logical structures.
Can AI accurately perform clinical coding for medical exams?
No. Current research shows that AI performs poorly in clinical coding tasks, such as ICD-10-AM coding, because it lacks precision in applying complex coding conventions and in navigating health data schemas. In the pilot study, ChatGPT scored just 7% on clinical coding.
Disclaimer: This content is for informational and educational purposes only. It does not constitute medical advice or a substitute for professional healthcare education. Refer to the latest local and national guidelines for clinical practice.
References
- Wani TA et al. Susceptibility of Assessment Types to AI-Generated Content in Digital Health and Health Information Management Education: Quasi-Experimental Pilot Study. JMIR Med Educ. 2026 Mar 30. doi: 10.2196/82988. PMID: 41911020.
- Teixeira B et al. Can ChatGPT Support Clinical Coding Using the ICD-10-CM/PCS? Informatics. 2024; 11(4):84. doi: 10.3390/informatics11040084.
- National Board of Examinations in Medical Sciences (NBEMS). Programme on Artificial Intelligence in Medical Education. Available from: natboard.edu.in.
