Hassanein FEA, Hussein RR, Elgarhy MR, Maher SM, Hassen A, Heidar S, Ezz El Arab M, Edress A, Abou-Bakr A, Mekhemar M. Artificial intelligence versus human dental expertise in diagnosing periapical pathosis on periapical radiographs: a multicenter study. Bioengineering. 2026;13(2):232.
Hassanein F. Evaluating multimodal large language models for clinical diagnosis of oral lesions: a biomedical informatics perspective. 2025.
Almohareb T, Abou-Bakr A, Hassanein FEA, Ahmed Y, Hamza M, Aboheikal M, Nagi N. Clinical and patient comparison of AI and expert digital smile design: a prospective paired study. Dent J. 2026;14(3):166.
Ras AA, Kheir El Din NH, Talaat AM, Hussein RR, Khalil E. Mucocutaneous changes in end-stage renal disease under regular hemodialysis—a cross-sectional study. Indian J Dent Res. 2023;34(2):130–5.
Ghalwash D, Ammar A, Abou-Bakr A, Diab AH, El-Gawish A. Validation of salivary proteomic biomarkers for early detection of oral cancer in the Egyptian population. Future Sci OA. 2025;11(1):2432222.
Ghalwash D, El-Gawish A, Ammar A, Hamdy A, Ghanem R, Ghanem M, et al. Epidemiology of Sjogren’s syndrome in a sample of the Egyptian population: a cross-sectional study. J Int Med Res. 2024;52(10):3000605241289292.
Abou-Bakr A, Hassanein FEA. Comment on “Diagnostic Performance of Multimodal Large Language Models in the Analysis of Oral Pathology”. Oral Dis. 2026. https://doi.org/10.1111/odi.70216.
Schwendicke F, Samek W, Krois J. Artificial intelligence in dentistry: chances and challenges. J Dent Res. 2020;99(7):769–74.
Abou-Bakr A, Eissa AA, Alshikh B, Ahmed Y, AbuShady EF, Tassoker M, et al. Comparative diagnostic accuracy of ChatGPT models in salivary gland disease: a multimodal vignette-based evaluation. Eur Arch Otorhinolaryngol. 2025. https://doi.org/10.1007/s00405-025-09925-5.
Robaian A, Hassanein FEA, Hassan MT, Alqahtani AS, Abou-Bakr A. A multimodal large language model framework for clinical subtyping and malignant transformation risk prediction in oral lichen planus: a paired comparison with expert clinicians. Int Dent J. 2026;76(1):109357.
Ouyang L, Wu J, Jiang X, Almeida D, Wainwright CL, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, et al. Training language models to follow instructions with human feedback. arXiv:2203.02155; 2022.
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89–94.
Hassanein FEA, Hussein RR, Almalahy HG, Sarhan S, Ahmed Y, Abou-Bakr A. Vision-based diagnostic gain of ChatGPT-5 and Gemini 2.5 Pro compared with human experts in oral lesion assessment. Sci Rep. 2025;15(1):43279.
AlFarabi Ali S, AlDehlawi H, Jazzar A, Ashi H, Esam Abuzinadah N, AlOtaibi M, et al. The diagnostic performance of large language models and oral medicine consultants for identifying oral lesions in text-based clinical scenarios: prospective comparative study. JMIR AI. 2025;4:e70566.
Grinberg N, Whitefield S, Kleinman S, Ianculovici C, Wasserman G, Peleg O. Artificial intelligence differential diagnosis of soft-tissue oral lesions using ChatGPT. Oral Surg Oral Med Oral Pathol Oral Radiol. 2025;139(2):e54–5.
Grinberg N, Whitefield S, Kleinman S, Ianculovici C, Wasserman G, Peleg O. Assessing the performance of an artificial intelligence based chatbot in the differential diagnosis of oral mucosal lesions: clinical validation study. Clin Oral Investig. 2025;29(4):188.
Abou-Bakr A, El Barbary A, Hassanein FEA. ChatGPT-5 vs oral medicine experts for rank-based differential diagnosis of oral lesions: a prospective, biopsy-validated comparison. Odontology. 2025.
Hassanein FEA, Hussein RR, Ahmed Y, El-Guindy J, Ahmed DE, Abou-Bakr A. Calibration of AI large language models with human subject matter experts for grading of clinical short-answer responses in dental education. BMC Oral Health. 2026;26(1):286.
Hassanein FEA, Ahmed Y, Maher S, Barbary AE, Abou-Bakr A. Prompt-dependent performance of multimodal AI model in oral diagnosis: a comprehensive analysis of accuracy, narrative quality, calibration, and latency versus human experts. Sci Rep. 2025;15(1):37932.
Hirosawa T, Kawamura R, Harada Y, Mizuta K, Tokumasu K, Kaji Y, et al. ChatGPT-generated differential diagnosis lists for complex case-derived clinical vignettes: diagnostic accuracy evaluation. JMIR Med Inform. 2023;11:e48808.
Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, Chi EH, Le QV, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, LA, USA: Curran Associates Inc.; 2022.
Vaira LA, Lechien JR, Maniaci A, De Vito A, Mayo-Yáñez M, Troise S, et al. Diagnostic performance of ChatGPT-4o in analyzing oral mucosal lesions: a comparative study with experts. Medicina (Kaunas). 2025;61(8):1379.
Sivarajkumar S, Kelley M, Samolyk-Mazzanti A, Visweswaran S, Wang Y. An empirical evaluation of prompting strategies for large language models in zero-shot clinical natural language processing: algorithm development and validation study. JMIR Med Inform. 2024;12:e55318.
Zhou K, Liu Z, Qiao Y, Xiang T, Loy CC. Domain generalization: a survey. 2021.
Reise S. The rediscovery of bifactor measurement models (vol 47, pg 667, 2012). Multivar Behav Res. 2013;48:461.
Bishop C. Pattern recognition and machine learning. Vol 16. 2006. pp. 140–155.
Sounderajah V, Ashrafian H, Aggarwal R, De Fauw J, Denniston AK, Greaves F, et al. Developing specific reporting guidelines for diagnostic accuracy studies assessing AI interventions: the STARD-AI steering group. Nat Med. 2020;26(6):807–8.
Glick M, Greenberg MS, Lockhart PB, Challacombe SJ. Burket’s Oral Medicine. Wiley; 2021.
Neville BW, Damm DD, Allen CM, Chi AC. Oral and maxillofacial pathology. Elsevier Health Sciences; 2015.
Hendrycks D, Dietterich T. Benchmarking neural network robustness to common corruptions and perturbations. 2019.
Zhou K, Liu Z, Qiao Y, Xiang T, Loy CC. Domain generalization: a survey. IEEE Trans Pattern Anal Mach Intell. 2023;45(4):4396–415.
Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.
Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, et al. Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst. 2022;35:24824–37.
Jin Q, Chen F, Zhou Y, Xu Z, Cheung JM, Chen R, et al. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. NPJ Digit Med. 2024;7(1):190.
Chen P, Huang Z, Deng Z, Li T, Su Y, Wang H, Ye J, Qiao Y, He J. Enhancing medical task performance in GPT-4V: a comprehensive study on prompt engineering strategies. arXiv:2312.04344; 2023.
Vaira LA, Lechien JR, Abbate V, Gabriele G, Frosolini A, De Vito A, et al. Enhancing AI chatbot responses in health care: the SMART prompt structure in head and neck surgery. OTO Open. 2025;9(1):e70075.
Renze M, Guven E. Self-reflection in LLM agents: effects on problem-solving performance. arXiv:2405.06682; 2024.
Alam L, Mueller ST. Examining physicians’ explanatory reasoning in re-diagnosis scenarios for improving AI diagnostic systems. J Cogn Eng Decis Mak. 2022;16(2):63–78.
Nishida N, Yamakawa M, Shiina T, Mekada Y, Nishida M, Sakamoto N, et al. Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts. J Gastroenterol. 2022;57(4):309–21.
Chan PZ, Ramli MAIB, Chew HSJ. Diagnostic test accuracy of artificial intelligence-assisted detection of acute coronary syndrome: a systematic review and meta-analysis. Comput Biol Med. 2023;167:107636.
Sounderajah V, Ashrafian H, Golub RM, Shetty S, De Fauw J, Hooft L, et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open. 2021;11(6):e047709.