Is ChatGPT-4 accurate and complete when answering questions on tuberculosis? Results of the ChatGTB study

Academic Article

Publication Date:

2025

Short description:

Is ChatGPT-4 accurate and complete when answering questions on tuberculosis? Results of the ChatGTB study / De Vito, A.; Colpani, A.; Buonsenso, D.; Candoli, P. M. M.; Falbo, E.; La Fauci, S.; Madeddu, G.; Masini, T.; Misiano, G.; Monari, C.; Pontarelli, A.; Riccardi, N.; Saderi, L.; Saluzzo, F.; Sotgiu, G.; Tadolini, M.; Besozzi, G.; Calcagno, A. - In: INFECTIOUS DISEASES AND TROPICAL MEDICINE. - ISSN 2379-4054. - 11:(2025). [10.32113/idtm_202510_1766]

Abstract:

Objective: Artificial intelligence (AI), particularly large language models such as ChatGPT, has the potential to disseminate health information. This study aimed to assess the accuracy and completeness of ChatGPT-4's responses to questions on tuberculosis (TB).

Materials and Methods: Ninety English-language TB questions were formulated on the basis of official guidelines and clinical experience. ChatGPT-4o answered these questions between February 1 and March 1, 2024. Three evaluation subgroups rated the responses for accuracy (six-point Likert scale) and completeness (three-point Likert scale). Statistical analyses were performed using non-parametric tests.

Results: The median accuracy score was 5 out of 6, and 88.9% of responses scored at least 5, indicating high overall accuracy. However, only 34.4% achieved the highest score of 6, and performance diminished on questions requiring a medium or high level of expertise (LOE). Low-LOE questions had the highest accuracy, with 63.3% scoring 6. For completeness, 48.9% of responses were comprehensive (score of 3), particularly for low-LOE questions (70% scored 3). In contrast, only 23.3% of high-LOE questions achieved the highest completeness score. ChatGPT-4 often lacked specificity on complex topics, such as therapies for drug-resistant TB, and provided outdated information not aligned with current World Health Organization guidelines.

Conclusions: ChatGPT-4 effectively delivers accurate and comprehensive information for general TB inquiries, making it a valuable resource for the public and non-specialist clinicians. However, its performance declines as question complexity increases, limiting its utility for advanced clinical decision-making in TB care. Continuous updates and enhancements are needed to improve its accuracy and relevance in specialised medical contexts.
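The summary statistics reported in the abstract (median Likert score, share of responses at or above a threshold, share at the top score) and a comparison across the three LOE levels can be sketched as follows. The abstract states only that non-parametric tests were used; the Kruskal-Wallis test here is an assumption about one common choice, not the authors' documented method. The function names `summarize_likert` and `kruskal_wallis_3` and all example scores are hypothetical and are not study data. With three groups the test has 2 degrees of freedom, and the chi-square survival function then has the closed form exp(-H/2), which keeps the sketch dependency-free.

```python
import math
from statistics import median


def summarize_likert(scores, top=6, threshold=5):
    """Median score, share of scores >= threshold, and share at the top score."""
    n = len(scores)
    return {
        "median": median(scores),
        "share_at_least_threshold": sum(s >= threshold for s in scores) / n,
        "share_top": sum(s == top for s in scores) / n,
    }


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(low, medium, high):
    """Tie-corrected Kruskal-Wallis H and p-value for exactly three groups.

    With three groups, H is referred to a chi-square distribution with
    2 degrees of freedom, whose survival function is exp(-H / 2).
    """
    groups = [low, medium, high]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum over groups of (rank total)^2 / group size.
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Correction for ties, which are frequent with Likert-scale data.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    h /= correction
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical accuracy scores (1-6), NOT data from the study.
    print(summarize_likert([6, 5, 5, 4, 6]))
    h, p = kruskal_wallis_3([6, 6, 5], [5, 5, 4], [4, 3, 5])
    print(f"H = {h:.3f}, p = {p:.3f}")
```

For real analyses, `scipy.stats.kruskal` computes the same tie-corrected statistic for any number of groups; the hand-rolled version is shown only to make the calculation visible.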

IRIS type:

1.1 Journal article

Keywords:

ChatGPT; Education; LLM; Prevention; Tuberculosis; Tuberculosis treatment

List of contributors:

De Vito, A.; Colpani, A.; Buonsenso, D.; Candoli, P. M. M.; Falbo, E.; La Fauci, S.; Madeddu, G.; Masini, T.; Misiano, G.; Monari, C.; Pontarelli, A.; Riccardi, N.; Saderi, L.; Saluzzo, F.; Sotgiu, G.; Tadolini, M.; Besozzi, G.; Calcagno, A.

University authors:

De Vito, Andrea

Madeddu, Giordano

Sotgiu, Giovanni

Handle:

https://iris.uniss.it/handle/11388/373001

Published in:

INFECTIOUS DISEASES AND TROPICAL MEDICINE

Journal