Is ChatGPT-4 accurate and complete when answering questions on tuberculosis? Results of the ChatGTB study

Academic Article
Publication Date:
2025
Short description:
Is ChatGPT-4 accurate and complete when answering questions on tuberculosis? Results of the ChatGTB study / De Vito, A.; Colpani, A.; Buonsenso, D.; Candoli, P. M. M.; Falbo, E.; La Fauci, S.; Madeddu, G.; Masini, T.; Misiano, G.; Monari, C.; Pontarelli, A.; Riccardi, N.; Saderi, L.; Saluzzo, F.; Sotgiu, G.; Tadolini, M.; Besozzi, G.; Calcagno, A.. - In: INFECTIOUS DISEASES AND TROPICAL MEDICINE. - ISSN 2379-4054. - 11:(2025). [10.32113/idtm_202510_1766]
Abstract:
Objective: Artificial intelligence (AI), particularly large language models like ChatGPT, offers the potential to disseminate health information. This study aimed to assess the accuracy and completeness of ChatGPT-4’s responses to TB-related questions.
Materials and Methods: Ninety English-language TB questions based on official guidelines and clinical experience were formulated. ChatGPT-4o provided answers to these questions between February 1 and March 1, 2024. Three evaluation subgroups assessed the responses for accuracy (using a six-point Likert scale) and completeness (using a three-point Likert scale). Statistical analyses were performed using non-parametric tests.
Results: The median accuracy score was 5 out of 6, with 88.9% of responses scoring at least 5, indicating high overall accuracy. However, only 34.4% achieved the highest score of 6, with diminished performance on medium and high level of expertise (LOE) questions. Low LOE questions had the highest accuracy, with 63.3% scoring 6. Completeness scores showed that 48.9% of responses were comprehensive (score of 3), particularly for low LOE questions (70% scored 3). In contrast, only 23.3% of high LOE questions achieved the highest completeness score. ChatGPT-4 often lacked specificity in complex topics, such as drug-resistant TB therapies, and provided outdated information not aligned with current World Health Organization guidelines.
Conclusions: ChatGPT-4 effectively delivers accurate and comprehensive information for general TB inquiries, making it a valuable resource for the public and non-specialist clinicians. However, its performance declines with increasing question complexity, limiting its utility for advanced clinical decision-making in TB care. Continuous updates and enhancements are necessary to improve its accuracy and relevance in specialised medical contexts.
Iris type:
1.1 Journal article (Articolo in rivista)
Keywords:
ChatGPT; Education; LLM; Prevention; Tuberculosis; Tuberculosis treatment
List of contributors:
De Vito, A.; Colpani, A.; Buonsenso, D.; Candoli, P. M. M.; Falbo, E.; La Fauci, S.; Madeddu, G.; Masini, T.; Misiano, G.; Monari, C.; Pontarelli, A.; Riccardi, N.; Saderi, L.; Saluzzo, F.; Sotgiu, G.; Tadolini, M.; Besozzi, G.; Calcagno, A.
Authors of the University:
DE VITO ANDREA
MADEDDU Giordano
SOTGIU Giovanni
Handle:
https://iris.uniss.it/handle/11388/373001
Published in:
INFECTIOUS DISEASES AND TROPICAL MEDICINE
Journal