Comparing large language models for antibiotic prescribing in different clinical scenarios: which performs better?

Article
Publication date:
2025
Citation:
Comparing large language models for antibiotic prescribing in different clinical scenarios: which performs better? / De Vito, A.; Geremia, N.; Bavaro, D. F.; Seo, S. K.; Laracy, J.; Mazzitelli, M.; Marino, A.; Maraolo, A. E.; Russo, A.; Colpani, A.; Bartoletti, M.; Cattelan, A. M.; Mussini, C.; Parisi, S. G.; Vaira, L. A.; Nunnari, G.; Madeddu, G.. - In: CLINICAL MICROBIOLOGY AND INFECTION. - ISSN 1198-743X. - 31:8(2025), pp. 1336-1342. [10.1016/j.cmi.2025.03.002]
Abstract:
Objectives: Large language models (LLMs) show promise in clinical decision-making, but comparative evaluations of their antibiotic prescribing accuracy are limited. This study assesses the performance of various LLMs in recommending antibiotic treatments across diverse clinical scenarios. Methods: Fourteen LLMs, including standard and premium versions of ChatGPT, Claude, Copilot, Gemini, Le Chat, Grok, Perplexity, and Pi.ai, were evaluated using 60 clinical cases with antibiograms covering 10 infection types. A standardized prompt was used for antibiotic recommendations focusing on drug choice, dosage, and treatment duration. Responses were anonymized and reviewed by a blinded expert panel assessing antibiotic appropriateness, dosage correctness, and duration adequacy. Results: A total of 840 responses were collected and analysed. ChatGPT-o1 demonstrated the highest accuracy in antibiotic prescriptions, with 71.7% (43/60) of its recommendations classified as correct and only one (1.7%) incorrect. Gemini and Claude 3 Opus had the lowest accuracy. Dosage correctness was highest for ChatGPT-o1 (96.7%, 58/60), followed by Claude 3.5 Sonnet (91.7%, 55/60) and Perplexity Pro (90.0%, 54/60). In treatment duration, Gemini provided the most appropriate recommendations (75.0%, 45/60), whereas Claude 3.5 Sonnet tended to over-prescribe duration. Performance declined with increasing case complexity, particularly for difficult-to-treat microorganisms. Discussion: There is significant variability among LLMs in prescribing appropriate antibiotics, dosages, and treatment durations. ChatGPT-o1 outperformed other models, indicating the potential of advanced LLMs as decision-support tools in antibiotic prescribing. However, decreased accuracy in complex cases and inconsistencies among models highlight the need for careful validation before clinical use.
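As a quick consistency check on the figures quoted in the abstract, the sketch below recomputes each percentage from its reported count and confirms the total number of responses. It is illustrative only: the dictionary layout and the names `REPORTED`, `percentage`, and `mismatches` are assumptions made for this example, not part of the study's materials, and only numbers stated in the abstract are used.

```python
# Figures copied from the abstract: label -> (count, denominator, reported percent).
# The labels and the structure are hypothetical; the numbers are the abstract's.
REPORTED = {
    "ChatGPT-o1, antibiotic choice correct": (43, 60, 71.7),
    "ChatGPT-o1, antibiotic choice incorrect": (1, 60, 1.7),
    "ChatGPT-o1, dosage correct": (58, 60, 96.7),
    "Claude 3.5 Sonnet, dosage correct": (55, 60, 91.7),
    "Perplexity Pro, dosage correct": (54, 60, 90.0),
    "Gemini, duration appropriate": (45, 60, 75.0),
}

N_MODELS = 14   # "Fourteen LLMs"
N_CASES = 60    # "60 clinical cases"
TOTAL_RESPONSES = N_MODELS * N_CASES


def percentage(count: int, denominator: int) -> float:
    """Return count/denominator as a percentage rounded to one decimal place."""
    return round(100 * count / denominator, 1)


def mismatches(reported: dict) -> list:
    """List every label whose recomputed percentage differs from the reported one."""
    return [
        label
        for label, (count, denom, stated) in reported.items()
        if percentage(count, denom) != stated
    ]


if __name__ == "__main__":
    print("total responses:", TOTAL_RESPONSES)
    print("mismatched entries:", mismatches(REPORTED))
```

An empty mismatch list means every quoted percentage agrees with its count at one decimal place, and the product of models and cases matches the 840 responses stated in the Results.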
CRIS type:
1.1 Journal article
Keywords:
Antibiotic treatment; Antimicrobial susceptibility testing; ChatGPT-o1; Difficult-to-treat infection; Large language models; LLMs
Author list:
De Vito, A.; Geremia, N.; Bavaro, D. F.; Seo, S. K.; Laracy, J.; Mazzitelli, M.; Marino, A.; Maraolo, A. E.; Russo, A.; Colpani, A.; Bartoletti, M.; Cattelan, A. M.; Mussini, C.; Parisi, S. G.; Vaira, L. A.; Nunnari, G.; Madeddu, G.
University authors:
DE VITO Andrea
MADEDDU Giordano
VAIRA Luigi Angelo
Link to the full record:
https://iris.uniss.it/handle/11388/367359
Published in:
CLINICAL MICROBIOLOGY AND INFECTION
Journal