Peer-reviewed by human experts: AI failed in key steps to generate a scoping review on the neural mechanisms of cross-education

Article

Publication date:

2025

Citation:

Peer-reviewed by human experts: AI failed in key steps to generate a scoping review on the neural mechanisms of cross-education / Morrone, M., Hortobágyi, T., Kidgell, D., Farthing, J.P., Deriu, F., Manca, A. - In: EUROPEAN JOURNAL OF APPLIED PHYSIOLOGY. - ISSN 1439-6319. - Dec 2025(2025). [10.1007/s00421-025-06100-w]

Abstract:

The integration of Large Language Models (LLMs) into scientific writing presents significant opportunities for scholars but also risks, including misinformation and plagiarism. A new body of literature is emerging to verify the capability of LLMs to execute the complex tasks inherent to academic publishing. In this context, this study was driven by the need to critically assess LLMs’ out-of-the-box performance in generating evidence synthesis reviews. To this end, the signature topic of the authors’ group, cross-education of voluntary force, was chosen as a model. We prompted a popular LLM (Gemini 2.5 Pro, Deep Research enabled) to generate a scoping review on the neural mechanisms underpinning cross-education. The resulting unedited manuscript was submitted for formal peer review to four leading subject-matter experts. Their qualitative feedback on the manuscript’s structure, content, and integrity was collated and analyzed. Peer reviewers identified critical failures at fundamental stages of the review process. The LLM failed to: (1) identify specific research questions; (2) adhere to established methodological frameworks; (3) implement trustworthy search strategies; and (4) objectively synthesize data. Importantly, the Results section was deemed interpretative rather than descriptive. Reviewers agreed that referencing was the worst issue: it was inaccurate, biased toward open-access sources (84%), and contained instances of plagiarism. The LLM also failed to hierarchize evidence, presenting minor or underexplored findings as established evidence. Overall, the LLM generated a non-systematic, poorly structured, and unreliable narrative review. These findings suggest that the selected LLM is incapable of autonomously performing scientific synthesis and requires massive human supervision to correct the observed issues.

CRIS type:

1.1 Journal article

Keywords:

Generative AI · Plagiarism · Scholarly Publishing · Peer review · Evidence synthesis · Neurophysiology

List of authors: