[Version anglaise de l’article « (Ro)Bot-Reviewing : quelle place pour l’IA dans l’évaluation de nos articles ? »]
The editorial team is pleased to welcome Caitlin Martin, who will be writing articles for the "Tales from a postdoc" section this year. She introduces herself in a few words:
Hello! Caitlin Martin here, currently a postdoctoral researcher in the Human Genetics and Cognitive Functions group at the Institut Pasteur. I began my scientific career studying genomic evolution and health, specifically in paleogenomics (i.e. working with "ancient DNA"). I am Canadian (from Toronto) and I am thrilled to pick up the torch for this segment of the open science blog. I consider open science imperative for the transparent exchange and advancement of knowledge, not only among researchers, but also with the wider public.
AI tools for helping review academic articles – lately there is lots of buzz, but what is the true potential for researchers? Will the adoption of AI-assisted peer review increase access to open science, or instead, lead to ever-more opaque reviewing?
Peer reviews that are more pertinent, more transparent, and above all, more equitable – such is the promise of Niv Samuel Mastboim and Oded Rechavi, creators of the tool "q.e.d.". This AI reviewer takes its name from "quod erat demonstrandum" (that which was to be demonstrated), and is designed to comprehensively evaluate your unpublished manuscript in the context of the published literature. Mastboim and Rechavi dream of a world in which our articles will be submitted to q.e.d., improved in response to its criticism, and subsequently submitted to and accepted by journals based purely on this evaluation and the resulting corrections. One day, they imagine, the entire scientific community may self-publish q.e.d. peer reviews together with the corrected articles in open access, a global extension of eLife's model.
Even if we are far from such an open science utopia, Mastboim and Rechavi are not the only ones imagining AI tools in the academic review process. The past year has witnessed an explosion of such tools, perhaps because researchers themselves increasingly employ AI to conduct their peer reviews. A recent report from the publishing house Frontiers revealed that more than 50% of the 1,600 researchers surveyed use AI in their reviewing:
- 29% to generate a summary of the article,
- 28% to detect malpractice (e.g. falsified data or image manipulation),
- 19% to evaluate the methodology and content,
- 59% to write their letter to the authors.
Overall, researchers report a 24% increase in their use of AI for peer reviewing over the past year.
If peer reviewers already use AI in their reports, why not create and train specialized tools for the task? Moreover, why not make these tools directly available to authors, such that they can evaluate their work and target weak points before manuscript submission? Currently, there are two major categories of peer-review AI assistants:
1. Tools for editors
- AI-generated writing detection (e.g. Geppetto from Springer Nature),
- Manipulated image detection (e.g. SnappShot also from Springer Nature),
- Article summarization (e.g. Eliza).
2. Tools for researchers (to improve their articles prior to submission)
- Free-access tools, developed by companies (e.g. Nature Research Assistant from Springer Nature) or by researchers (e.g. q.e.d. and the currently experimental Reviewer#2, developed internally at the Institut Pasteur and therefore accessible only to its researchers),
- Paid-access tools (e.g. PaperWizard).
But will these tools truly prove useful for writing our articles? To get an idea, I conducted a small investigation comparing three of them: two still in beta testing (Nature Research Assistant and the Institut Pasteur's Reviewer#2) and the much-talked-about q.e.d.
- Nature Research Assistant
Still in testing, this manuscript advisor is an LLM developed by Stephanie Preuß, Niki Scaplehorn, and Thomas van Dongen to improve the quality of scientific communication in a manuscript. Accordingly, this tool seems principally to evaluate the writing itself: the fluidity of sentences, overall coherence, and the pertinence of titles and conclusions. The tool produces a summary of the paper, generates keywords and an abstract, and identifies sentences that are too long or convoluted. It does not, however, evaluate the scientific content in any way.
- Reviewer#2
In contrast, Reviewer#2, largely based on the published literature, is adept at finding weak points in your work, often without mincing words. Fellow researchers, take heart: I think it will reject all our papers! This tool's strength is its exhaustiveness: it correctly identified some complementary references (unfamiliar to me) and arguments in need of further development. That said, it failed to account for some journals' word and reference limits, and it missed key information already present in the article that responded to several of its criticisms. For acceptance, the tool also requested a series of elaborate and time-consuming analyses, well beyond the scope of the original project (in this respect, it is an accurate imitation of human reviewers)!
- q.e.d.
Finally, let us turn to q.e.d. Unlike tools created by publishing houses or companies, this tool was developed by researchers for researchers, at Tel Aviv University. At first glance, it seems to strike a balance between structure and content analysis. In my tests, q.e.d. proposed relevant, mostly feasible, supplementary experiments, while (like Reviewer#2) referring to the published literature. Overall, q.e.d.'s evaluation was the closest to the official reviewer reports we had received. Nevertheless, it did not suggest any experiment or improvement that the human reviewers had not identified, and it was, like Reviewer#2, quite repetitive in its criticisms.
Currently, journals do not accept manuscripts revised solely in response to AI criticism; they all still require review by a human peer. We may ask ourselves, then: is there any benefit for us, as researchers, in adopting these tools? For now, I think AI review tools targeting language comprehension and clarity are the most useful prior to submission. Those like Reviewer#2 or q.e.d., conversely, are best suited to use after initial submission, to anticipate potential reviewer requests and begin reflecting on possible supplementary analyses. If we try to respond to all of q.e.d.'s criticisms before submission, I fear we will simply create more work for ourselves. Even if we manage to satisfy a myriad of AI evaluations, human peer reviewers are more creative in requesting analyses or experiments from other domains, not yet present in a subject's published literature.
Returning to my initial question: can these AI tools advance open science? Perhaps through the reduction, or even elimination, of human biases in the review process? It seems plausible that biases due to conflicts of interest, in favour of a famous last author, or reflecting gender and ethnicity could be reduced by the neutral programming of these tools. However, as long as we do not have full, transparent access to the datasets used to train these machine-learning tools, we cannot know to what extent their algorithms have already integrated biases. Instead, such biases could become all the more difficult to detect.
I do not think that a revolutionary, indisputably equitable and transparent peer review will happen tomorrow, but I do believe q.e.d.'s reports could interest editors as a complement to human reviews. In time, I hope that the development of q.e.d. and other such tools will indeed incite a more transparent, pertinent, and egalitarian review process, to the benefit of the global scientific community.
Caitlin Martin, postdoctoral researcher at the Institut Pasteur
References:
- Frontiers Media, Unlocking AI’s untapped potential: responsible innovation in research and publishing (2025)
- Tamara Welschot, AI in Research Integrity: Springer Nature’s Innovative Tools Geppetto and SnappShot, The Researcher’s Source (2024)
- Flaminio Squazzoni et al., Peer review and gender bias: A study on 145 scholarly journals, Sci. Adv. 7, eabd0299 (2021)
- Fengyuan Liu et al., Non-White scientists appear on fewer editorial boards, spend more time under review, and receive fewer citations, Proc. Natl. Acad. Sci. U.S.A. 120 (13) e2215324120 (2023).


