
The difficulty of detecting AI-tool use in peer review is proving problematic. Credit: BrianAJackson/iStock via Getty
It’s almost impossible to know whether a peer-review report has been generated by artificial intelligence, according to a study that put AI-detecting tools to the test.
A research team based in China used the Claude 2.0 large language model (LLM), created by Anthropic, an AI company in San Francisco, California, to generate peer-review reports and other types of documentation for 20 published cancer-biology papers from the journal eLife1. The journal’s publisher makes papers freely available online as ‘reviewed preprints’, and publishes them alongside their referee reports and the original unedited manuscripts.
The authors fed the original versions into Claude and prompted it to generate referee reports. The team then compared the AI-generated reports with the genuine ones published by eLife.
The AI-written reviews “looked professional, but had no specific, deep feedback”, says Lingxuan Zhu, an oncologist at the Southern Medical University in Lianyungang, China, and a co-author of the study. “This made us realize that there was a serious problem.”
The study found that Claude could write plausible citation requests (suggesting papers that authors could add to their reference lists) and convincing rejection recommendations (made when reviewers think a journal should reject a submitted paper). The latter capability raises the risk of journals rejecting good papers, says Zhu. “An editor cannot be an expert in everything. If they receive a very persuasive AI-written negative review, it could easily influence their decision.”
The study also found that most of the AI-generated reports fooled the detection tools: ZeroGPT erroneously classified 60% of them as written by a human, and GPTZero did so for more than 80%.
Differing opinions
A growing challenge for journals is the fact that LLMs could be used in many ways to produce a referee report. What is deemed an ‘acceptable’ use of AI also differs depending on whom you ask. In a survey of some 5,000 researchers conducted by Nature earlier this year, 66% of respondents said it wasn’t appropriate to use generative AI to create reviewer reports from scratch. But 57% said it was acceptable to use it to help with peer review by getting it to answer questions about papers.
And although AI-detection tools are improving, they struggle to determine how much of a document has been generated using AI. An analysis published last year of referee reports that were submitted to four computer-science conferences estimated that 17% had been substantially modified by chatbots2. It’s not clear, however, whether the referees used AI to improve the reports or to write them entirely.
Jeroen Verharen, a neuroscientist at the firm iota Biosciences in Alameda, California, says he is surprised that the AI detectors used by Zhu and his team weren’t better at spotting the AI-written referee reports.
But he adds that AI-written reports and associated materials are unlikely to become a widespread problem. If reviewers don’t want to review, he says, “they would just say no”.
Conversely, Mikołaj Piniewski, a hydrologist at the Warsaw University of Life Sciences, argues that it is a growing issue. He says he has already received referee reports that he suspects were written by AI.
“LLMs are increasingly being used by peer reviewers, although this is rarely disclosed,” he says. “When I spoke to my colleagues in the field of hydrology, it became clear that each of us had encountered at least one such case as an author in the past two years. At least one of the review reports we received looked very suspicious, and the AI-detection tools we used flagged it as potentially generated by LLMs.”
Piniewski adds that he is sure some journal editors are accepting AI-generated referee reports, unwittingly or otherwise. He suggests that a global shortage of peer reviewers could be causing some editors to be more lenient than they should be. “I’m afraid it is largely driven by convenience,” he says.