
The difficulty of detecting AI-tool use in peer review is proving problematic. Credit: BrianAJackson/iStock via Getty
It’s almost impossible to know whether a peer-review report has been generated by artificial intelligence, according to a study that put AI-detecting tools to the test.
A research team based in China used the Claude 2.0 large language model (LLM), created by Anthropic, an AI company in San Francisco, California, to generate peer-review reports and other types of documentation for 20 published cancer-biology papers from the journal eLife1. The journal’s publisher makes papers freely available online as ‘reviewed preprints’, and publishes them alongside their referee reports and the original unedited manuscripts.
The authors fed the original versions into Claude and prompted it to generate referee reports. The team then compared the AI-generated reports with the genuine ones published by eLife.
The AI-written reviews “looked professional, but had no specific, deep feedback”, says Lingxuan Zhu, an oncologist at the Southern Medical University in Lianyungang, China, and a co-author of the study. “This made us realize that there was a serious problem.”
The study found that Claude could write plausible citation requests (suggesting papers that authors could add to their reference lists) and convincing rejection recommendations (made when reviewers think a journal should reject a submitted paper). The latter capability raises the risk of journals rejecting good papers, says Zhu. “An editor cannot be an expert in everything. If they receive a very persuasive AI-written negative review, it could easily influence their decision.”
The study also found that most of the AI-generated reports fooled the detection tools: ZeroGPT erroneously classified 60% of them as written by a human, and GPTZero did so for more than 80%.
Differing opinions
A growing challenge for journals is the fact that LLMs could be used in many ways to produce a referee report. What is deemed an ‘acceptable’ use of AI also differs depending on whom you ask. In a survey of some 5,000 researchers conducted by Nature earlier this year, 66% of respondents said it wasn’t appropriate to use generative AI to create reviewer reports from scratch. But 57% said it was acceptable to use it to help with peer review by getting it to answer questions about papers.
And although AI-detection tools are improving, they struggle to determine how much of a document has been generated using AI. An analysis published last year of referee reports that were submitted to four computer-science conferences estimated that 17% had been substantially modified by chatbots2. It’s not clear, however, whether the referees used AI to improve the reports or to write them entirely.
Jeroen Verharen, a neuroscientist at the firm iota Biosciences in Alameda, California, says he is surprised that the AI detectors used by Zhu and his team weren’t better at spotting the AI-written referee reports.
But he adds that AI-written reports and associated materials are unlikely to become a widespread problem. If reviewers don’t want to review, he says, “they would just say no”.
Conversely, Mikołaj Piniewski, a hydrologist at the Warsaw University of Life Sciences, argues that it is a growing issue. He says he has already received referee reports that he suspects were written by AI.
“LLMs are increasingly being used by peer reviewers, although this is rarely disclosed,” he says. “When I spoke to my colleagues in the field of hydrology, it became clear that each of us had encountered at least one such case as an author in the past two years. At least one of the review reports we received looked very suspicious, and the AI-detection tools we used flagged it as potentially generated by LLMs.”
Piniewski adds that he is sure some journal editors are accepting AI-generated referee reports, unwittingly or otherwise. He suggests that a global shortage of peer reviewers could be causing some editors to be more lenient than they should be. “I’m afraid it is largely driven by convenience,” he says.