Influential study touting ChatGPT in education retracted over red flags

A study that claimed OpenAI’s ChatGPT can positively impact student learning has been retracted nearly one year after publication. The journal publisher, Springer Nature, cited “discrepancies” in the analysis and a lack of confidence in the conclusions—but not before the paper racked up hundreds of citations and made the rounds on social media.

“The paper’s authors made some very attention-grabbing claims about the benefits of ChatGPT on learning outcomes,” said Ben Williamson, a senior lecturer at the Centre for Research in Digital Education and the Edinburgh Futures Institute at the University of Edinburgh in Scotland, in an email to Ars. “It was treated by many on social media as one of the first pieces of hard, gold standard evidence that ChatGPT, and generative AI more broadly, benefits learners.”

The retracted paper attempted to quantify “the effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking” by analyzing results from 51 previous research studies. Its meta-analysis calculated effect sizes by comparing the studies’ experimental groups, which used ChatGPT in education, with control groups that did not use the AI chatbot.
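For readers unfamiliar with the technique, here is a minimal sketch of how a meta-analysis of this general kind can turn group results into a pooled effect size. It is not the retracted paper's method, which this article does not detail. It assumes one common approach: a standardized mean difference (Hedges' g) for each study, combined with fixed-effect inverse-variance weights. The function names and all numbers are hypothetical illustrations, not data from any study.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance) for one study: treatment group vs. control group.

    Hypothetical helper for illustration; inputs are group means,
    standard deviations, and sample sizes.
    """
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd  # Cohen's d
    # Small-sample correction factor that turns d into Hedges' g
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    g = d * correction
    # Approximate sampling variance of d, then of g
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return g, correction**2 * var_d


def pooled_effect(effects):
    """Fixed-effect pooled estimate from a list of (g, variance) pairs."""
    weights = [1 / var for _, var in effects]
    weighted_sum = sum(w * g for w, (g, _) in zip(weights, effects))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Made-up scores for two imaginary studies (not real data)
    studies = [
        hedges_g(80, 10, 30, 75, 10, 30),
        hedges_g(70, 12, 50, 69, 12, 50),
    ]
    print(pooled_effect(studies))
```

Nothing in this arithmetic checks whether the studies measured comparable outcomes in comparable populations. The formula will produce a pooled number from any set of inputs, which is the weakness critics of the paper pointed to.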

That analysis supposedly showed that “ChatGPT has a large positive impact on improving learning performance” along with a “moderately positive impact on enhancing learning perception” and “fostering higher-order thinking,” according to the researchers who authored the paper. The now-retracted results first appeared on May 6, 2025, in Humanities & Social Sciences Communications, a journal published by Springer Nature.

“In some cases it appears it was synthesizing very poor quality studies, or mixing together findings from studies that simply cannot be accurately compared due to very different methods, populations and samples,” Williamson told Ars. “It really seemed like a paper that should not have been published in the first place.”

Williamson also questioned the timing of the paper’s publication just two and a half years after OpenAI released ChatGPT in November 2022. “It is not feasible that dozens of high-quality studies about ChatGPT and learning performance could have been conducted, reviewed, and published in that time,” Williamson said.

A legacy that may outlive retraction

Since its publication, the study has been cited 262 times in other papers published in Springer Nature’s peer-reviewed journals and received a total of 504 citations from both peer-reviewed and non-peer-reviewed sources. It also attracted nearly half a million readers and received enough online attention to rank in the 99th percentile for journal articles in terms of attention score.

“Of course, the problem with this form of social media circulation is that all of the details about the study got stripped away,” Williamson said. “All that was left were the major claims, which certain social media users helped boost and propel. All this helped the paper get a huge amount of attention, even though the findings really were not supported by the underlying research at all.”

Williamson has not been alone in such concerns. When the paper was first published, Ilkka Tuomi, chief scientist of the research institute Meaning Processing Ltd., posted on LinkedIn about the pitfalls of such meta-analysis studies attempting to “draw conclusions about incompatible and ill-defined outcomes” from experimental results involving very different populations. “The only reason to do these studies seems to be that statistics and meta-analysis tools can crunch out numbers that look [like] science,” Tuomi wrote.

On April 22, 2026, almost a year after the paper’s initial publication, Springer Nature posted a retraction notice. The publisher also stated that “the authors had not responded to correspondence regarding the retraction.”

“The Editor has decided to retract this paper owing to concerns regarding discrepancies in the meta-analysis,” said the Springer Nature retraction note. “These issues ultimately undermine the confidence the Editor can place in the validity of the analysis and resulting conclusions.”

The retraction notice itself received minimal attention until it was shared on Bluesky and LinkedIn by Williamson. He expressed concern that many researchers and others who initially read the paper will not realize it was retracted, meaning that the “headline finding that ChatGPT helps learning performance might persist despite its retraction.”

“All of this is hugely frustrating for those of us trying hard to make sense of what AI means for learning, teaching, and education more generally,” Williamson told Ars. “We have had several years of hype about AI in education, but what we have really needed is high-quality research that can actually show us what kinds of impacts AI is having in classrooms and learning practices.”

Many educators have scrambled to adapt their classes to prevent AI-enabled student cheating and have expressed discouragement at how widely available generative AI tools have shifted many students’ mindsets away from learning and critical thinking. Tech companies continue to promote AI chatbots as learning tools, pitching “study mode” features and the ability to generate SAT practice tests. Meanwhile, at least one country is pivoting away from digital resources by reintroducing physical books and having students return to using pen and paper.