AI conference papers contaminated by AI hallucinations

GPTZero, a detector of AI output, has found yet again that scientists are undermining their credibility by relying on unreliable AI assistance. The New York-based biz has identified more than 100 hallucinations in 51 papers accepted by the Conference on Neural Information Processing Systems (NeurIPS). The finding follows the company's earlier discovery of 50 hallucinated citations in papers under review by the International Conference on Learning Representations (ICLR).

GPTZero's senior machine-learning engineer Nazar Shmatko, head of machine learning Alex Adam, and academic writing editor Paul Esau argue in a blog post that the availability of generative AI tools has fueled "a tsunami of AI slop."

"Between 2020 and 2025, submissions to NeurIPS increased more than 120 percent, from 9,467 to 21,575," they observe. "In response, organizers have had to recruit ever greater numbers of reviewers, resulting in issues of oversight, expertise alignment, negligence, and even fraud."

These hallucinations consist largely of authors and sources invented by generative AI models, along with text that appears to be AI-authored.

The legal community has been grappling with similar issues. More than 800 errant legal citations attributed to AI models have been flagged in court filings, often with consequences for the attorneys, judges, or plaintiffs involved. Academics may not face the same misconduct sanctions as legal professionals, but the careless application of AI can have consequences beyond squandered integrity.

The surge in AI paper submissions has coincided with an increase in substantive errors in academic papers – mistakes like incorrect formulas, miscalculations, and errant figures, as opposed to citations of non-existent source material.

A pre-print paper published in December 2025 by researchers from Together AI, NEC Labs America, Rutgers University, and Stanford University looked specifically at AI papers from three major machine learning venues: ICLR (2018–2025), NeurIPS (2021–2025), and Transactions on Machine Learning Research (TMLR, 2022–2025).

The authors found that "published papers contain a non-negligible number of objective mistakes and that the average number of mistakes per paper has increased over time – from 3.8 in NeurIPS 2021 to 5.9 in NeurIPS 2025 (55.3 percent increase); from 4.1 in ICLR 2018 to 5.2 in ICLR 2025; and from 5.0 in TMLR 2022/23 to 5.5 in TMLR 2025."

Correlation is not causation, but when the error rate in NeurIPS papers has climbed 55.3 percent since the introduction of OpenAI's ChatGPT, the rapid adoption of generative AI tools cannot be ignored. The risk of unchecked AI usage for scientists is not just reputational; it may invalidate their work.

A spokesperson for NeurIPS didn't immediately respond to our request for comment. We'll update this story if we hear back post-publication.

GPTZero contends that its Hallucination Check software should be part of a publisher's arsenal of AI-detection tools. That may help when trying to determine whether a citation refers to actual research, but there are countermeasures that claim to make AI authorship harder to detect. For example, a Claude Code skill called Humanizer says it "removes signs of AI-generated writing from text, making it sound more natural and human." And there are many other anti-forensic options.
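For readers checking the arithmetic, the quoted percentages are consistent with the raw figures cited above; here's a minimal Python sanity check using only the numbers reported in this story:

    # Percentage increase from an old value to a new one
    def pct_increase(old, new):
        return (new - old) / old * 100

    # NeurIPS submissions, 2020 -> 2025 (GPTZero's figures)
    print(round(pct_increase(9_467, 21_575), 1))  # 127.9, i.e. "more than 120 percent"

    # Average mistakes per NeurIPS paper, 2021 -> 2025 (pre-print's figures)
    print(round(pct_increase(3.8, 5.9), 1))       # 55.3, matching the quoted increase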
A recent report from the International Association of Scientific, Technical & Medical Publishers (STM) attempts to address the integrity challenges the scholarly community faces. The report says scholarly output reached 5.7 million articles in 2024, up from 3.9 million five years earlier, and it argues that publishing practices and policies need to adapt to the reality of AI-assisted and AI-fabricated research.

"Academic publishers are definitely aware of the problem and are taking steps to protect themselves," Adam Marcus, co-founder of Retraction Watch, which has documented many AI-related retractions, and managing editor of Gastroenterology & Endoscopy News, told The Register in an email. "Whether those will succeed remains to be seen.

"We're in an AI arms race and it's not clear the defenders can withstand the siege. However, it's also important to recognize that publishers have made themselves vulnerable to these assaults by adopting a business model that has prioritized volume over quality. They are far from innocent victims." ®