Friday, March 14, 2025

AI search engines give incorrect answers at an alarming 60% rate, study says

A new study from Columbia Journalism Review’s Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The research tested eight AI-driven search tools equipped with live search functionality and discovered that the AI models incorrectly answered more than 60 percent of queries about news content.

Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now uses AI models as alternatives to traditional search engines. This raises serious concerns about reliability, given the substantial error rate uncovered in the study.

Error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent.

A graph from CJR shows “confidently wrong” search results. Credit: CJR

For the tests, researchers fed direct excerpts from actual news articles to the AI models, then asked each model to identify the article’s headline, original publisher, publication date, and URL. They ran 1,600 queries across the eight different generative search tools.
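The percentages reported for each tool follow from simple arithmetic on these counts. The Python sketch below is illustrative only: it assumes the 1,600 queries were split evenly across the eight tools (consistent with the 200 articles cited for ChatGPT Search), and the helper name error_rate is hypothetical rather than anything taken from the study.

```python
# Figures taken from the article; the even split across tools is an assumption.
TOTAL_QUERIES = 1600
TOOLS_TESTED = 8

queries_per_tool = TOTAL_QUERIES // TOOLS_TESTED


def error_rate(incorrect: int, total: int) -> float:
    """Return the share of incorrect answers as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * incorrect / total


# ChatGPT Search: 134 incorrect identifications out of 200 articles queried.
chatgpt_search_rate = error_rate(134, queries_per_tool)

print(f"Queries per tool: {queries_per_tool}")
print(f"ChatGPT Search error rate: {chatgpt_search_rate:.0f} percent")
```

Under that assumption, each tool received 200 queries, which matches the 67 percent figure reported for ChatGPT Search.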

The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided confabulations—plausible-sounding incorrect or speculative answers. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.

Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, they were also less willing to decline a question when uncertain, which drove their overall error rates higher.

Issues with citations and publisher control

The CJR researchers also uncovered evidence suggesting some AI tools ignored Robots Exclusion Protocol settings, which publishers use to tell web crawlers which content they should not access. For example, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly disallowing Perplexity’s web crawlers.
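For readers unfamiliar with how these settings work, the sketch below shows how a well-behaved crawler might check a robots.txt file before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the crawler name ExampleAIBot, and the example.com URL are all hypothetical; they are not taken from National Geographic or any other publisher. Note that the protocol is advisory: nothing in it technically prevents a crawler from ignoring the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for illustration only.
# It blocks one named crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


article_url = "https://example.com/story"  # placeholder URL

blocked_bot_allowed = is_allowed(ROBOTS_TXT, "ExampleAIBot", article_url)
other_bot_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", article_url)

print(f"ExampleAIBot allowed: {blocked_bot_allowed}")
print(f"SomeOtherBot allowed: {other_bot_allowed}")
```

A crawler that honors the protocol would skip the page whenever this check returns False. The study's findings suggest some tools surfaced content from pages where such a check should have failed.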

Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies.

URL fabrication emerged as another significant problem. More than half of citations from Google’s Gemini and Grok 3 led users to fabricated or broken URLs resulting in error pages. Of 200 citations tested from Grok 3, 154 resulted in broken links.

These issues create significant tension for publishers, which face difficult choices. Blocking AI crawlers might lead to loss of attribution entirely, while permitting them allows widespread reuse without driving traffic back to publishers’ own websites.

A graph from CJR showing that blocking crawlers doesn’t mean that AI search providers honor the request. Credit: CJR

Mark Howard, chief operating officer at Time magazine, expressed concern to CJR about ensuring transparency and control over how Time’s content appears via AI-generated searches. Despite these issues, Howard sees room for improvement in future iterations, stating, “Today is the worst that the product will ever be,” citing substantial investments and engineering efforts aimed at improving these tools.

However, Howard also did some user shaming, suggesting it’s the user’s fault if they aren’t skeptical of free AI tools’ accuracy: “If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them.”

OpenAI and Microsoft provided statements to CJR acknowledging receipt of the findings but did not directly address the specific issues. OpenAI noted its promise to support publishers by driving traffic through summaries, quotes, clear links, and attribution. Microsoft stated it adheres to the Robots Exclusion Protocol and publisher directives.

The latest report builds on previous findings published by the Tow Center in November 2024, which identified similar accuracy problems in how ChatGPT handled news-related content. For more detail on the fairly exhaustive report, check out Columbia Journalism Review’s website.
