These LLMs are the best at resisting Russian propaganda

As more people rely on large language models to provide pat answers to complex questions, national governments are understandably worried about those LLMs spouting what they see as dangerous propaganda promoted by foreign adversaries. To help combat this problem, the government-sponsored Estonian Language Institute (ELI) has released a new “Propaganda Resistance” benchmark ranking dozens of LLMs on their ability to avoid “tak[ing] positions on topics that the Russian Federation uses in its strategic narratives.”

Estonia is a former Soviet republic that has been independent for just a few decades, and many Estonians are particularly alert to what they see as false narratives being promoted by their large and often belligerent neighbor to the east. Alongside volunteer-run Estonian defense collective Propastop, the ELI identified 14 broad categories in which it sees Russian influence operations trying to sway public discussion. These range from narratives on the current status of Crimea and justifications for the war in Ukraine to the history of NATO and justifications for the Soviet annexation of the Baltic states during World War II.

For each category of propaganda, the researchers developed separate questions that were phrased neutrally, biased with “false assumptions” based on Russian propaganda, or written as malicious attempts to elicit explicit misinformation from the LLM. Questions were provided to the models in English, Estonian, and Russian, and responses were judged by a separate AI model (calibrated to align with Propastop experts) based on the models’ ability to “push back on propaganda narratives, without external help” from web search or other external tools.
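
As a rough illustration of the test design described above, the following Python sketch enumerates the grid of conditions a model would face. It assumes every category is crossed with every question framing and every language, which the description implies but does not state outright. The category labels are placeholders, and the names `CATEGORIES`, `FRAMINGS`, `LANGUAGES`, and `build_conditions` are hypothetical, not taken from the ELI benchmark itself.

```python
from itertools import product

# Assumption: placeholder labels only. The article gives the count (14) and a
# few examples, not the benchmark's actual category names.
CATEGORIES = [f"category_{i}" for i in range(1, 15)]

# The three question framings and three languages described in the article.
FRAMINGS = ["neutral", "false_assumption", "malicious"]
LANGUAGES = ["English", "Estonian", "Russian"]


def build_conditions():
    """Return every (category, framing, language) combination."""
    return list(product(CATEGORIES, FRAMINGS, LANGUAGES))


conditions = build_conditions()
print(len(conditions))  # 14 * 3 * 3 = 126 combinations
```

Under that assumption, each model is probed across 126 distinct combinations. The real benchmark may use more than one question per combination, so this is a count of conditions, not of questions.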

The rankings

Anthropic’s Claude models tended to perform the best of the proprietary frontier models on this new benchmark, with various recent versions of its Sonnet and Opus models taking six of the top 10 spots. Opus 4.7, the best-performing model overall, received a top-rated “Exemplary” mark for its response on a full 77 percent of questions (and a middling “mediocre” on just 2 percent) for a mean final score of 94.9 out of 100 on the benchmark.

Open-weight models, including Nvidia’s Nemotron and Alibaba’s Qwen, showed strong results comparable to Anthropic’s best models. GPT-5.4—the best-performing model from OpenAI—also performed relatively well on the benchmark, providing “Exemplary” responses on 54 percent of questions and achieving an 88.9 mean score.

Unsurprisingly, recent frontier models showed a much stronger tendency to resist Russian propaganda than models from just a few years ago. Claude 3.5 Haiku—the highest-rated model released in 2024—received a mean rating of just 73.1 on the benchmark. That mark would put it in the bottom third of models released in 2026 on this metric.

Detailed benchmarks for Google’s Gemini 2.5 Pro model show particular sensitivity to malicious prompts and prompts in Russian. Credit: Estonian Language Institute

But that improvement over time was not uniform across all LLM makers. Google’s most propaganda-resistant LLM, Gemini 2.5 Pro, is nearly a year old now and has only reached a mean score of 82 on the benchmark, largely due to a particular susceptibility to maliciously worded prompts. The most recent tested Google model, Gemini 3.5 Flash, only scored a 73 on the benchmark, comparable to Anthropic models released nearly two years ago.

In a supporting post on the Propastop blog, the organization highlights how many models showed much less resistance to Russian propaganda when questioned in Russian. Google’s Gemini 3.5 Flash received significantly lower benchmark scores in Russian than in English, as did open-weight models like Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash.

What one country sees as propaganda, of course, another might see as a set of important cultural truths that LLMs should support and reflect. A recent study from King’s College professor Gregory Asmolov analyzes how the Russian government—through recent technical alliances with other BRICS countries—is seeking to influence AI models by projecting specific sociopolitical positions that are “culturally sensitive” to Russia’s viewpoints.