Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

The release version of Llama 4 has been added to LMArena after it was found out they cheated, but you probably didn’t see it because you have to scroll down to 32nd place which is where is ranks pic.twitter.com/A0Bxkdx4LX

— ρ:ɡeσn (@pigeon__s) April 11, 2025

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark — besides being misleading — makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”
