Monday, April 14, 2025

Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark


Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark — besides being misleading — makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”
