Meta exec denies the company artificially boosted Llama 4's benchmark scores

A Meta exec on Monday denied a rumor that the company trained its new AI models to perform well on specific benchmarks while concealing the models’ weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it’s been trained. Training on a test set could misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.
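To see why training on a test set inflates scores, consider a minimal toy sketch. It is an illustration only and has nothing to do with how Meta trains or evaluates Llama 4. The `MemorizingModel` class, the random-label data, and every name below are hypothetical. Because the labels are coin flips, no model can genuinely learn them, so any perfect score can only come from having seen the test examples during training.

```python
import random


def make_examples(n, rng):
    # Each example is a (feature, label) pair. Labels are random coin flips,
    # so there is no real pattern for a model to learn.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]


class MemorizingModel:
    """Hypothetical toy model: memorizes training pairs, guesses 0 otherwise."""

    def __init__(self):
        self.memory = {}

    def fit(self, examples):
        for x, y in examples:
            self.memory[x] = y

    def predict(self, x):
        # Unseen inputs fall back to a fixed guess of 0.
        return self.memory.get(x, 0)


def accuracy(model, examples):
    correct = sum(model.predict(x) == y for x, y in examples)
    return correct / len(examples)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_set = make_examples(1000, rng)
test_set = make_examples(200, rng)

# Clean setup: the model never sees the test set before evaluation.
clean = MemorizingModel()
clean.fit(train_set)
clean_score = accuracy(clean, test_set)

# Contaminated setup: the test set leaks into the training data.
contaminated = MemorizingModel()
contaminated.fit(train_set + test_set)
contaminated_score = accuracy(contaminated, test_set)

print(f"clean: {clean_score:.2f}, contaminated: {contaminated_score:.2f}")
```

The contaminated model scores perfectly on the benchmark without being any more capable than the clean one, which is the kind of misleading result the rumor alleged.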

Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models’ benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site from a user claiming to have resigned from Meta in protest over the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta’s decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing “mixed quality” from Maverick and Scout across the different cloud providers hosting the models.

“Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in,” Al-Dahle said. “We’ll keep working through our bug fixes and onboarding partners.”
