Monday, April 14, 2025

Meta exec denies the company artificially boosted Llama 4's benchmark scores

A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models’ weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it’s been trained. Training on a test set could misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.

Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models’ benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site by a user who claimed to have resigned from Meta in protest over the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta’s decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. 

Al-Dahle acknowledged that some users are seeing “mixed quality” from Maverick and Scout across the different cloud providers hosting the models.

“Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in,” Al-Dahle said. “We’ll keep working through our bug fixes and onboarding partners.”
