Thursday, June 5, 2025

OpenAI launches program to design new domain-specific AI benchmarks

OpenAI, like many AI labs, thinks AI benchmarks are broken. It says it wants to fix them through a new program.

Called the OpenAI Pioneers Program, the initiative will focus on creating evaluations for AI models that “set the bar for what good looks like,” as OpenAI phrased it in a blog post.

“As the pace of AI adoption accelerates across industries, there is a need to understand and improve its impact in the world,” the company continued in its post. “Creating domain-specific evals are one way to better reflect real-world use cases, helping teams assess model performance in practical, high-stakes environments.”

As the recent controversy involving the crowdsourced benchmark LM Arena and Meta’s Maverick model illustrates, it’s tough to know, these days, precisely what differentiates one model from another. Many widely used AI benchmarks measure performance on esoteric tasks, like solving doctorate-level math problems. Others can be gamed, or don’t align well with most people’s preferences.

Through the Pioneers Program, OpenAI hopes to create benchmarks for specific domains like legal, finance, insurance, healthcare, and accounting. The lab says that, in the coming months, it’ll work with “multiple companies” to design tailored benchmarks and eventually share those benchmarks publicly, along with “industry-specific” evaluations.

“The first cohort will focus on startups who will help lay the foundations of the OpenAI Pioneers Program,” OpenAI wrote in the blog post. “We’re selecting a handful of startups for this initial cohort, each working on high-value, applied use cases where AI can drive real-world impact.”

Companies in the program will also have the opportunity to work with OpenAI’s team to create model improvements via reinforcement fine-tuning, a technique that optimizes models for a narrow set of tasks, OpenAI says.

The big question is whether the AI community will embrace benchmarks whose creation was funded by OpenAI. OpenAI has supported benchmarking efforts financially before, and designed its own evaluations. But partnering with customers to release AI tests may be seen as an ethical bridge too far.
