After 24-hour hackathon, Hugging Face’s AI research agent nearly matches OpenAI’s solution

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December, before OpenAI’s), Hugging Face’s solution adds an “agent” framework to an existing AI model. The framework allows the model to perform multi-step tasks, such as collecting information and building up a report as it goes, which it then presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

Choosing the right core AI model

An AI agent is nothing without some kind of existing AI model at its core. For now, Open Deep Research builds on OpenAI’s large language models (such as GPT-4o) or simulated reasoning models (such as o1 and o3-mini) through an API. But it can also be adapted to open-weights AI models. The novel part here is the agentic structure that holds it all together and allows an AI language model to autonomously complete a research task.

We spoke to Hugging Face’s Aymeric Roucher, who leads the Open Deep Research project, about the team’s choice of AI model. “It’s not ‘open weights’ since we used a closed weights model just because it worked well, but we explain all the development process and show the code,” he told Ars Technica. “It can be switched to any other model, so [it] supports a fully open pipeline.”

“I tried a bunch of LLMs including [Deepseek] R1 and o3-mini,” Roucher adds. “And for this use case o1 worked best. But with the open-R1 initiative that we’ve launched, we might supplant o1 with a better open model.”

While the core LLM or simulated reasoning (SR) model at the heart of the research agent is important, Open Deep Research shows that building the right agentic layer is key. Benchmarks show that a multi-step agentic approach greatly improves large language model capability: OpenAI’s GPT-4o alone (without an agentic framework) scores 29 percent on average on the GAIA benchmark, versus 67 percent for OpenAI’s Deep Research.
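To put the GAIA figures quoted in this article side by side, here is a small arithmetic sketch. The dictionary keys and helper names are made up for illustration, and the comparison assumes the three scores were measured under comparable conditions, which the article does not confirm in detail.

```python
# GAIA accuracy figures as quoted in the article, in percent.
# Keys are illustrative labels, not official model identifiers.
SCORES = {
    "gpt4o_alone": 29.0,
    "open_deep_research": 55.15,
    "openai_deep_research": 67.36,
}


def gap_points(higher, lower):
    """Difference between two systems, in percentage points."""
    return round(SCORES[higher] - SCORES[lower], 2)


def relative_share(system, reference):
    """One system's score expressed as a percentage of another's."""
    return round(100 * SCORES[system] / SCORES[reference], 1)


if __name__ == "__main__":
    # Gap between OpenAI's agent and the open source reproduction.
    print(gap_points("openai_deep_research", "open_deep_research"))
    # How much of OpenAI's score the reproduction reaches.
    print(relative_share("open_deep_research", "openai_deep_research"))
    # Gain of the open source agent over GPT-4o without an agent layer.
    print(gap_points("open_deep_research", "gpt4o_alone"))
```

By these numbers, the open source agent trails OpenAI’s by 12.21 percentage points, reaching about 81.9 percent of its score, while sitting 26.15 points above GPT-4o used without an agentic framework.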

According to Roucher, one core component of Hugging Face’s reproduction accounts for much of how well the project works. The team used Hugging Face’s open source “smolagents” library to get a head start. The library uses what its developers call “code agents” rather than JSON-based agents. These code agents write their actions in programming code, which reportedly makes them 30 percent more efficient at completing tasks. The approach allows the system to handle complex sequences of actions more concisely.
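The difference between the two agent styles can be illustrated with a toy sketch. This is not the smolagents API: the `search` and `count` tools, the canned data, and both runner functions are hypothetical, invented only to show why one code snippet can stand in for several JSON tool calls.

```python
import json


# Hypothetical tools with canned data, invented for this illustration.
def search(query):
    """Pretend web search that returns fixed page titles."""
    corpus = {
        "topic one": ["page A", "page B"],
        "topic two": ["page C"],
    }
    return corpus.get(query, [])


def count(items):
    """Return how many items were found."""
    return len(items)


TOOLS = {"search": search, "count": count}

# JSON style: the model emits one tool call per turn and must copy
# earlier results into later calls by hand.
JSON_ACTIONS = [
    '{"tool": "search", "args": {"query": "topic one"}}',
    '{"tool": "search", "args": {"query": "topic two"}}',
    '{"tool": "count", "args": {"items": ["page A", "page B", "page C"]}}',
]

# Code style: the model emits one snippet that chains the same calls.
CODE_ACTION = """
pages = search("topic one") + search("topic two")
answer = count(pages)
"""


def run_json_actions(actions):
    """Execute JSON tool calls one by one; returns (last result, turns used)."""
    result = None
    for raw in actions:
        action = json.loads(raw)
        result = TOOLS[action["tool"]](**action["args"])
    return result, len(actions)


def run_code_action(code):
    """Execute a single code action; returns (answer, turns used).

    Emptying __builtins__ is NOT a real sandbox. It only keeps this toy
    example from reaching beyond the provided tools by accident.
    """
    namespace = dict(TOOLS)
    exec(code, {"__builtins__": {}}, namespace)
    return namespace["answer"], 1


if __name__ == "__main__":
    print(run_json_actions(JSON_ACTIONS))
    print(run_code_action(CODE_ACTION))
```

Both runners arrive at the same answer, but the JSON version needs three model turns while the code version needs one. That is the kind of saving the “more concisely” claim points to, though the 30 percent figure itself comes from Hugging Face, not from this sketch.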

The speed of open source AI

Like other open source AI applications, the developers behind Open Deep Research have wasted no time iterating on the design, thanks in part to outside contributors. And like other open source projects, the team built on the work of others, which shortens development times. For example, Hugging Face used web browsing and text inspection tools borrowed from Microsoft Research’s Magentic-One agent project from late 2024.

While the open source research agent does not yet match OpenAI’s performance, its release gives developers free access to study and modify the technology. The project demonstrates the research community’s ability to rapidly reproduce and openly share AI capabilities that were previously available only through commercial providers.

“I think [the benchmarks are] quite indicative for difficult questions,” said Roucher. “But in terms of speed and UX, our solution is far from being as optimized as theirs.”

Roucher says future improvements to its research agent may include support for more file formats and vision-based web browsing capabilities. And Hugging Face is already working on cloning OpenAI’s Operator, which can perform other types of tasks (such as viewing computer screens and controlling mouse and keyboard inputs) within a web browser environment.

Hugging Face has posted its code publicly on GitHub and opened positions for engineers to help expand the project’s capabilities.

“The response has been great,” Roucher told Ars. “We’ve got lots of new contributors chiming in and proposing additions. Kind of feels like catching the wave while surfing, the community really provides a strong force!”
