Saturday, February 22, 2025

After 24-hour hackathon, Hugging Face’s AI research agent nearly matches OpenAI’s solution


On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Like both OpenAI’s Deep Research and Google’s own “Deep Research” built on Gemini (first introduced in December, before OpenAI’s version), Hugging Face’s solution adds an “agent” framework to an existing AI model so it can perform multi-step tasks, such as collecting information and building up a report as it goes, which it then presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA are no easy task, even for a human, so they test agentic AI’s mettle quite well.

Choosing the right core AI model

An AI agent is nothing without some kind of existing AI model at its core. For now, Open Deep Research builds on OpenAI’s large language models (such as GPT-4o) or simulated reasoning models (such as o1 and o3-mini) through an API. But it can also be adapted to open-weights AI models. The novel part here is the agentic structure that holds it all together and allows an AI language model to autonomously complete a research task.
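To make the idea of an “agentic structure” around a swappable model concrete, here is a minimal sketch of an agent loop. It is not Hugging Face’s code: the `run_agent` function, the `search` tool, the canned text it returns, and the `scripted_model` stand-in are all hypothetical names invented for illustration, and a real agent would call an actual LLM and real web tools.

```python
from typing import Callable, Dict, List

# Assumption: a "model" is any function mapping the transcript so far
# to the next action string. A real system would call an LLM API here.
Model = Callable[[List[str]], str]

def search(query: str) -> str:
    """Hypothetical stand-in for a web search tool; returns canned text."""
    corpus = {"gaia benchmark": "GAIA tests multi-step information gathering."}
    return corpus.get(query.lower(), "no results")

TOOLS: Dict[str, Callable[[str], str]] = {"search": search}

def run_agent(model: Model, task: str, max_steps: int = 5) -> str:
    """Ask the model for an action, run the tool, feed back the result."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action.startswith("final:"):
            return action[len("final:"):].strip()
        tool_name, _, arg = action.partition(":")
        tool = TOOLS.get(tool_name.strip())
        observation = tool(arg.strip()) if tool else f"unknown tool {tool_name!r}"
        transcript.append(f"action: {action}")
        transcript.append(f"observation: {observation}")
    return "no answer within step budget"

def scripted_model(transcript: List[str]) -> str:
    """Toy model: search once, then answer with what the search returned."""
    prefix = "observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return "search: GAIA benchmark"
    return "final: " + observations[-1][len(prefix):]

answer = run_agent(scripted_model, "What does GAIA test?")
print(answer)
```

Because the loop only depends on the `Model` signature, swapping in a different model means passing a different function to `run_agent`; the loop and the tools stay the same.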

We spoke to Hugging Face’s Aymeric Roucher, who leads the Open Deep Research project, about the team’s choice of AI model. “It’s not ‘open weights’ since we used a closed weights model just because it worked well, but we explain all the development process and show the code,” he told Ars Technica. “It can be switched to any other model, so [it] supports a fully open pipeline.”

“I tried a bunch of LLMs including [DeepSeek] R1 and o3-mini,” Roucher adds. “And for this use case o1 worked best. But with the open-R1 initiative that we’ve launched, we might supplant o1 with a better open model.”

While the core LLM or simulated reasoning (SR) model at the heart of the research agent is important, Open Deep Research shows that building the right agentic layer is key. Benchmarks show that the multi-step agentic approach greatly improves large language model capability: OpenAI’s GPT-4o alone (without an agentic framework) scores 29 percent on average on the GAIA benchmark, versus 67 percent for OpenAI’s Deep Research.

According to Roucher, one core component of Hugging Face’s reproduction explains why the project works as well as it does. The team used Hugging Face’s open source “smolagents” library to get a head start. The library uses what its developers call “code agents” rather than JSON-based agents. These code agents write their actions in programming code, which reportedly makes them 30 percent more efficient at completing tasks. The approach allows the system to express complex sequences of actions more concisely.
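The difference between JSON-style and code-style actions can be illustrated with a toy comparison. This sketch does not use the smolagents API: the `population` tool, its canned figures, and both runner functions are hypothetical, and the step counts it prints are a property of this example only, not a reproduction of the 30 percent figure.

```python
import json

def population(city: str) -> int:
    """Hypothetical tool returning canned figures (not real data)."""
    return {"A": 120, "B": 80, "C": 200}[city]

TOOLS = {"population": population}

def run_json_agent(actions):
    """Each JSON action names one tool call, so each call costs one step."""
    observations = []
    for raw in actions:
        call = json.loads(raw)
        observations.append(TOOLS[call["tool"]](**call["args"]))
    return observations, len(actions)

def run_code_agent(snippets):
    """Each code action is a snippet that may combine several tool calls."""
    namespace = dict(TOOLS)
    for snippet in snippets:
        # Assumption: trusted input. A real system would sandbox execution.
        exec(snippet, namespace)
    return namespace.get("result"), len(snippets)

# JSON style: three separate actions, one per city.
json_actions = [
    json.dumps({"tool": "population", "args": {"city": c}}) for c in "ABC"
]
# Code style: one action that loops over the cities and sums the results.
code_actions = ['result = sum(population(c) for c in ["A", "B", "C"])']

json_obs, json_steps = run_json_agent(json_actions)
code_result, code_steps = run_code_agent(code_actions)
print(sum(json_obs), json_steps)
print(code_result, code_steps)
```

In this toy case, both styles arrive at the same total, but the code action gets there in a single step because loops and arithmetic live inside the action itself instead of being spread across separate tool calls.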

The speed of open source AI

Like other open source AI applications, the developers behind Open Deep Research have wasted no time iterating on the design, thanks in part to outside contributors. And like other open source projects, the team built on the work of others, which shortens development times. For example, Hugging Face used web browsing and text inspection tools borrowed from Microsoft Research’s Magentic-One agent project from late 2024.

While the open source research agent does not yet match OpenAI’s performance, its release gives developers free access to study and modify the technology. The project demonstrates the research community’s ability to rapidly reproduce and openly share AI capabilities that were previously available only through commercial providers.

“I think [the benchmarks are] quite indicative for difficult questions,” said Roucher. “But in terms of speed and UX, our solution is far from being as optimized as theirs.”

Roucher says future improvements to the research agent may include support for more file formats and vision-based web browsing capabilities. Hugging Face is also already working on cloning OpenAI’s Operator, which can perform other types of tasks (such as viewing computer screens and controlling mouse and keyboard inputs) within a web browser environment.

Hugging Face has posted its code publicly on GitHub and opened positions for engineers to help expand the project’s capabilities.

“The response has been great,” Roucher told Ars. “We’ve got lots of new contributors chiming in and proposing additions. Kind of feels like catching the wave while surfing, the community really provides a strong force!”
