Sunday, May 4, 2025

Google’s Gemini has beaten Pokémon Blue (with a little help)

Google’s most expensive AI model seems to have crossed a major milestone: Beating a 29-year-old video game.

Last night, Google CEO Sundar Pichai posted triumphantly on X, “What a finish! Gemini 2.5 Pro just completed Pokémon Blue!”

To be clear, the Gemini Plays Pokemon livestream was created by (in his own words) “a 30 year old software engineer unaffiliated with Google” who goes by Joel Z. But Google executives have been cheering the effort on.

For example, Logan Kilpatrick, the product lead for Google AI Studio, posted last month that Gemini was “making great progress at completing Pokémon” and had “earned its 5th badge (next best model only has 3 so far, though with a different agent harness),” leading Pichai to joke, “We are working on API, Artificial Pokémon Intelligence:)”

Why Pokémon? Back in February, Anthropic highlighted the progress its Claude AI models were making in “Pokémon Red,” writing that Claude’s “extended thinking and agent training” gives it “a major boost” on “more unexpected” tasks, like playing a classic game. (“Pokémon Red” and “Blue” are different versions of a Game Boy title first released in 1996 and tied to the long-running Pokémon franchise.) There’s even a Claude Plays Pokemon Twitch channel that Joel Z cited as an inspiration.

Despite its progress, Claude does not appear to have beaten “Pokémon Red” yet. Does that mean Gemini is objectively better at the game? On his Twitch page, Joel Z urged viewers, “Please don’t consider this a benchmark for how well an LLM can play Pokemon. You can’t really make direct comparisons — Gemini and Claude have different tools and receive different information.”

And both AI models need help to play the game. That’s where the aforementioned agent harnesses come in: they provide the models with game screenshots overlaid with additional information, let the model decide how to respond (which may involve calling specialized agents), and then press the button that corresponds to the AI’s instruction.
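To make that loop concrete, here is a minimal Python sketch of what such a harness might look like. It is not Joel Z’s code or Anthropic’s: the `FakeEmulator` class, the `BUTTON:` reply format, and every function name are hypothetical stand-ins chosen for illustration, and the step where a model calls specialized agents is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Buttons on an original Game Boy.
VALID_BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")


@dataclass
class Observation:
    screenshot: bytes  # raw frame from the emulator (empty in this stub)
    overlay: str       # extra text the harness adds on top of the frame


@dataclass
class FakeEmulator:
    """Hypothetical stand-in for a real emulator; it only records presses."""
    pressed: List[str] = field(default_factory=list)

    def capture(self) -> Observation:
        return Observation(b"", f"presses so far: {len(self.pressed)}")

    def press(self, button: str) -> None:
        self.pressed.append(button)


def parse_button(reply: str) -> Optional[str]:
    """Return the button named on a 'BUTTON: <name>' line, or None.

    The 'BUTTON:' convention is an assumption made for this sketch.
    """
    for line in reply.splitlines():
        if line.strip().upper().startswith("BUTTON:"):
            name = line.split(":", 1)[1].strip().upper()
            return name if name in VALID_BUTTONS else None
    return None


def run_harness(
    emulator: FakeEmulator,
    model: Callable[[Observation], str],
    steps: int,
) -> List[str]:
    """Observe, ask the model, press the chosen button; repeat `steps` times."""
    for _ in range(steps):
        observation = emulator.capture()
        reply = model(observation)
        button = parse_button(reply)
        if button is not None:  # skip the turn if the reply is unusable
            emulator.press(button)
    return emulator.pressed


def scripted_model(observation: Observation) -> str:
    """Stub that always walks up; a real harness would call an LLM here."""
    return "The path north looks clear.\nBUTTON: up"


if __name__ == "__main__":
    print(run_harness(FakeEmulator(), scripted_model, steps=3))
```

In a real setup, `scripted_model` would be replaced by a call to the model, and the overlay would carry whatever extra information the harness author chooses to expose. That choice is one reason the two projects are hard to compare directly.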

Joel Z acknowledged that there were other “dev interventions” to help Gemini complete the game, but insisted that it’s not cheating.

“My interventions improve Gemini’s overall decision-making and reasoning abilities,” he said. “I don’t give specific hints — there are no walkthroughs or direct instructions for particular challenges like Mt. Moon. The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokemon Yellow.”

Plus, he said, “Gemini Plays Pokémon is still actively being developed, and the framework continues to evolve.”
