arterst — LessWrong

arterst

1y

Is Gemini now better than Claude at Pokémon?

"Now, Google claims Gemini 2.5 Pro has substantially surpassed Claude's progress on that benchmark."

The premise of this post appears to be a straw man: neither of the tweets at the top of the post makes that claim. Similarly, I have seen precisely no one claim this is a rigorous test; it's obviously for fun. Why do you think it's not a win for a model to get this far through the game with reasonably lightweight scaffolding? It seems rather gatekeep-y to require people to use something similar to ClaudePlaysPokemon, which 1) isn't open-source and 2) is so clearly hopelessly stuck in Mt. Moon.