LESSWRONG

arterst

Is Gemini now better than Claude at Pokémon?
arterst · 5mo

Now, Google claims Gemini 2.5 Pro has substantially surpassed Claude's progress on that benchmark.


The premise of this post appears to be a straw man: that is not what either of the tweets at the top of the post claims. Similarly, I have seen precisely no one claim this is a rigorous test; it's obviously for fun. Why do you think it isn't a win for a model to get this far through the game with fairly lightweight scaffolding? It seems rather gatekeep-y to require people to use something similar to ClaudePlaysPokemon, which 1) isn't open-source and 2) is so clearly dead stuck in Mt. Moon.
