What about negative effects on the symbiotic microbiome?
What if it's not worth seeking power? What if the world isn't worth taking over? Saints seem to devote their lives to teaching that it isn't worth getting caught up in ambitions and desires to control.
At first I was really surprised by this because it seemed weird, but I find myself wondering if it's actually quite similar to an analogous form of behavior in humans: stereotyping. The model jumps to the most "obvious" looking conclusion based on its associations without necessarily reflecting on what it's doing or why. This makes me wonder if building in such loops with guidance on how to think about its own training could mitigate these effects.
GPT-5.1 beating crystal in 108 hours is very interesting. I wonder why that's the case compared to Gemini 3 Pro, which took ~424.5 hours. Do you have any thoughts?
Yes, they both started with the same harness but there's room for each model to customize its own setup so I'm not sure how much they might have diverged over time. I have 4x speedup as probably an upper bound but I was only counting since the final 2.5 stable release in June, which might be too short. Gemini 2.5 has 6 badges now compared to yesterday, so it's probably too early to assume 4x is certain. But if it was 4x every 8 months then it should be able to match average human playtime by early 2027.
From the Gemini_Plays_Pokemon - Twitch:
"v2 centers on a smaller, flexible toolset (Notepad, Map Markers, code execution, on‑the‑fly custom agents) so Gemini can build exactly what it needs when it needs it."
"The AI has access to a set of built-in tools to interact with the game and its own internal state:
Custom Tools & Agents
The most powerful feature of the system is its ability to self-improve by creating its own tools and specialized agents. You can view the live Notepad and custom tools/agents tracker on GitHub.
Looking at the step count comparisons instead of time is interesting. Claude Opus 4.5 is currently at ~44,500 steps in Silph co., where it has been stuck for several days. So that should now be about 50% higher. The others look roughly right for Opus. It beat Mt. Moon in around 5 hours and was stuck at the Rocket Hideout for days.
I think the Gemini 3 Pro vs 2.5 Pro matchup in Pokemon Crystal was interesting. Gemini 3 cleared the game in ~424.5 hours last night while 2.5 only had 4/16 badges at 435 hours.
This is a really valuable post that clarifies some things I've found hard to articulate to people on each side. I think it's difficult for people to balance when to use each of these epistemic frames without getting too sucked into one. And I imagine most people use these to different degrees at different times even if they may not realize it or one is rarer for them.
Looking forward to what you write next!
Something similar I've been thinking about is putting models in environments with misalignment "temptations" like an easy reward hack and training them to recognize what this type of payoff pattern looks like (e.g. easy win but sacrifice principle) and NOT take it. Recent work shows some promising efforts getting LLMs to explain their reasoning, introspect, and so forth. I think this could be interesting to do some experiments with and am trying to write up my thoughts on why this might be useful and maybe what those could look like.
Gotta account for wordflation since the old days. Might have been 1000 back then
Have you thought about having the AI navigate stories/scenarios/environments in a CYOA fashion? It could involve picking between positive options and eventually opportunities to choose good options even when bad ones are easy or there is even strong pressure to choose them. Perhaps taking some inspiration from the kind of strategy used in Recontextualization Mitigates Specification Gaming https://arxiv.org/abs/2512.19027