A Bear Case: My Predictions Regarding AI Progress
This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading.

I'm not fully committed to this model yet: I'm still on the lookout for more agents and inference-time scaling later this year. But Deep Research, Claude 3.7, Claude Code, Grok 3, and GPT-4.5 have turned out largely in line with these expectations[1], and this is my current baseline prediction.

The Current Paradigm: I'm Tucking In to Sleep

I expect that none of the currently known avenues of capability advancement are sufficient to get us to AGI[2].

* I don't want to say that pretraining will "plateau", as such; I do expect continued progress. But the dimensions along which the progress happens are going to decouple from the intuitive "getting generally smarter" metric, and will face steep diminishing returns.
  * Grok 3 and GPT-4.5 seem to confirm this.
    * Grok 3's main claim to fame was "pretty good: it managed to dethrone Claude Sonnet 3.5.1 for some people!". That was damning with faint praise.
    * GPT-4.5 is subtly better than GPT-4, particularly at writing/EQ. That's likewise a faint-praise damnation: it's not much better. Indeed, it reportedly came out below expectations for OpenAI as well, and they certainly weren't in a rush to release it. (It was intended as a new flashy frontier model, not the delayed, half-embarrassed "here it is I guess, hope you'll find something you like here".)
  * GPT-5 will be even less of an improvement on GPT-4.5 than GPT-4.5 was on GPT-4. The pattern will continue for GPT-5.5 and GPT-6, the ~1000x and 10000x models they may train by 2029 (if they still have the money by then). Subtle quality-of-life improvements and meaningless benchmark jumps, but nothing paradigm-shifting.
  * (Not to be a scaling-law denier. I believe in them, I do! But they measure perplexity, not general intelligence/real-world usefulness, and Goodhart's Law is not on our side here.)
... Do people not do that already? Given that LLMs have superhuman truesight, I always check whether I'm writing from the correct mindset when interacting with them, so as to ensure they're not adjusting their response for things I don't want them to adjust for. E.g., if I want an overview of a contentious issue where I'd prefer one of the sides to be correct, I deliberately aim to "fold away" the parts of my mindset that are polarized, and to write purely from a place of curiosity/truth-seeking instead.
IMO, HPMoR-style Occlumency should be a basic tool in your toolbox for LLM interactions.