Interesting point about personality improvements being a "one-off unhobbling" with diminishing returns. But I wonder if this reflects a measurement bias rather than an actual capability ceiling: we have clear benchmarks for evaluating math skills, so it's easy to measure a 100x improvement when a model goes from solving basic algebra to proving novel theorems. But how do we quantify personality improvements? There's a vast gap between "helpful but generic" and "perfectly attuned to individual users' needs, communication styles, and thinking patterns."
I can ima...
I enjoyed the post. The framework challenged some of my core assumptions about AI progress, particularly given the rapid acceleration we’ve seen in the past few months with OpenAI’s o3 and Deep Research tool, and Anthropic’s Claude Code. My mental model has been that rapid progress would continue, shortening AGI timelines, but your post makes me reconsider how much of that is genuine frontier expansion versus polish and UX improvements.
A few points where your arguments challenge my mental model and warrant further discussion:
One move is to notice when a question structurally demands an EFA and redirect. "What's your p(doom)?" and "What are your timelines?" are illustrative examples.
P(doom) conflates misalignment, misuse, structural risks, and accidents, each with different threat models, different interventions, and different probability estimates. You're not searching one space incompletely; you're searching multiple ill-defined spaces simultaneously. Better questions: "Doom meaning extinction, or including permanent dystopia/lock-in?", "This century, or ever?", or "From misalig...