Overall I'm really happy with this post.
It crystallized a bunch of thoughts I'd had for a while, and it's been useful as a conceptual building block feeding into my general thinking about the situation with AI and about the value of accelerating tools to improve epistemics and coordination. I often find myself wanting to link people to it.
Possible weaknesses:
This was written a few months after Situational Awareness. I felt like there was kind of a missing mood in x-risk discourse around that piece, and this was an attempt to convey both the mood and something of the generators of the mood.
Since then, the mood has shifted to something that feels healthier to me. 80,000 Hours has a problem profile on extreme power concentration. At this point I mostly wouldn't link back to this post (preferring to link to, e.g., more substantive research), although I might if I just really wanted to convey the mood to someone. I'm not really sure whether my article had any counterfactual responsibility for the research people have done in the interim.
I'm happy with this post. I think it captures something meta-level that's important for orienting towards doing a good job of all sorts of work, and I occasionally want to point people to it.
Most of the thoughts probably aren't super original, but for something this important I'm surprised there isn't much more explicit discussion -- it often seems to get talked about at the level of a few sentences, and regarded as a matter of taste, or something. For people who aspire to do valuable work, I guess it's generally worth spending a few hours a year explicitly thinking about the tradeoffs here and how to navigate them in particular situations -- and then it's probably worth having at least a bit of scaffolding, or general thinking about the topic, to support that.
I like this post and am glad that we wrote it.
Despite that, I'm keenly aware that it asks a lot more questions than it answers. I don't think I've got massively further in the intervening year towards having good answers to those questions. The way this thinking seems most helpful to me is as a background model that helps avoid confused assumptions when thinking about the future of AI. I do think it has impacted the way I think about AI risk, but I haven't managed to articulate that well yet (maybe in 2026 ...).
Looking back, I have mixed feelings about this post (and series).
On the one hand, I think they're getting at something really important. Rereading them, I feel like they're pointing at a stance I aspire to inhabit, and there's some value in the pointers they give. I'm not sure I know of better content on quite this topic.
On the other hand, they feel ... kind of half-baked, or like they're naming something-in-the-vicinity of what matters rather than giving the true name of the thing. I don't find myself naturally drawn to linking people to this, because I'm dissatisfied that it's something-like-rambling rather than the best version of itself.
I do still hope that someone else thinks about this material, reinvents a better version for themself, and writes about that.
No competition open right now, but I think there's an audience (myself included) for more good thoughts on this topic, if you have something that feels like it might be worth sharing.
Of course I'm into trying to understand things better (and that's a good slice of what I recommend!), but:
It seems fine to me to have the goalposts moving, but then I think it's important to trace through the implications of that.
Like, if the goalposts can move, then this seems like perhaps the most obvious way out of the predicament: to keep the goalposts ever ahead of AI capabilities. But when I read your post, I get the vibe that you're not imagining this as a possibility?
If we are going to build these agents without "losing the game", either (a) they must have goals that are compatible with human interests, or (b) we must (increasingly accurately) model and enforce limitations on their capabilities. If there's a day when an AI agent is created without either of these conditions, that's the day I'd consider humanity to have lost.
Something seems funny to me here.
It might be to do with the boundaries of your definition. If human agents are getting empowered by strategically superhuman (in an everyday sense) AI systems (agentic or otherwise), perhaps that raises the bar for what counts as superhuman for the purposes of this post? If so, I think the argument would make sense to me, but it feels a bit funny to have a definition which is such a moving goalpost, and which might never get crossed even as AI gets arbitrarily powerful.
Alternatively, it might be that your definition is kind of an everyday one, but in that case your conclusion seems pretty surprising. It seems easy to me to imagine worlds where there are some agents meeting neither of those conditions, but they're still not better than the empowered humans.
Or perhaps something else is going on. Just trying to voice my confusions.
I do appreciate the attempt to analyse which kinds of capabilities are actually crucial.
Oh nice, I kind of vibe with "meditation on a theme" as a description of what this post is doing and failing to do.