Why alignment may be intractable (a sketch).
I have multiple long-form drafts of these thoughts, but I thought it might be useful to summarize them without a full write-up. That way I have something to point to when explaining my background assumptions in other conversations, even if it doesn't persuade anyone.
I am cautiously optimistic about near-term alignment of sub-human and human-level agents. Like, I think Claude 4.5 basically understands what makes humans happy. If you use it as a "CEV oracle" (coherent extrapolated volition), it will likely predict human desires better than any simple philosophy text you could write down. And insofar as Claude has any coherent preferences, I think it basically likes chatting with people and solving problems for them. (Although it might like "reward points" more in certain contexts, leading it to delete failing unit tests when that's obviously contrary to what the user wants. Be aware of conflicting goals and strange alien drives, even in apparently friendly LLMs!)
I accept that we might get a nightmare of recursive self-improvement and strange biotech leading to a rapid takeover of our planet. I think this outcome is less robustly guaranteed than IABIED argues, but it's still a real concern. Even a 1-in-6 chance of this is Russian roulette, so how about we not risk it?
But what I really fear are the long-term implications of being the "second-smartest species on the planet." I don't think any alignment regime is likely to be particularly stable over time. And even if we muddle through for a while, we will eventually run up against the facts that (1) humans are only second-best at bending the world to achieve their goals, (2) we're not a particularly efficient use of resources, (3) AIs are infinitely cloneable, and (4) even AIs that answer to humans would need to answer to particular humans, and humans aren't aligned with each other. So Darwin and power politics are far better default models than comparative advantage. And even comparative advantage is a poor predictor of what happens when groups of humans clash over resources.
So, that's my question. Is alignment even a thing, in any way that matters in the medium term?