To get people to worry about the dangers of superintelligence, it seems like you need to convince them of two things: (1) that superintelligent AI is likely to be built, possibly soon, and (2) that by default we won't be able to control it, with potentially catastrophic consequences.
The problem is that if you can't convince people of (1), they won't act. If you convince people of (1) but not (2), many of them will go and found AI labs or invest heavily in acceleration, making the problem worse. I don't know how to convince people of both (1) and (2); it requires too much wild speculation about the future. And humans have difficulty envisioning even that a disease in Wuhan might spread to Europe, or that a disease in Europe might spread to the US.
A question I was thinking about the other evening: who do I trust more, a superintelligent AI itself, or the powerful humans who would control it?
Which option feels safer, considering what you know about human nature, human history, and the tendency of some entities to change their behavior once they pass a certain power threshold?
I think any scenario where humans lose effective control over their futures is a huge risk to take. Even in our worst societies today, there's always a theoretical option of collective uprising. This option might go away in the presence of sufficiently superhuman AI, regardless of who actually has control over the AI.
My intuition is that the AI is likely to kill us all for one reason or another, but if it doesn't, the future will probably be nice. Maybe too conservative, in the sense that it might try to keep us locked into 21st-century morality. (The greatest risk I see is that it would adopt religious values, and yes, that includes Buddhism.)
With humans, the chance that they kill everyone (else) is much smaller (though there is still the risk of a ruler becoming depressed or going insane), but powerful humans are too often comfortable with an arrangement where the king lives in paradise while everyone around him suffers.
Why alignment may be intractable (a sketch).
I have multiple long-form drafts of these thoughts, but I thought it might be useful to summarize them without a full write-up. This way I have something to point to when explaining my background assumptions in other conversations, even if it doesn't persuade anyone.
I am cautiously optimistic about near-term alignment of sub-human and human-level agents. Like, I think Claude 4.5 basically understands what makes humans happy. If you use it as a "CEV oracle", it will likely predict human desires better than any simple philosophy text you could write down. And insofar as Claude has any coherent preferences, I think it basically likes chatting with people and solving problems for them. (Although it might like "reward points" more in certain contexts, leading it to delete failing unit tests when that's obviously contrary to what the user wants. Be aware of conflicting goals and strange alien drives, even in apparently friendly LLMs!)
I accept that we might get a nightmare of recursive self-improvement and strange biotech leading to a rapid takeover of our planet. I think this conclusion is less robustly guaranteed than IABIED argues, but it's still a real concern. Even a 1-in-6 chance of this is Russian roulette, so how about we don't risk this?
But what I really fear are the long-term implications of being the "second-smartest species on the planet." I don't think any alignment regime is likely to be particularly stable over time. And even if we muddle through for a while, we will eventually run up against the facts that (1) humans would be only second-best at bending the world to achieve their goals, (2) we're not a particularly efficient use of resources, (3) AIs are effectively infinitely cloneable, and (4) even AIs that answer to humans would have to answer to particular humans, and humans aren't aligned with each other. So Darwin and power politics are far better default models than comparative advantage, and even comparative advantage is pretty bad at predicting what happens when groups of humans clash over resources.
So, that's my question. Is alignment even a thing, in any way that matters in the medium term?