In my mind, interventions against s-risks from AI seem like the impartial top priority of our time, being more tractable, important, and neglected than alignment. Hence I’m surprised that they’re not as central as alignment to discussions of AI safety. This is a quick-and-dirty post to try to understand why so few people in the wider EA and AI safety community prioritize s-risks. (It’s a long-form version of this tweet.)
I’ll post a few answers of my own and, in some cases, add why I don’t think they are true. Please vote on the answers that you think apply or add your own.
I don’t expect to reach many people with this question, so please interpret the question as “Why do so few EAs/LWians care about s-risks from AI?” and not just “Why don’t you care about s-risks from AI?” So as a corollary, please feel free to respond even if you personally do care about s-risks!
(Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).)
Some people have a particular idea for how to solve alignment and so have a strong personal fit for alignment research. Thank you for everything you’re doing! Please continue. This post is not for you.
But many others seem resigned and seem to have given up hope of affecting how it all will play out. I don’t think that’s necessary!
Tractability. With alignment we always try to align an AI with something that at least vaguely or indirectly resembles human values. So we’ll make an enemy of most of the space of possible values. We’re in an adversarial game that we’re almost sure to lose. Our only winning hand is that we’re early compared to the other agents, but just by a decade or two.
Maybe it’s just my agreeableness bias speaking, but I don’t want to be in an adversarial game with most superintelligences. Sounds hopeless.
That’s related to the deployment problem. If existing agents don’t want to be aligned, you have a deployment problem. (And you have to resort to morally ambiguous and highly intractable solutions like pivotal acts and long reflections to solve it.) If you have something to offer that they all want, you’ve solved the deployment problem.
Averting s-risks mostly means preventing zero-sum AI conflict. If we find a way (or many ways) to do that, every somewhat rational AI will voluntarily adopt it, because who wants to lose out on gains from trade? Our current earliness may be enough to seed public training data with any solutions we find and with Schelling points that AIs can use to coordinate.
Another intuition pump is that alignment aims at a tiny patch in value space whereas averting s-risks only aims to avert a bunch of outlier scenarios that shouldn’t be so hard to avert. When you’re at a shooting range, it’s much easier not to kill any of the people next to you than to hit the center of the target.
Importance. If I imagine trading extreme suffering for extreme bliss personally, I end up with ratios of 1 to 300 million – e.g., that I would accept a second of extreme suffering for ten years of extreme bliss. The ratio is highly unstable as I vary the scenarios, but the point is that I disvalue suffering many orders of magnitude more than I value bliss.
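As a quick sanity check on that ratio, here is a minimal sketch of the arithmetic behind "one second for ten years." The function name and the choice of a 365.25-day year are assumptions made for illustration; they are not part of the original argument.

```python
# Assumption: a year is 365.25 days (Julian year). Other conventions
# change the result only slightly.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def bliss_to_suffering_ratio(suffering_seconds: float, bliss_years: float) -> float:
    """Seconds of bliss accepted per second of suffering (hypothetical helper)."""
    return bliss_years * SECONDS_PER_YEAR / suffering_seconds


# One second of extreme suffering traded for ten years of extreme bliss.
ratio = bliss_to_suffering_ratio(suffering_seconds=1, bliss_years=10)
print(f"1 to {ratio:,.0f}")
```

Ten years come out at roughly 316 million seconds, so "1 to 300 million" is a fair rounding of that trade.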
Clearly there are some people who feel differently, but the intuition that suffering weighs more heavily than bliss is widely shared. (And the factor doesn’t need to be as big as mine. Given the high tractability and neglectedness, averting s-risks from AI may even be interesting for somewhat positive-leaning utilitarians.)
Plus, a high-probability non-dystopic not-quite-utopia may be better in expectation than a lot of low-probability utopias with dystopic counterfactuals. But I guess that depends on countless details.
Arguably, extinction is somewhat more likely than dystopic s-risk lock-ins. But my guess is that s-risks are only a bit less likely than multipolar takeoffs, maybe 1–10% as likely, and that multipolar takeoffs are very likely, maybe 90%. (The GPT-3 to -4 “takeoff” has been quite slow. It could stop being slow at any moment, but while it’s still slow, I’ll continue updating towards month- or year-long takeoffs rather than minute-long ones.) As soon as there are multiple AIs, one coordination failure can be enough to start a war. Yes, maybe AIs are generally great at coordinating with each other. But that can be ruined by a single sufficiently powerful one that is not. (And sufficiently powerful can mean just, like, 1% as powerful as the others.) An s-risk probability of anywhere from 0.1% to 10% between now and shortly after we have a superintelligence seems about right to me.
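To make the guesses above easier to compare, here is a rough sketch that multiplies them out. It assumes that "1–10% as likely" can be read as a multiplier on the 90% multipolar-takeoff probability; that reading, and all variable names, are assumptions for illustration rather than a model the post commits to.

```python
# Guesses taken from the paragraph above.
p_multipolar = 0.90                        # probability of a multipolar takeoff
relative_low, relative_high = 0.01, 0.10   # s-risk "1-10% as likely"

# Assumption: treat the relative likelihood as a simple multiplier.
implied_low = p_multipolar * relative_low
implied_high = p_multipolar * relative_high

# Overall range stated at the end of the paragraph.
stated_low, stated_high = 0.001, 0.10

print(f"implied s-risk range: {implied_low:.1%} to {implied_high:.1%}")
print(f"stated s-risk range:  {stated_low:.1%} to {stated_high:.1%}")
```

Under this reading the implied range is about 0.9% to 9%, which sits inside the broader 0.1% to 10% range; the wider stated range presumably leaves room for uncertainty in the inputs.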
Neglectedness. Alignment is already critically neglected, especially the approaches that Tammy calls “hard alignment.” Paul Christiano estimated some numbers in this excellent Bankless podcast interview. S-risks from AI are only addressed by the Center on Long-Term Risk, to some extent by the Center for Reducing Suffering, and maybe incidentally by a number of other groups. So in total maybe one tenth as many people work on s-risks as on alignment. (But the ideal solution is not for people in alignment to switch to s-risks but for people outside both camps to join s-risk research!)
Interesting. Do I give off that vibe – here or in other writings?