IMO, worrying about s-risks is a natural ally of worrying about x-risks and about alignment as the route to capturing the immense potential upside of human-aligned AGI. They're different perspectives on the same primary problem: how do we align our first AGIs, and the society around them?
I'd say there's no consensus on s-risks (as with most things in alignment and AGI forecasting; they're young, fast-moving, and probably dramatically understaffed fields).
I think these concerns are important and neglected, though most things in alignment theory are currently neglected, because we don't have enough funding or volunteer experts.
My own opinion is that these risks are somewhat neglected because of a common attitude that they're very unlikely relative to either extinction or successful alignment. Under the older, abstract model of AGI as a utility maximizer, an s-risk outcome seemed to require accidentally "flipping the sign": somehow getting your alignment to work backward, so that your AGI cared a lot about humans but wanted the worst for them instead of the best. If alignment failed in any of the countless other ways, where the AGI cared about something other than humans, you'd just get human extinction as it used our habitat and our atoms for other things.
With updated, more nuanced views of the future, s-risks can happen in more ways, and in more plausible ways. I'm particularly worried about intent-aligned ASI under the control of a sadistic, sociopathic human: there are theories that sociopathy is overrepresented among powerful humans, and I'm worried they're true. Sociopathy doesn't guarantee sadism, but it removes or reduces the empathy that usually counterbalances sadism.
If we fail alignment in other ways, I'm afraid there are broader routes to massive s-risks. Curiosity directed at humans implies equal interest in how humans can experience joy and in how they can experience suffering (this is essentially Musk's alignment suggestion; that guy is a paradox of brilliance and idiocy).
I think there's a common misunderstanding that s-risks dwarf the opportunities for happiness, because suffering is more intense or "goes higher on the dial" than joy. The first part of the reasoning is sound: humans do seem to have a strong negativity bias, experiencing pain more strongly than pleasure, because death or crippling injury is so much worse from evolution's perspective than any pleasure is good. Death is game over; every pleasure is just a step in the right direction.
BUT that's only our starting point from evolution; we're not stuck with it. If we get the glorious transhumanist future, I fully expect us to sooner or later rewire our minds so we can experience far more pleasure. David Pearce has written about this as "gradients of bliss" replacing the negative emotions that currently drive much of our day-to-day cognition and decision-making.
At least that's a possibility, which makes what we stand to win roughly as large as the (approximately infinite) amount we stand to lose.
Anyway, that's my two cents. Reasonable (informed) people could disagree.
Here are some more resources from a quick search of LW:
New book on s-risks (as of 2022) - I don't know how much it covers AGI, since the short summary doesn't mention it
S-Risks: Fates Worse Than Extinction - has more refs
Risks of Astronomical Suffering (S-risks) - the LW wikitag has lots of resources.
Thanks for your detailed and nuanced answer. I really appreciate how you distinguish between the different forms of misalignment and where s-risks fit within that picture. Your comment clarified a lot for me.
If you have time, I’d love to hear you expand a bit more on the likelihood of s-risks relative to other AGI outcomes. You mentioned that s-risks seem much less likely than extinction or successful alignment, but could you give a rough probability estimate (even if it’s just an intuitive order-of-magnitude guess, like “1 in a thousand” or “1 in a million”)?
It w...
I’ve been reading about existential risks from advanced AI systems, including the possibility of “worse-than-death” scenarios, sometimes called suffering risks (“s-risks”). These are outcomes where a misaligned AI could cause immense or astronomical amounts of suffering rather than simply extinguishing humanity.
My question: Do researchers working on AI safety and longtermism have any informed sense of the relative likelihood of s-risks compared to extinction risks from unaligned AI?
I’m aware that any numbers here would be speculative, and I’m not looking for precise forecasts, but for references, models, or qualitative arguments. For example:
Do most experts believe extinction is far more likely than long-lasting suffering scenarios?
Are there published attempts to put rough probabilities on these outcomes?
Are there any major disagreements in the field about this?
I’ve come across Kaj Sotala’s “Suffering risks: An introduction” and the work of the Center on Long-Term Risk, but I’d appreciate more recent or deeper resources.
Thanks in advance for any guidance.