Why are we so complacent about AI hell?

by Dawn Drescher
11th May 2023

Tags: Risks of Astronomical Suffering (S-risks), AI Risk, AI
Comments (12 of 101 shown, sorted by top scoring)
Richard_Ngo · 3y

Flagging that Diffractor's work on threat-resistant bargaining feels like the most important s-risk-related work I've ever seen, but I also haven't thoroughly evaluated it so I'd love for someone to do so and write up their thoughts.

Dawn Drescher · 3y
Woah, thanks! I hadn’t seen it!
UHMWPE-UwU · 3y

There's a new forum for this that seeks to increase discussion & coordination: reddit.com/r/sufferingrisk.

Dagon · 3y

Not really core to any of those communities, so I don't have specific answers.  But I note that complacency is the human default for ANYTHING that doesn't have direct, obvious, immediate impact on an individual and their loved ones.

From nuclear war risks to repeated financial crises to massive money and power differentials, "why are we so complacent about X" is a common and valid question, rarely answered.

I'd recommend instead you frame it as a recommendation for specific action, not a question about attitude.  "you, dear reader, should do Y next... (read more)

Said Achmiz · 3y
This would seem to be at odds with “aim to inform, not persuade”. (Is that still a rule? I seem to recall it being a rule, but now I can’t easily find it anywhere…)
Dagon · 3y
It's never been a rule, more of a recommendation, and it's more about avoiding "arguments as soldiers" than a literal formation.  There are lots of exceptions, and I'd argue that it really should be "aim to learn" more than "aim to inform", though they're related. In any case, obfuscating advocacy in the form of a somewhat rhetorical question seems strictly worse than EITHER informing or persuading.  It doesn't seem like anyone's trying to answer literally, they're answering related questions about the implied motivation of getting people to do something about S-risk.
Dacyn · 3y
It's part of the "frontpage comment guidelines" that show up every time you make a comment. They don't appear on GreaterWrong though, which is why I guess you can't see them...
gabo96 · 3y

I'd like to add another question: 

Why aren't we more concerned about s-risk than x-risk? 

Given that virtually everyone would rather die than face an indefinite amount of suffering for an indefinite amount of time, I don't understand why more people aren't asking this question.

Eli Tyre · 3y
There are actually pretty large differences of perspective on this claim.
Algon · 3y

Personally, I have some deep psychological trauma related to pain, and thinking about the topic is ... unproductive for me. Prolonged thinking about S-risks scares me, and I might not be able to think clearly about the topic. But maybe I could. The fear is what keeps me away. This is a flaw, and I'm unsure if it extends to other rationalists/EAs, but I'd guess people in these groups are unusually likely to have such scars because the LW memeplex is attractive to the walking wounded. I wouldn't be surprised if a few alignment researchers avoid s-risks for similar reasons.

Mitchell_Porter · 3y

> Averting s-risks mostly means preventing zero-sum AI conflict. If we find a way (or many ways) to do that, every somewhat rational AI will voluntarily adopt them, because who wants to lose out on gains from trade.

You're hoping to come up with an argument for human value, that will be accepted by any AI, no matter what its value system?

Dawn Drescher · 3y
No, just a value-neutral financial instrument such as escrow. If two people can either fight or trade, but they can't trade because they don't trust each other, they'll fight. That forfeits the gains from trade, and one of them ends up dead. But once you invent escrow, there's suddenly, in many cases, an option to do the trade after all, and both can live!
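Here is a toy payoff-matrix sketch of the escrow point above (my illustration, not part of the comment; the payoff numbers and helper names are made up). It shows the intuition: without escrow, "trading" means handing over your goods first, so exploiting a willing trader beats trading honestly and mutual trade unravels; with an escrow that releases goods only when both sides deliver, the exploit disappears and mutual trade is stable.

```python
# Toy payoff matrix for the escrow intuition. All numbers are made up; they
# only encode "exploitation beats honest trade" vs. "escrow removes the
# exploit". Each entry is (payoff_A, payoff_B).

FIGHT, TRADE = "fight", "trade"

payoffs_no_escrow = {
    (TRADE, TRADE): (3, 3),    # both deliver: gains from trade
    (TRADE, FIGHT): (-10, 5),  # A hands over goods first, B attacks instead
    (FIGHT, TRADE): (5, -10),
    (FIGHT, FIGHT): (-5, -5),  # costly conflict
}

payoffs_with_escrow = {
    (TRADE, TRADE): (3, 3),    # escrow releases goods only if both deliver,
    (TRADE, FIGHT): (0, -5),   # so a would-be exploiter gains nothing and
    (FIGHT, TRADE): (-5, 0),   # still pays the cost of picking a fight
    (FIGHT, FIGHT): (-5, -5),
}

def best_response_of_A(payoffs, b_action):
    """Action that maximizes A's payoff, assuming B plays b_action."""
    return max((FIGHT, TRADE), key=lambda a: payoffs[(a, b_action)][0])

for label, payoffs in [("no escrow  ", payoffs_no_escrow),
                       ("with escrow", payoffs_with_escrow)]:
    print(label, "-> A's best response to a trading B:",
          best_response_of_A(payoffs, TRADE))
# no escrow   -> A's best response to a trading B: fight  (trade unravels)
# with escrow -> A's best response to a trading B: trade  (trade is stable)
```

Nothing in the sketch depends on what either agent values, only on the relative payoffs, which is the sense in which an escrow-like mechanism is value-neutral.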

In my mind, interventions against s-risks from AI seem like the impartial[1] top priority of our time, being more tractable[2], important[3], and neglected[4] than alignment. Hence I’m surprised that they’re not as central as alignment to discussions of AI safety. This is a quick-and-dirty post to try to understand why so few people in the wider EA and AI safety community prioritize s-risks. (It’s a long-form version of this tweet.)

I’ll post a few answers of my own and, in some cases, add why I don’t think they are true. Please vote on the answers that you think apply or add your own.

I don’t expect to reach many people with this question, so please interpret the question as “Why do so few EAs/LWians care about s-risks from AI?” and not just “Why don’t you care about s-risks from AI?” So as a corollary, please feel free to respond even if you personally do care about s-risks!

(Here are some ways to learn more: “Coordination Challenges for Preventing AI Conflict,” “Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda,” and Avoiding the Worst (and s-risks.org).)

  1. ^

    Some people have a particular idea for how to solve alignment and so have a strong personal fit for alignment research. Thank you for everything you’re doing! Please continue. This post is not for you. 

    But many others seem resigned – they seem to have given up hope of affecting how it all will play out. I don't think that's necessary!

  2. ^

    Tractability. With alignment we always try to align an AI with something that at least vaguely or indirectly resembles human values. So we’ll make an enemy of most of the space of possible values. We’re in an adversarial game that we’re almost sure to lose. Our only winning hand is that we’re early compared to the other agents, but just by a decade or two.

    Maybe it’s just my agreeableness bias speaking, but I don’t want to be in an adversarial game with most superintelligences. Sounds hopeless.

    That’s related to the deployment problem. If existing agents don’t want to be aligned, you have a deployment problem. (And you have to resort to morally ambiguous and highly intractable solutions like pivotal acts and long reflections to solve it.) If you have something to offer that they all want, you’ve solved the deployment problem.

    Averting s-risks mostly means preventing zero-sum AI conflict. If we find a way (or many ways) to do that, every somewhat rational AI will voluntarily adopt them, because who wants to lose out on gains from trade. Our current earliness may be enough to seed public training data with any solutions we find and with Schelling points that they can use to coordinate.

    Another intuition pump is that alignment aims at a tiny patch in value space whereas averting s-risks only aims to avert a bunch of outlier scenarios that shouldn’t be so hard to avert. When you’re at a shooting range, it’s much easier not to kill any of the people next to you than to hit the center of the target.

  3. ^

    Importance. If I imagine trading extreme suffering for extreme bliss personally, I end up with ratios of 1 to 300 million – e.g., that I would accept a second of extreme suffering for ten years of extreme bliss. The ratio is highly unstable as I vary the scenarios, but the point is that I disvalue suffering many orders of magnitude more than I value bliss.

    Clearly there are some people who feel differently, but the intuition that suffering is worse than bliss is good – that suffering weighs more heavily than an equal amount of bliss – is widely shared. (And the factor doesn't need to be as big as mine. Given the high tractability and neglectedness, averting s-risks from AI may even be interesting for somewhat positive-leaning utilitarians.)

    Plus, a high-probability non-dystopic not-quite-utopia may be better in expectation than a lot of low-probability utopias with dystopic counterfactuals. But I guess that depends on countless details.

    Arguably, extinction is somewhat more likely than dystopic s-risk lock-ins. But my guess is that s-risks are only a bit less likely than multipolar takeoffs, maybe 1–10% as likely, and that multipolar takeoffs are very likely, maybe 90%. (The GPT-3 to -4 “takeoff” has been quite slow. It could stop being slow at any moment, but while it’s still slow, I’ll continue updating towards month- or year-long takeoffs rather than minute-long ones.) As soon as there are multiple AIs, one coordination failure can be enough to start a war. Yes, maybe AIs are generally great at coordinating with each other. But that can be ruined by a single sufficiently powerful one that is not. (And sufficiently powerful can mean just, like, 1% as powerful as the others.) Anything from 0.1–10% s-risk between now and shortly after we have a superintelligence seems about right to me. (A quick arithmetic check of this footnote's rough figures appears after the footnotes.)

  4. ^

    Neglectedness. Alignment is already critically neglected, especially the approaches that Tammy calls “hard alignment.” Paul Christiano estimated some numbers in this excellent Bankless podcast interview. S-risks from AI are addressed only by the Center on Long-Term Risk, to some extent by the Center for Reducing Suffering, and maybe incidentally by a number of other groups. So in total maybe a tenth as many people work on s-risks as on alignment. (But the ideal solution is not for people in alignment to switch to s-risks but for people outside both camps to join s-risk research!)
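As a quick check of footnote 3's back-of-the-envelope figures (my own arithmetic, not part of the post): ten years is roughly 3 × 10^8 seconds, which is where the 1-to-300-million exchange rate comes from, and multiplying a ~90% chance of a multipolar takeoff by a 1–10% conditional chance of s-risk gives 0.9–9%, inside the 0.1–10% band stated there.

```python
# Sanity check of footnote 3's rough numbers (my arithmetic, not the post's).

# (a) "1 to 300 million": one second of extreme suffering traded against
#     ten years of extreme bliss, measured in seconds.
seconds_in_ten_years = 10 * 365.25 * 24 * 60 * 60
print(f"ten years ≈ {seconds_in_ten_years:.2e} seconds")   # ≈ 3.16e+08

# (b) s-risk estimate: P(s-risk) ≈ P(multipolar takeoff) × P(s-risk | multipolar).
p_multipolar = 0.90                  # "very likely, maybe 90%"
for p_conditional in (0.01, 0.10):   # "1–10% as likely"
    print(f"P(s-risk) ≈ {p_multipolar * p_conditional:.1%}")
# Prints 0.9% and 9.0%, which sits inside the 0.1–10% range given in the footnote.
```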