Suppose you're an AI-doomer like Yudkowsky — you really care about humans surviving this century, but you think that nothing you can do will likely achieve that goal. It's an unfortunate fact of human psychology that when someone really cares about something but thinks nothing they can do will likely achieve it, they sometimes do nothing at all, rather than the thing most likely to achieve it. So there's a risk you give up on alignment — maybe you lie in bed all day with paralysing depression, or maybe you convert your FAANG income into short-term pleasures. To avoid that, you have three options: change your psychology, change your beliefs, or change your goals. Which option is best?

  • Change your psychology. This would be the ideal option — if you can do that, then do that. But the historical track record suggests this is really hard.
  • Change your beliefs. This is called "hope", and it's a popular trick among many AI-doomers. Humans are surprisingly good at changing their beliefs. You change your belief from "there's nothing I can do which makes survival likely" to "there is something I can do which makes survival likely".
  • Change your goals. This is what Yudkowsky proposes. Humans are surprisingly good at changing their goals too. You change your goal from "humans survive this century" to "the amount my actions increase the log-odds that humans survive this century". Yudkowsky calls that new goal "dignity". The old goal had only two possible values, 0 or 1, but the new goal can take any value from −∞ to +∞ (see the sketch after this list).
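
To make the contrast concrete, here is a minimal sketch of what scoring yourself on log-odds gained could look like. It is not from Yudkowsky's post; the probabilities and the choice of natural log are made up purely for illustration.

```python
import math

def log_odds(p: float) -> float:
    """Natural-log odds of a probability p, i.e. ln(p / (1 - p))."""
    return math.log(p / (1 - p))

# Hypothetical numbers, purely for illustration: suppose your work raises
# your estimate of humanity's survival probability from 1% to 2%.
p_before = 0.01
p_after = 0.02

dignity_gained = log_odds(p_after) - log_odds(p_before)

print(f"log-odds before: {log_odds(p_before):+.3f}")  # -4.595
print(f"log-odds after:  {log_odds(p_after):+.3f}")   # -3.892
print(f"dignity gained:  {dignity_gained:+.3f}")      # +0.703
```

The point of the log-odds framing is that the score still moves even when survival stays unlikely: nudging 1% up to 2% earns a positive amount of dignity, whereas the binary goal would still read 0.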

Now, it's risky to change your beliefs or to change your goals — you might face a situation where the optimal policy after the change differs from the optimal policy before the change. But Yudkowsky thinks that changing your goals is less optimal-policy-corrupting than changing your beliefs.

Why's that? If you force yourself to believe something unlikely, then that inaccuracy can "leak" into other beliefs, because our beliefs are connected together by a web of inferences. But changing your goal from "survival" to "dignity" is like Trying to Try rather than trying — it's relatively less optimal-policy-corrupting.


Connor Leahy’s opinion on the post (55:33):

“Oh and btw, while you are trying to increase the log-odds that humanity survives this century, don’t do anything stupid and rash that is way out-of-distribution for normal actions. You are not some God who can do the full utilitarian calculus. If an action you are thinking about is far out-of-distribution and looks bad to a lot of people, it’s likely because it is. In other words, don’t naively take rash actions thinking it’s for the good of humanity. Default to 3/4 utilitarian.”

Yeah I mostly agree with Connor's interpretation of Death with Dignity.

I know a lot of the community thought it was a bad post, and some thought it was downright infohazardous, but the concept of "death with dignity" is pretty lindy actually. When soldiers are fighting a battle with awful odds, they don't change their belief to "a miracle will save us"; they change their goal to "I'll fight till my last breath".

If people find the mindset harmful, then they won't use it. If people find the mindset helpful, then they will use it. But I think everyone should try out the mindset for an hour or two.

Strongly upvoted. I unironically think it's a pretty good distillation (I listened to the original post in the background).
