[ Question ]

"Do Nothing" utility function, 3½ years later?

by niplav | 1 min read | 20th Jul 2020 | 3 comments


Impact Measures, Utility Functions, AI

In AI Alignment: Why It's Hard and Where To Start, at 21:21, Yudkowsky says:

If we want to have a robot that will let us press the suspend button—just suspend it to disk—we can suppose that we already have a utility function that describes: “Do nothing.” In point of fact, we don’t have a utility function that says, “Do nothing.” That’s how primitive the state of the field is right now. But, leaving that aside, it's not the hardest problem we're ever going to do, and we might have it in six months, for all I know.

I get the impression that Attainable Utility Preservation offers some pointers towards this, although saying "maximise attainable utility over this set of random utility functions" seems like it would just fire up instrumentally convergent drives. I could be wrong, though.
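For concreteness, here is a rough sketch of the penalty term as it is usually described for Attainable Utility Preservation: the auxiliary utility functions enter as a penalty on how much an action changes attainable utility relative to doing nothing, rather than as something to maximise. Everything named below (`NOOP`, `aup_penalty`, `aup_reward`, the toy Q-values, the `disable_button` action) is a hypothetical illustration, and the scaling term used in the published formulation is left out for simplicity.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Q-table for one auxiliary utility function: (state, action) -> attainable utility.
QTable = Dict[Tuple[State, Action], float]

NOOP = "noop"  # hypothetical name for the "do nothing" action


def aup_penalty(aux_q: List[QTable], state: State, action: Action) -> float:
    """Mean absolute change in attainable utility compared with doing nothing."""
    diffs = [abs(q[(state, action)] - q[(state, NOOP)]) for q in aux_q]
    return sum(diffs) / len(diffs)


def aup_reward(primary_reward: float, aux_q: List[QTable],
               state: State, action: Action, lam: float = 1.0) -> float:
    """Primary reward minus the weighted penalty (simplified, unscaled form)."""
    return primary_reward - lam * aup_penalty(aux_q, state, action)


# Toy values (made up): disabling the off-switch raises attainable utility
# for both auxiliary functions, so it is penalised.
aux_q = [
    {("s0", NOOP): 1.0, ("s0", "disable_button"): 3.0},
    {("s0", NOOP): 0.5, ("s0", "disable_button"): 2.5},
]

penalty_disable = aup_penalty(aux_q, "s0", "disable_button")
penalty_noop = aup_penalty(aux_q, "s0", NOOP)
reward_disable = aup_reward(1.0, aux_q, "s0", "disable_button", lam=1.0)
```

In this sketch the penalty is symmetric: gaining and losing attainable utility are both penalised, and the no-op action always has zero penalty. Whether that is enough to avoid instrumentally convergent behaviour is exactly the open question here.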

So, 3½ years later, what is the state of "do nothing" utility functions?

1 Answer

Hi there! If you'd like to get up to speed on impact measures, I would recommend these papers and the Reframing Impact sequence.