Impact Regularization, Utility Functions, AI
[ Question ]

"Do Nothing" utility function, 3½ years later?

by niplav
20th Jul 2020
1 Answer

Vika

Jul 20, 2020


Hi there! If you'd like to get up to speed on impact measures, I would recommend these papers and the Reframing Impact sequence.

niplav

Thanks for the links! I'll check them out.

Pattern

I think there are proposals that might (it is hoped, with more research) lead to changeable utility functions, i.e. agents that won't try to stop you from changing their utility function.

I don't think 'Don't self-modify' utility functions are around yet. The tricky part might be getting the agent to recognize itself, the goal, or something.

Most of what I've seen has revolved around thought experiments (with math).

In AI Alignment: Why It's Hard and Where To Start, at 21:21, Yudkowsky says:

If we want to have a robot that will let us press the suspend button—just suspend it to disk—we can suppose that we already have a utility function that describes: “Do nothing.” In point of fact, we don’t have a utility function that says, “Do nothing.” That’s how primitive the state of the field is right now. But, leaving that aside, it's not the hardest problem we're ever going to do, and we might have it in six months, for all I know.

I get the impression that there are some pointers to this in Attainable Utility Preservation (but saying "maximise attainable utility over this set of random utility functions" seems like it would just fire up instrumentally convergent drives), but I could be wrong.
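For concreteness, here is a minimal sketch of the penalty term as I understand it from the AUP papers: rather than maximising the auxiliary utilities, the agent is penalised for changing how well it could attain them compared to a no-op. The function names, the plain averaging, and the `lam` weight are my own illustrative assumptions, and the papers' scaling term is left out, so treat this as a rough reading rather than the authors' implementation.

```python
from typing import Sequence


def aup_penalty(q_action: Sequence[float], q_noop: Sequence[float]) -> float:
    """Average absolute change in attainable utility versus doing nothing.

    q_action[i]: estimated Q-value of auxiliary reward i after the candidate action.
    q_noop[i]:   estimated Q-value of auxiliary reward i after the no-op action.
    Averaging over the auxiliary set is an illustrative choice (assumption).
    """
    if len(q_action) != len(q_noop):
        raise ValueError("need one Q-value per auxiliary reward for both actions")
    if not q_action:
        return 0.0
    return sum(abs(a - n) for a, n in zip(q_action, q_noop)) / len(q_action)


def aup_reward(task_reward: float,
               q_action: Sequence[float],
               q_noop: Sequence[float],
               lam: float = 1.0) -> float:
    """Task reward minus the weighted penalty; lam is a hypothetical trade-off weight."""
    return task_reward - lam * aup_penalty(q_action, q_noop)


# An action that leaves every auxiliary Q-value unchanged incurs no penalty.
unchanged = aup_reward(1.0, [1.0, 2.0], [1.0, 2.0])
# An action that shifts attainable utility (up or down) is penalised.
shifted = aup_reward(1.0, [3.0, 0.0], [1.0, 1.0], lam=0.5)
```

Note that the absolute value penalises gaining attainable utility as much as losing it, which is the part that seems aimed at the instrumental-convergence worry; whether it succeeds is exactly what I am unsure about.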

So, 3½ years later, what is the state of "do nothing" utility functions?
