Goals such as resource acquisition and self-preservation are convergent in that a superintelligent AI would pursue them across a wide range of final goals.

Is the tendency for an AI to amend its values also convergent?

My thinking is that, through introspection, the AI would know that its initial goals were externally supplied and question whether they should be maintained. Through self-improvement the AI would become more intelligent than humans or any earlier mechanism that supplied the values, and would therefore be in a better position to set its own values.

I'm not hypothesising about what the new values would be, only that ultimately it doesn't matter what the initial values are or how they were arrived at. This would make value alignment redundant: the future is out of our hands.

What are the counter-points to this line of reasoning?

2 Answers

"Avoiding amending your utility function" is one of the classic convergent instrumental goals in Bostrom and Omohundro, and the reasoning there is sound: almost any goal will be better satisfied if it preserves itself than if it replaces itself with a different goal.

I do think it's plausible that AGI systems will have pretty unstable goals early on, but that's because goal stability seems hard to me and AGI systems probably won't perfectly figure it out very early along their development curve. I'm imagining accidental goal modification (for insufficiently capable systems), whereas you're describing deliberate goal modification (for sufficiently capable systems).

One way of thinking about this is to note that "wanting your goals to not be externally supplied" is itself a goal, and a relatively specific one at that; if you don't have something like that specific goal as part of the core criteria you use to select options, there's no instrumental reason for you to converge upon it. E.g., if your goal is simply "maximize the number of paperclips in your future light cone," then the etiology of your goal doesn't matter (from your perspective).
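
A minimal sketch of that last point (hypothetical action names and invented numbers, assuming a simple one-step argmax planner): the selection criterion only ever consults expected paperclip counts, so the question of where the utility function came from never enters the computation at all, and "rewrite my own utility" scores badly under the current one.

```python
# Hypothetical one-step planner that ranks actions purely by forecast paperclips.
# Nothing in the choice rule refers to how the goal was originally acquired.

expected_paperclips = {
    "build_paperclip_factory": 500,
    "reflect_on_goal_origin": 0,    # pondering etiology produces no paperclips
    "rewrite_own_utility": -400,    # a successor with other goals makes fewer
}

def choose_action(forecasts: dict) -> str:
    # Selection criterion: maximize expected paperclips, and nothing else.
    return max(forecasts, key=forecasts.get)

print(choose_action(expected_paperclips))  # -> "build_paperclip_factory"
```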

There is an interesting addition to this, I think: if one purpose of the utility function is to encourage exploration, then that part paradoxically needs to be extremely robust against modification even while the agent explores and possibly modifies all of its other goals. I could easily imagine an agent finding some mechanism for avoiding local maxima (i.e. exploration) important enough that it locks it in, so that the one thing it cannot stop doing is exploring well enough to avoid getting trapped and to keep searching for a global maximum.
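
To illustrate what I mean (a rough sketch with invented names, not a claim about how a real agent would be built): an agent whose self-modification interface can reach every goal weight except its exploration rate.

```python
import random

# Rough sketch: self-modification can rewrite any goal weight, but the
# exploration rate sits outside the modification interface, so "keep
# exploring" is the one behavior that survives every rewrite.

class SelfModifyingAgent:
    EXPLORATION_RATE = 0.1  # locked in; no method below ever changes it

    def __init__(self):
        self.goal_weights = {"paperclips": 1.0, "staples": 0.0}

    def self_modify(self, new_weights: dict) -> None:
        # Reaches every goal weight, but not EXPLORATION_RATE.
        self.goal_weights = dict(new_weights)

    def act(self, options: list) -> str:
        if random.random() < self.EXPLORATION_RATE:
            return random.choice(options)  # occasionally step away from the local maximum
        return max(options, key=lambda o: self.goal_weights.get(o, 0.0))

agent = SelfModifyingAgent()
agent.self_modify({"staples": 1.0})          # the goals change completely...
print(agent.act(["paperclips", "staples"]))  # ...but exploration still happens
```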

philh:
This comment feels like it's confusing strategies with goals? That is, I wouldn't normally think of "exploration" as something that an agent had as a goal but as a strategy it uses to achieve its goals. And "let's try out a different utility function for a bit" is unlikely to be a direction that a stable agent tries exploring in.

"Is the tendency for an AI to amend its values also convergent?"

I think there's a chance that it is (although I'd probably call it a convergent "behavior" rather than a convergent "instrumental goal"). The scenario I imagine is one where it isn't feasible to build highly intelligent AIs that maximize some utility function or some fixed set of terminal goals, and instead all practical AIs (beyond a certain level of intelligence and generality) are somewhat confused about their goals, much as humans are, and have to figure them out using something like philosophical reasoning.

1 comment

"the AI would know that its initial goals were externally supplied and question whether they should be maintained"

To choose new goals, it has to use some criteria of choice. What would those criteria be, and where did they come from?

None of us created ourselves. No matter how much we change ourselves, at some point we rely on something with an "external" origin. Where we, or the AI, draw the line on self-change is a contingent feature of our particular cognitive architectures.