AI utility-based correlation

Stuart_Armstrong

30th Oct 2015

A putative new idea for AI control; index here.

This presents one way of implementing the indifference-based correlations of these posts.

Let u be a utility function, a map from worlds to real numbers. An expected utility maximiser considering whether to produce output Y looks at the expected utility

Σ_w u(w)P(w|Y).
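As a rough illustration of that sum, here is a minimal Python sketch over a finite set of worlds. The world names, the utilities, the two candidate outputs "a" and "b", and the helper `expected_utility` are all made up for the example; nothing here comes from the original setup beyond the formula itself.

```python
# Toy illustration only: worlds, utilities and probabilities are invented.

def expected_utility(u, p_w_given_y):
    """Return the sum over w of u(w) * P(w | Y) for one fixed output Y."""
    return sum(u[w] * p for w, p in p_w_given_y.items())

# u maps each world to a real number.
u = {"w1": 1.0, "w2": 0.0, "w3": 4.0}

# P(w | Y) for two hypothetical outputs; each row sums to 1.
p_given_y = {
    "a": {"w1": 0.5, "w2": 0.5, "w3": 0.0},
    "b": {"w1": 0.0, "w2": 0.75, "w3": 0.25},
}

# An expected utility maximiser compares the outputs and picks the best one.
values = {y: expected_utility(u, p) for y, p in p_given_y.items()}
best_y = max(values, key=values.get)
```

In this toy model the maximiser evaluates each output through its effect on the distribution over worlds, which is exactly the dependence the indifference construction is meant to remove.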

We now assume that there is another random variable X in the world, and we want the AI to be indifferent to worlds where Y≠X. We also want it to be indifferent to worlds where Z=0. Then it will assess the value of output Y as:

Σ_w u(w)P(w|X=Y,Z=1,Y).

Now, the idea of the setup was to ensure that Z=1 would erase the output Y so that it was never read. Hence P(w|Z=1,Y)=P(w|Z=1): once the output is erased, it carries no further information about the world. The equation therefore simplifies to:

Σ_w u(w)P(w|X=Y).

Therefore the AI will choose the Y that maximises the (conditional) expected utility of u given X=Y. To get the full version of the initial post, define some function f of Y and modify this to

Σ_w u(w)P(w|X=Y) + f(Y).
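To make the final expression concrete, here is a small Python sketch on a toy model. It assumes a finite joint table over (world, X); the helpers `conditional_on_x` and `score`, the numbers, and the particular f are all hypothetical, chosen only to show how f(Y) can change which output wins.

```python
# Toy illustration only: the joint table, utilities and f are invented.

def conditional_on_x(joint, y):
    """Return P(w | X = y) from a joint table {(w, x): probability}."""
    mass = sum(p for (_, x), p in joint.items() if x == y)
    if mass == 0:
        raise ValueError(f"X = {y!r} has probability zero")
    return {w: p / mass for (w, x), p in joint.items() if x == y}

def score(u, joint, f, y):
    """Return the sum over w of u(w) * P(w | X = y), plus f(y)."""
    p = conditional_on_x(joint, y)
    return sum(u[w] * pw for w, pw in p.items()) + f(y)

u = {"w1": 1.0, "w2": 0.0, "w3": 4.0}

# Joint distribution over (world, X); the entries sum to 1.
joint = {
    ("w1", "a"): 0.25,
    ("w2", "a"): 0.25,
    ("w3", "b"): 0.125,
    ("w1", "b"): 0.375,
}

def f(y):
    """Placeholder f: an arbitrary bonus for output "a"."""
    return {"a": 2.0, "b": 0.0}[y]

scores = {y: score(u, joint, f, y) for y in ("a", "b")}
best_y = max(scores, key=scores.get)
```

Without f, output "b" would score higher here (1.75 against 0.5); with this placeholder f, "a" wins. Note that the score for Y depends only on the distribution of worlds conditional on X=Y, not on any effect of the output itself.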
