Safe probability manipulation, superweapons, and stable self-improvement research

Stuart_Armstrong

12th Feb 2015

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

When I first discovered that even UDT would fail on certain "fair" problems (via "Evil" decision problems in provability logic), I was disappointed. It seemed to make things harder, in that there was no ideal decision theory, even on "fair" problems.

But then I realised I'd been handed a superweapon of extreme power. It's a tool that allows us to constrain an AI of arbitrary intelligence to have the probabilistic beliefs that we desire.

For instance, take the "$X$" procedure that I used in the rather under-defined "counterfactual" post. We want the real probability of $X = 0$ to be low, but for the AI to be unable to influence it (and, incidentally, unable to calculate it precisely).

Suppose that $X$ can take any of the values $0, 1, \ldots, n$. The AI runs a certain probability algorithm $P$. Then the procedure for calculating $X$ is to run $P$ on $X = 0$, $X = 1$, ..., $X = n$, and output the $m$ for which $P(X = m)$ is lowest. In case of a tie, it outputs the highest number.

It seems clear that the AI must put $P(X = 0) = P(X = 1) = \ldots = P(X = n) = 1/(n+1)$, since there are $n + 1$ possible values. However, we know that $X = n$ (since the values are tied, and ties go to the highest number).
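The procedure for $X$ can be sketched in a few lines of Python. This is only a minimal illustration, not the author's implementation: `compute_x` and `uniform_p` are hypothetical names, and the probability algorithm $P$ is modelled as a plain function from a candidate value to a number, which glosses over the self-reference that makes the real problem hard.

```python
from typing import Callable

# Assumption: P is modelled as a function m -> P(X = m).
Prob = Callable[[int], float]


def compute_x(p: Prob, n: int) -> int:
    """Run P on X = 0, ..., X = n and output the value with the lowest
    probability; ties are broken in favour of the highest number."""
    probs = [p(m) for m in range(n + 1)]
    lowest = min(probs)
    return max(m for m in range(n + 1) if probs[m] == lowest)


def uniform_p(n: int) -> Prob:
    """Hypothetical P that gives every one of the n + 1 values equal weight."""
    return lambda m: 1.0 / (n + 1)


if __name__ == "__main__":
    n = 4
    print(compute_x(uniform_p(n), n))
```

When every value gets the same probability, the tie-break returns $n$. If some value has a strictly lowest probability instead, that value is returned, so any confident guess by $P$ that a value is unlikely is exactly what makes it the output.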

Basically we can use this to force the AI to be unable to influence or estimate certain events, even if it's much smarter than us.

This might provide a partial answer to Paul Christiano's point about the purpose of researching stable self-improvement, because this kind of trick is not easily stable under self-improvement (e.g. if the AI upgrades $P$ to $P'$, which can calculate stuff about $P$).
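One way this instability could show up is sketched below, under strong simplifying assumptions. $X$ is still computed from the original $P$, but a hypothetical upgraded algorithm (here `upgrade`, a name invented for this example) can simulate $P$, work out $X$ exactly, and so escape the intended ignorance. The post does not specify how $P'$ would work; this is just one illustrative possibility.

```python
from typing import Callable

# Assumption: a probability algorithm is a function m -> P(X = m).
Prob = Callable[[int], float]


def compute_x(p: Prob, n: int) -> int:
    """Procedure for X, defined in terms of the ORIGINAL algorithm P:
    lowest probability wins, ties go to the highest number."""
    probs = [p(m) for m in range(n + 1)]
    lowest = min(probs)
    return max(m for m in range(n + 1) if probs[m] == lowest)


def upgrade(p: Prob, n: int) -> Prob:
    """Hypothetical P': it can reason about P, so it simply simulates
    the procedure and puts all probability on the actual output."""
    x = compute_x(p, n)
    return lambda m: 1.0 if m == x else 0.0


n = 4
p: Prob = lambda m: 1.0 / (n + 1)  # the original P stays ignorant
p_prime = upgrade(p, n)            # the upgraded P' is not

if __name__ == "__main__":
    print([p_prime(m) for m in range(n + 1)])
```

In this toy setting the constraint binds $P$ but not $P'$, which is the sense in which the trick is not automatically preserved by self-improvement.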

Similarly, my ideas about reduced impact require using the AI's predictive ability against the AI.

These seem to be cases where we can't just build a human-level AI and "let it sort out stable self-improvement", because here we've added constraints that the AI might "want" to get rid of, and that it potentially could. So it would be useful to know that these constraints are stable.

paulfchristiano (9y):

(This was used as the source of pseudorandomness here, and I think the idea has been floating around the LW community for a while.)
