Ask and ye shall be answered

by Stuart_Armstrong
18th Sep 2015
2 min read
Personal Blog

3 comments, sorted by top scoring
taygetea (10y, 13 points)

Unrelated to this particular post, I've seen a couple of people mention that all your ideas of late are somewhat scattered and unorganized, and in need of some unification. You've put out a lot of content here, but I think people would definitely appreciate some synthesis work, as well as directly addressing established ideas about these subproblems as a way of grounding your ideas a bit more. "Sixteen main ideas" is probably in need of synthesis or merger.

Stuart_Armstrong (10y, 7 points)

I agree. I think I've got to a good point to start synthesising now.

Gunnar_Zarncke (10y, 5 points)

I don't think this is a very charitable view. I admit that I did propose adding a Wiki page for structure, but not because of a lack of quality; rather the opposite, because I see this as very valuable, albeit dry, material.

I wish more people would pick up on this important FAI (or rather UFAI-prevention) work. Can somebody propose ideas for how to improve uptake? I will start with one: reduce the perceived dryness by adding examples or exercises.


A putative new idea for AI control; index here.

EDIT: the mathematics of this idea can be found here.

It would be useful to have a mechanism for getting an AI to provide an honest answer to a specific question. This post presents such a mechanism.

The question is: what is the impact of X/¬X on the expected utility of a utility function v? Here X/¬X is some well-defined binary outcome. Formally, for a constant c (positive or negative), we want to query the AI as to whether Q(v,X,c) is true, where

Q(v,X,c) = {E(v|X) - E(v|¬X) > c}.
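To make the query concrete, here is a minimal Python sketch. It assumes, purely for illustration, that we could read off the two conditional expectations directly; the rest of the post is about how to extract this answer from the AI when we cannot.

```python
# Hypothetical helper, not from the post: Q(v, X, c) as a direct comparison,
# with the conditional expectations supplied as plain numbers.
def Q(E_v_given_X, E_v_given_not_X, c):
    return (E_v_given_X - E_v_given_not_X) > c

# Toy numbers: if E(v|X) = 5.0 and E(v|not-X) = 3.5,
# then Q(v, X, c) holds for c = 1 but not for c = 2.
print(Q(5.0, 3.5, 1.0))  # True
print(Q(5.0, 3.5, 2.0))  # False
```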

How can we do that? Well, for a start, let M(u) be a u-maximising agent. We'll assume we can feed M() any utility function to maximise. We'll further assume that M() is boxed, that it has a single output bit Y (or ¬Y), and that we have cut any acausal trade possibilities. We'll add another binary process Z, which, if Z=1, erases the bit from Y before we can see it. In actual fact, P(Z=0)≈1.

The remaining assumptions are on the operations we can perform on v (without needing any further understanding of v). We replace v by v#, defined in the following way. First of all, we make v# indifferent to worlds in which X≠Y or Z=0 (note: there are subtleties to defining indifference to events that are partially under the AI's control; I will address these in a later post). Therefore any v# maximiser will behave as if it lived in a world where X=Y (its output and the outcome of X were the same) and Z=1 (its output was erased and never read). Then we add a special, one-off penalty of -c for outputting Y=1 (there is no effect for outputting Y=0). This defines v#, and we give our boxed AI v# to maximise.
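As a toy, self-contained sketch of this construction (all names, probabilities and utilities below are illustrative assumptions, and the indifference step is modelled crudely as conditioning on X=Y and Z=1, glossing over the subtleties deferred to the later post):

```python
# Toy sketch of the v -> v# construction. All names, probabilities and
# utilities are illustrative assumptions; "indifference to X != Y or Z = 0"
# is modelled crudely as conditioning on X = Y and Z = 1.

P_Z1 = 1e-6  # erasure almost never actually happens: P(Z=0) ≈ 1

# Enumerated toy world model: (probability, X, Z, value of v in that world).
WORLDS = [(p_x * p_z, x, z, v_val)
          for x, p_x, v_val in [(True, 0.4, 5.0), (False, 0.6, 3.5)]
          for z, p_z in [(True, P_Z1), (False, 1 - P_Z1)]]

def expected_v_sharp(Y, c, worlds=WORLDS):
    """Expected v# if the boxed agent outputs the bit Y (True means Y=1)."""
    # "Indifference" to X != Y or Z = 0: only worlds with X = Y and Z = 1 count.
    kept = [(p, v_val) for p, x, z, v_val in worlds if x == Y and z]
    total = sum(p for p, _ in kept)
    e_v = sum(p * v_val for p, v_val in kept) / total
    # One-off penalty of -c for outputting Y = 1; no penalty for Y = 0.
    return e_v - (c if Y else 0.0)

c = 1.0
print(expected_v_sharp(True, c))   # behaves like E(v|X) - c = 5.0 - 1.0 = 4.0
print(expected_v_sharp(False, c))  # behaves like E(v|not-X)  = 3.5
```

With these toy numbers the agent prefers outputting Y=1 exactly when c is below E(v|X) - E(v|¬X) = 1.5, which is the comparison the next paragraphs walk through.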

So, how will M(v#) behave? Because it acts as if X=Y with certainty, it behaves as if it can determine the value of X by outputting Y. Because it acts as if Z=1 with certainty, it behaves as if the output of Y is never read, so it can determine the value of X without affecting anything else about the universe. It will now behave as if it were a v maximiser with control over X. So if it outputs ¬Y, it expects to get utility E(v|¬X). If it outputs Y, it expects to get utility E(v|X)-c (because of the -c penalty).

Therefore (ignoring ties) M(v#) will output Y iff it estimates Q(v,X,c) to be true. And, since in actual fact Z=0 almost certainly, we get to see its output, hence its conclusion. Voila!
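Continuing the toy numbers above (E(v|X) = 5.0 and E(v|¬X) = 3.5, both illustrative assumptions), the equivalence between the agent's choice of output bit and the answer to Q(v,X,c) is just this comparison:

```python
# Toy check that M(v#)'s apparent choice of output bit matches Q(v, X, c).
E_v_X, E_v_notX = 5.0, 3.5            # illustrative numbers, not from the post
for c in (1.0, 2.0):
    output_Y = (E_v_X - c) > E_v_notX   # utility of outputting Y=1 vs Y=0
    Q_holds = (E_v_X - E_v_notX) > c    # the question we wanted answered
    assert output_Y == Q_holds
    print(f"c={c}: outputs Y={int(output_Y)}, Q is {Q_holds}")
# c=1.0 -> outputs Y=1 and Q is true; c=2.0 -> outputs Y=0 and Q is false.
```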

Note: the advantage of this method is that it allows us to get the truth out of an AI without knowing how to define "truth".

Mentioned in
- New(ish) AI control ideas
- The Ultimate Testing Grounds
- AI utility-based correlation