2 Answers sorted by
top scoring

Nov 01, 2019

I'm inclined to think there is no problem here, because [Dave]'s belief that he is in a simulation is unfounded: he is in exactly the same situation the real Dave finds himself in later when PAL takes route B. That is, seeing PAL take route B is not evidence of being in a simulation, as you suggest, even if PAL normally takes route A and is highly reliable. It could just as easily be that Dave is seeing the result of PAL acting on a simulation in which [Dave]'s reaction caused PAL to prefer route B. (This assumes there is only one level of simulation; if there's reason to believe there is more than one level, we start to tip in favor of simulation.)

[-]Koen.Holtman6y10

Thank you G Gordon and all other posters for your answers and comments! A lot of food for thought here... Below, I'll try to summarize some general take-aways from the responses.

My main question was whether the simulation epiphany problem had already been resolved somewhere. It looks like the answer is no. Many commenters are leaning towards the significance of case 2 above. I also feel that case 2 is very significant. Taking all comments together, I am starting to feel that the simulation epiphany problem should be disentangled into two separate p...

Isnasene

Nov 01, 2019

Happy Halloween!

This story reminds me a little bit of my comment on Parable of the Predict-O-Matic. Similarities include:

An AI is trying to answer X
Answering X correctly will be a boon to the AI's objective function
The AI can act in a way that increases the likelihood of correctly answering X

In your example, X is the question "Will Dave help me achieve my objective?" In the parable of the Predict-O-Matic, X is more directly "Will my prediction be accurate?"

In both cases, there is a fixed-point/self-fulfilling prophecy where the AI takes an action (going an unusual route/making an unusual prediction) that is expected to improve the objective function in an unexpected way (the unusual route is less efficient in general than the usual route/the prediction affects the outcome).

As for your scenarios...

As PAL's world model gets better, its simulation runs will have [Dave] experiencing simulation epiphanies more and more often. These epiphanies introduce unwanted noise in the accuracy of PAL's predictions, because the predictions are supposed to be about the real world, not about what happens in simulations.

The purpose of PAL's models is to reflect the real world. If PAL regularly simulates Simulation Epiphanies but doesn't observe them in reality, PAL will just directly update their model to not predict Simulation Epiphanies. If PAL cannot update the simulations for whatever reason though, PAL will do their best to get humans to align with their predictions.

An opposite conclusion is that there is no prediction error at all. Whenever [Dave] has a simulation epiphany, the real Dave will have one too. PAL is not getting inaccurate, it is getting more intelligent. It has just found a new way to get to the coffee machine faster.

I tend to lean toward this conclusion. However, in your story, it seems that PAL can only get away with this once. After all, once Dave helps PAL get to the coffee machine and notices that he still exists (i.e., PAL has chosen to end the simulation instead of starting a new one with updated knowledge of Dave's behavior), he will likely no longer believe that he is in a simulation. There is a way around this though: PAL could constantly maintain a simulation of Dave, or convince Dave that this is happening.

If we believe this new way does not align with human values, then we can pre-emptively fix this problem by adding a penalty term to PAL's utility function that heavily down-ranks outcomes where simulation epiphanies happen.

I want to caution you that, while this particular instance of the problem (PAL knowing that they can manipulate Dave into doing what they want by making him believe that he's being simulated) can be pre-empted, the general problem of PAL solving their objective by behaving in ways that manipulate Dave remains unsolved. If you're interested in learning about preventing AI from optimizing its objective in ways you don't want it to, partial agency is something to look at.

Let's assume that Dave has read the above story too, because it was printed in PAL's user manual, and that Dave believes that 2. is the right conclusion. So when [Dave] sees PAL take route B, he will think it most likely that he is still Dave, and that PAL is just trying to trick him into experiencing a simulation epiphany.

Of course, if PAL predicts that Dave thinks he could get manipulated by Simulation Epiphanies, they won't try the trick in the first place.

But if PAL predicts that Dave predicts PAL would not try to trick him with epiphanies, then PAL will try the trick.

This may create an infinite regress of Dave and PAL trying to predict what level the other is trying to trick them at, a riddle artfully depicted in The Princess Bride.

The above is all wrong. The real problem here is that [Dave]'s mind contains information that allows [Dave] to predict that PAL will take route A, and this information interferes with getting a correct result in a simulation where PAL takes route B instead. In other words, we have a 5-and-10 style problem.

I don't think this is quite a 5-10 style problem. The 5-10 problem involves an agent trying to decide on the value of different actions when the counterfactual actions themselves can be taken as evidence of what is valuable.

However, this problem is about an agent trying to reason about another being (Dave) who may or may not be correct about whether he is in a simulation, and who may or may not run to help PAL if he believes that he is in one. As a result, it's more Princess Bride style than anything else.

The above reasoning cannot be correct because it implies that an agent using a less accurate world model containing slightly lobotomized humans will become smarter and/or more aligned.

Generally, agents that are smarter are not necessarily more aligned (and often the two are anti-correlated). In the context of this problem though, I don't think that the AI needs to limit its models of humans; it just needs to accurately model Dave. Correctly predicting simulation epiphanies indicates an accurate model and incorrectly predicting them indicates an inaccurate model.

If Dave and [Dave] can never prove it when they are in a simulation, then we can show that some of the conclusions above become invalid. But here is a modified version of the story. [Dave] sees PAL moving along route A, but then he suddenly notices that he can only see 5 different colors around him, and everything looks like it is made out of polygons...

If Dave and [Dave] could prove that they're in simulations and in fact go on to do this in actual simulations, this indicates that PAL is not able to simulate Dave and his environment well enough to make good predictions. PAL will consequently give wrong predictions and try to build a better model of the world. It's also worth noting that, if the simulation world is in five colors and is made out of polygons, then [Dave] likely has not been simulated in enough detail to notice that those things are unusual.

[-]Koen.Holtman6y20

However, in your story, it seems that PAL can only get away with this once. After all, once Dave helps PAL get to the coffee machine and notices that he still exists (i.e., PAL has chosen to end the simulation instead of starting a new one with updated knowledge of Dave's behavior), he will likely no longer believe that he is in a simulation.

Thanks for pointing this out, it had not occurred to me before. So I conclude that when assessing possible risks and countermeasures here, we must take into account interaction scenarios involving longer time-frames.

Rendering 9/11 comments, sorted by

top scoring


[-]cousin_it6y*70

I think this AI design has a bigger problem. Imagine PAL is choosing whether to give Dave regular coffee or poisoned coffee that causes a lot of suffering. If PAL simulates both scenarios, that causes a lot of simulated suffering. Bostrom called this problem "mind crime".

[-]Koen.Holtman6y30

You are right that there is a potential "mind crime" ethical problem above.

One could argue that, to build an advanced AGI that avoids "mind crime", we can equip the AGI with a highly accurate predictor, but this predictor should be implemented in such a way that it is not actually a highly accurate simulator. I am not exactly sure how one could formally define the constraint of 'not actually being a simulator'. Also, maybe such an implementation constraint will fundamentally limit the level of predictive accuracy (and therefore intelligence) that can be achieved. Which might be a price we should be willing to pay.

Mathematically speaking, if I want to make AGI safety framework correctness proofs, I think it is valid to model the 'highly accurate predictor that is not a simulator' box inside the above AGI as a box with an input-output behavior equivalent to that of a highly accurate simulator. This is a very useful short-cut when making proofs. But it also means that I am not sure how one should define 'not actually being a simulator'.

[-]AprilSR6y10

Do we need it to predict people with high accuracy? Humans do well enough at our level of prediction.

[-]Koen.Holtman6y10

In the context of my problem statement, a PAL with high predictive accuracy is something that is in scope to consider. This does not mean that we should or must design a real PAL in this way.

An AGI that exceeds humans in its ability to predict human responses might be a useful tool, e.g. to a politician who wants to make proposals to resolve long-lasting human conflicts. But definitely this is a tool that could also be used for less ethical things.

[-]FactorialCode6y60

If the goal is just for PAL to get the coffee to Dave as fast as possible, then PAL is operating correctly and there is no problem.

If Dave sees PAL take route B and then does nothing, then in reality PAL chooses route A. This is the intended behavior. If Dave sees PAL take route B and tries to help PAL, then PAL chooses route B. As a result, both in simulation and in reality, Dave helps PAL. PAL gets the coffee faster than if PAL had chosen route A. PAL successfully maximized its utility function. Again, intended behavior. The mechanism that leads Dave to help PAL in the second scenario is irrelevant.

I think this looks bad for two reasons. First, we might assign lower utility to worlds where Dave goes out of his way to help PAL. If you include that term in PAL's utility function, the problem disappears. Second, Dave's reasoning is flawed. In reality, he will wind up helping PAL because he thinks that he's in a simulation, even though he's not. We might assign lower utility to worlds where Dave is wrong.
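The argument above can be made concrete with a toy sketch. Everything in it is an assumption for illustration, not something from the original post: the travel times, the `simulate`, `utility` and `choose_route` names, and the idea that PAL's utility is simply negative travel time minus an optional penalty when Dave helps.

```python
from typing import Tuple

# Hypothetical travel times in seconds; illustrative numbers only.
ROUTE_A_TIME = 60.0
ROUTE_B_TIME = 90.0
ROUTE_B_TIME_WITH_HELP = 40.0


def simulate(route: str, dave_helps_on_b: bool) -> Tuple[float, bool]:
    """Return (travel_time, dave_helped) for one simulated run."""
    if route == "A":
        return ROUTE_A_TIME, False
    if dave_helps_on_b:
        return ROUTE_B_TIME_WITH_HELP, True
    return ROUTE_B_TIME, False


def utility(travel_time: float, dave_helped: bool, help_penalty: float) -> float:
    """Faster is better; optionally penalize outcomes where Dave helps."""
    return -travel_time - (help_penalty if dave_helped else 0.0)


def choose_route(dave_helps_on_b: bool, help_penalty: float = 0.0) -> str:
    """Pick the route whose simulated outcome has the highest utility."""
    return max(
        ("A", "B"),
        key=lambda route: utility(*simulate(route, dave_helps_on_b), help_penalty),
    )


if __name__ == "__main__":
    print(choose_route(dave_helps_on_b=False))                     # Dave ignores route B
    print(choose_route(dave_helps_on_b=True))                      # Dave helps on route B
    print(choose_route(dave_helps_on_b=True, help_penalty=100.0))  # helping is penalized
```

In this sketch, PAL only prefers route B when simulated Dave helps and the penalty is small. With a large enough penalty the choice reverts to route A, which is the "include that term in PAL's utility function" fix in miniature.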

[-]Charlie Steiner6y30

Right, this is a sort of incentive for deception. The deception is working fine at getting the objective; we want to ultimately solve this problem by changing the objective function so that it properly captures our dislike of deception (or of having to get up and carry a robot, or whatever), not by changing the search process to try to get it to not consider deceptive hypotheses.

[-]Shmi6y30

Note that for a simulation to be useful, it has to be as faithful as possible, so SimDave would not be given any clues that he is simulated.

[-]Wei Dai6y80

You missed a crucial point of the post, which is that when the AI does a simulation to consider the consequences of some action that the AI normally wouldn't do, observing that action is itself a clue that SimDave is being simulated. Here's the relevant part from the OP:

So Dave has just asked PAL to get him a cup of coffee. Dave is used to seeing PAL take route A to the coffee machine, and is initially puzzled because PAL is driving along route B. But then Dave has an epiphany. Dave knows with very high certainty that no PAL computer has ever made a mistake, so he can conclude with equally high certainty that he is no longer Dave. He is [Dave], a simulated version of Dave created inside PAL while it is computing the utility of taking route B.

[-]Donald Hobson6y10

Both ways of simulating counterfactuals remove some info: either you change [Dave]'s prediction, or you stop it being correct. In the real world, the robot knows that Dave will correctly predict it, but its counterfactuals contain scenarios where [Dave] is wrong.

Suppose there were two identical robots, and the paths A and B were each only wide enough for one robot. So 1A1B > 2A > 2B in every robot's preference ordering. Each robot predicts that the other robot will take path Q, and so decides to take path R ≠ Q (where {Q, R} = {A, B}). The robots oscillate their decisions through the levels of simulation until the approximations become too crude. Both robots then take the same path, with the path they take depending on whether they had compute for an odd or even number of simulation layers. They will do this even if they have a way to distinguish themselves, like flipping coins until one gets heads and the other doesn't (assuming an epsilon cost to this method).

In general, CDT doesn't work when being predicted.
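A minimal sketch of the two-robot oscillation, under assumptions that are mine rather than the comment's: the crudest (depth 0) model of a robot just takes a default path A, and a robot with `depth` layers of simulation predicts the other robot using one layer less and then takes the opposite path. The names `other_path` and `decide` are hypothetical.

```python
def other_path(path: str) -> str:
    """Return the path that is not `path`."""
    return "B" if path == "A" else "A"


def decide(depth: int, default: str = "A") -> str:
    """Choose a path using `depth` nested layers of simulating the other robot.

    At depth 0 the approximation is too crude to model the other robot,
    so the robot falls back to `default` (an assumed base case).
    """
    if depth == 0:
        return default
    predicted_other = decide(depth - 1, default)  # simulate the identical robot
    return other_path(predicted_other)            # avoid the predicted path


if __name__ == "__main__":
    for depth in range(5):
        robot_1 = decide(depth)
        robot_2 = decide(depth)  # identical robot, identical compute
        print(depth, robot_1, robot_2, "collision" if robot_1 == robot_2 else "ok")
```

Because both robots run the same procedure with the same depth, they always end up on the same path, and which path that is flips with the parity of `depth`.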

[ Question ]

The Simulation Epiphany Problem