Gary_Drescher — LessWrong

An approach to the Agent Simulates Predictor problem

For the simulation-output variant of ASP, let's say the agent's possible actions/outputs consist of all possible simulations Si (up to some specified length), concatenated with "one box" or "two boxes". To prove that any given action has utility greater than zero, the agent must prove that the associated simulation of the predictor is correct. Where does your algorithm have an opportunity to commit to one-boxing before completing the simulation, if it's not yet aware that any of its available actions has nonzero utility? (Or would that commitment require a further modification to the algorithm?)

For the simulation-as-key variant of ASP, what principle would instruct a (modified) UDT algorithm to redact some of the inferences it has already derived?

An approach to the Agent Simulates Predictor problem

Gary_Drescher10yΩ110

Suppose we amend ASP to require the agent to output a full simulation of the predictor before saying "one box" or "two boxes" (or else the agent gets no payoff at all). Would that defeat UDT variants that depend on stopping the agent before it overthinks the problem?

(Or instead of requiring the the agent to output the simulation, we could use the entire simulation, in some canonical form, as a cryptographic key to unlock an encrypted description of the problem itself. Prior to decrypting the description, the agent doesn't even know what the rules are; the agent is told in advance only that that decryption will reveal the rules.)

Open Thread, April 27-May 4, 2014

Gary_Drescher12y60

According to information his family graciously posted to his blog, the cause of death was occlusive coronary artery disease with cardiomegaly.

http://blog.sethroberts.net/

Reflection in Probabilistic Logic

Gary_Drescher13y20

It occurs to me that my references above to "coherence" should be replaced by "coherence & P(T)=1 & reflective consistency". That is, there exists (if I understand correctly) a P that has all three properties, and that assigns the probabilities listed above. Therefore, those three properties would not suffice to characterize a suitable P for a UDT agent. (Not that anyone has claimed otherwise.)

Reflection in Probabilistic Logic

Gary_Drescher13y190

Wow, this is great work--congratulations! If it pans out, it bridges a really fundamental gap.

I'm still digesting the idea, and perhaps I'm jumping the gun here, but I'm trying to envision a UDT (or TDT) agent using the sense of subjective probability you define. It seems to me that an agent can get into trouble even if its subjective probability meets the coherence criterion. If that's right, some additional criterion would have to be required. (Maybe that's what you already intend? Or maybe the following is just muddled.)

Let's try invoking a coherent P in the case of a simple decision problem for a UDT agent. First, define G <--> P("G") < 0.1. Then consider the 5&10 problem:

If the agent chooses A, payoff is 10 if ~G, 0 if G.
If the agent chooses B, payoff is 5.

And suppose the agent can prove the foregoing. Then unless I'm mistaken, there's a coherent P with the following assignments:

P(G) = 0.1

P(Agent()=A) = 0

P(Agent()=B) = 1

P(G | Agent()=B) = P(G) = 0.1

And P assigns 1 to each of the following:

P("Agent()=A") < epsilon

P("Agent()=B") > 1-epsilon

P("G & Agent()=B") / P("Agent()=B") = 0.1 +- epsilon

P("G & Agent()=A") / P("Agent()=A") > 0.5

The last inequality is consistent with the agent indeed choosing B, because the postulated conditional probability of G makes the expected payoff given A less than the payoff given B.

Is that P actually incoherent for reasons I'm overlooking? If not, then we'd need something beyond coherence to tell us which P a UDT agent should use, correct?

(edit: formatting)

The Cognitive Science of Rationality

Gary_Drescher14y250

If John's physician prescribed a burdensome treatment because of a test whose false-positive rate is 99.9999%, John needs a lawyer rather than a statistician. :)

Example decision theory problem: "Agent simulates predictor"

Gary_Drescher15y70

In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources.

Just to give due credit: Wei Dai and others had already discussed Prisoner's Dilemma scenarios that exhibit a similar problem, which I then distilled into the ASP problem.

Discussion for Eliezer Yudkowsky's paper: Timeless Decision Theory

Gary_Drescher15y00

and for an illuminating reason - the algorithm is only run with one set of information

That's not essential, though (see the dual-simulation variant in Good and Real).

Discussion for Eliezer Yudkowsky's paper: Timeless Decision Theory

Gary_Drescher15y60

Just to clarify, I think your analysis here doesn't apply to the transparent-boxes version that I presented in Good and Real. There, the predictor's task is not necessarily to predict what the agent does for real, but rather to predict what the agent would do in the event that the agent sees $1M in the box. (That is, the predictor simulates what--according to physics--the agent's configuration would do, if presented with the $1M environment; or equivalently, what the agent's 'source code' returns if called with the $1M argument.)

If the agent would one-box if $1M is in the box, but the predictor leaves the box empty, then the predictor has not predicted correctly, even if the agent (correctly) two-boxes upon seeing the empty box.

Another attempt to explain UDT

Gary_Drescher15y30

2) "Agent simulates predictor"

This basically says that the predictor is a rock, doesn't depend on agent's decision,

True, it doesn't "depend" on the agent's decision in the specific sense of "dependency" defined by currently-formulated UDT. The question (as with any proposed DT) is whether that's in fact the right sense of "dependency" (between action and utility) to use for making decisions. Maybe it is, but the fact that UDT itself says so is insufficient reason to agree.

[EDIT: fixed typo]

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments