Surprisingly, Ben Goertzel's (2010) Counterfactual Reprogramming Decision Theory (CRDT) has not been discussed even once on Less Wrong, so I present this discussion post as an opportunity to do so.

Here is Goertzel's abstract:

A novel variant of decision theory is presented. The basic idea is that one should ask, at each point in time: What would I do if the reprogrammable parts of my brain were reprogrammed by a superintelligent Master Programmer with the goal of supplying me with a program that would maximize my utility averaged over possible worlds? Problems such as the Prisoner's Dilemma, the value of voting, Newcomb's Problem and the Psychopath Button are reviewed from this perspective and shown to be addressed in a satisfactory way.

His first footnote acknowledges some debt to Less Wrong and to Wei Dai in particular:

Some interesting, albeit often confusing, discussion on CDT and hypothetical replacement decision theories may be found online at the Less Wrong blog... The decision algorithm presented by Dai on that blog page bears some resemblance to CRDT, but due to the rough and very informal exposition there, I'm not sure what is the precise relationship.

He also discusses Vladimir Nesov's counterfactual mugging scenario, and works toward a formalization of CRDT by making use of AIXI, among other things.
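For concreteness, here is a rough sketch of the decision rule as I read the abstract. The names, the discrete list of worlds, and the given run dependence are my own illustration, not Goertzel's formalism (he works toward one via AIXI):

```python
# A minimal sketch of the CRDT rule as I read the abstract (my naming, not
# Goertzel's): choose the program a superintelligent Master Programmer would
# install, i.e. the one maximizing utility averaged over possible worlds.

def crdt_choose_program(candidate_programs, possible_worlds, prior, run, utility):
    """Pick the program the Master Programmer would supply.

    candidate_programs  -- programs the reprogrammable parts of the brain could run
    possible_worlds     -- list of world-models consistent with current knowledge
    prior(world)        -- probability weight of a possible world
    run(program, world) -- outcome world if the agent runs `program` in `world`
    utility(world)      -- the agent's utility for an outcome world
    """
    def expected_utility(program):
        return sum(prior(w) * utility(run(program, w)) for w in possible_worlds)

    return max(candidate_programs, key=expected_utility)
```

The agent then acts, at each point in time, as the chosen program dictates.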

Thoughts?

8 comments

I brought it up on the decision theory list back in 2010; Drescher and Slepnev commented. My current thoughts on the paper are as follows.

The decision theory Goertzel considers assumes that an explicit specification of the dependence of the (model of the) world on the agent's program is available. Unlike UDT/ADT, it doesn't address the problem of discovering how the world depends on the agent's actions (or its program/strategy); it takes that dependence as part of its problem statement.

It seems (based partly on the examples; the description is not completely clear to me) that the dependence being considered takes the agent's hypothetically-decided-upon program as an input, installs it in the agent's implementation at the beginning of time, and lets its effects propagate through the world (model), including as other agents' knowledge about the agent. It's unclear what should happen in thought experiments that posit an accidentally identical agent elsewhere in the world: our agent's program has no causal effect on the accidental copy, so it seems the "copy" won't be treated by the dependence as identical to an agent that is counterfactually assumed to have various possible programs (unlike in TDT, where the agent is fixed and the dependence should take such similarities into account).

It's unclear whether the actual decision procedure that chooses a program is part of the world model. In the discussion of Newcomb's problem it seems to be stated that it is, but then it's unclear in what sense the agent's program is being hypothetically replaced. Since the program is chosen based on the consequences of the agent following it from the beginning of time without exception, the agent may be able to make past and future precommitments (as in TDT, and unlike in CDT). Unlike in UDT (but as in CDT and TDT), it's stipulated that the consequences of a decision are evaluated based only on the possible worlds consistent with the agent's current state of knowledge.

It's also stipulated that the possible worlds are to be weighted by the current probability distribution, but it's unclear in what sense this distribution can be held fixed, since the current possibilities, as well as their probabilities, typically depend on which program the agent is hypothetically following. In CDT and EDT the decision is taken in the present, so this doesn't pose a problem; they rely on the dependence of the future probability distribution on the hypothetical decision, each in its own sense. But here the decision influences the past, so the present itself will differ depending on the decision.
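To put the worry in miniature (a toy framing of my own, not from the paper): if the hypothetically chosen program is installed at the beginning of time, then both the set of presently possible worlds and their weights are themselves functions of that program, so "weight by the current probability distribution" is underspecified.

```python
# Toy framing of the circularity (mine, not the paper's). The worlds consistent
# with the agent's current knowledge, and their weights, depend on which program
# the agent has hypothetically been running all along, so there is no single
# fixed "current distribution" to average over.

def naive_value(program, worlds_given, prior_given, run, utility):
    worlds = worlds_given(program)  # which presents are possible depends on the program
    prior = prior_given(program)    # and so do their probabilities
    return sum(prior(w) * utility(run(program, w)) for w in worlds)
```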

(The agent doesn't just take its action based on the chosen program's judgment in the current situation, but is replaced by the chosen program at the time of evaluation. This seems somewhat strange, since it's unclear how such an agent would remain unreplaced by a chosen program for any span of time, so as to take advantage of optimizing over possible past precommitments. This condition seems to lead to the agent immediately replacing itself with the program that would be chosen at the very start, without waiting for the conditions of the thought experiments to obtain, but this possibility is not discussed.)

The description of how the decision theory works on a list of thought experiments is rather informal; I'm not able to tell whether it reflects a specific way of making decisions or follows less systematic common-sense reasoning in each case (the treatment of the Prisoner's Dilemma is very short, and the argument in Newcomb's problem is unclear to me).

The discussion of Counterfactual Mugging seems to dodge the main difficulty by assuming that the optimization would take the counterfactual into account, which seems to contradict the stipulation that the agent only evaluates its decision over the possible worlds consistent with its knowledge. From my understanding of the specification, Goertzel's agent would act as TDT does, not giving up the money, because it only considers the possible worlds consistent with its current knowledge.
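For reference, the arithmetic with the payoffs of the thought experiment as usually stated (a fair coin; on tails Omega asks for $100; on heads Omega would have paid $10,000 iff it predicts you'd pay on tails):

```python
# Counterfactual-mugging payoffs as usually stated: a fair coin; on tails Omega
# asks for $100; on heads Omega pays $10,000 iff it predicts you'd pay on tails.

P_HEADS = P_TAILS = 0.5

# Evaluated over both branches, before updating on the coin (the UDT-style view):
pay_before    = P_HEADS * 10_000 + P_TAILS * (-100)   # 4950: paying wins
refuse_before = 0.0

# Evaluated only over worlds consistent with current knowledge, after seeing
# tails (the stipulation in the paper, as I read it): the heads branch drops out.
pay_after    = -100   # refusing wins, so the agent keeps the money
refuse_after = 0
```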

Thanks for sharing your comments!

(These are new comments; in 2010 I didn't comment.)

[This comment is no longer endorsed by its author]

Sounds like a rigorous (non-fantasy) version of "Do what I think God wants me to do."

Am I missing something? Why doesn't simulating an entity more powerful than yourself break down?

Which powerful entity simulated by whom are you referring to?

basic idea is that one should ask, at each point in time: What would I do if the reprogrammable parts of my brain were reprogrammed by a superintelligent Master Programmer with the goal of supplying me with a program that would maximize my utility averaged over possible worlds?

If you knew how this entity would program you, wouldn't you already know the correct thing to do?

After you figure that out, then yes, you would know what the correct thing to do is, so it's a good subgoal if doable. High optimality of an outcome doesn't necessarily imply that it's impossible to achieve, so the argument from "too good to be true" is not very reliable.

In this particular case (as with things like UDT), the express aim is to more rigorously understand what optimality is, which is a problem that isn't concerned with the practical difficulties of actually achieving (getting closer to) it.