Surprisingly, Ben Goertzel's (2010) Counterfactual Reprogramming Decision Theory (CRDT) has not been discussed even once on Less Wrong, so I present this discussion post as an opportunity to do so.

Here is Goertzel's abstract:

A novel variant of decision theory is presented. The basic idea is that one should ask, at each point in time: What would I do if the reprogrammable parts of my brain were reprogrammed by a superintelligent Master Programmer with the goal of supplying me with a program that would maximize my utility averaged over possible worlds? Problems such as the Prisoner's Dilemma, the value of voting, Newcomb's Problem and the Psychopath Button are reviewed from this perspective and shown to be addressed in a satisfactory way.

His first footnote acknowledges some debt to Less Wrong and to Wei Dai in particular:

Some interesting, albeit often confusing, discussion on CDT and hypothetical replacement decision theories may be found online at the Less Wrong blog... The decision algorithm presented by Dai on that blog page bears some resemblance to CRDT, but due to the rough and very informal exposition there, I'm not sure what is the precise relationship.

He also discusses Vladimir Nesov's counterfactual mugging scenario, and attempts to work toward a formalization of CRDT by making use of AIXI and some other stuff.



I brought it up on the decision theory list back in 2010; Drescher and Slepnev commented. My current thoughts on the paper are as follows.

The decision theory Goertzel considers assumes that an explicit specification of the dependence of the (model of the) world on the agent's program is available. Unlike UDT/ADT, this decision theory doesn't address the problem of discovering the dependence of the world on the agent's actions (or its program/strategy); it takes such dependence as part of its problem statement.
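To make "the dependence is part of the problem statement" concrete, here is a minimal sketch of my reading, not Goertzel's formalization. The names `choose_program` and `newcomb_utility`, the two candidate programs, the two possible worlds with their weights, and the usual Newcomb payoffs ($1,000,000 and $1,000) are all assumptions for illustration. The point is only that the map from (program, world) to utility is handed to the agent rather than discovered by it.

```python
from typing import Callable, Dict, Sequence


def choose_program(
    programs: Sequence[str],
    worlds: Dict[str, float],
    utility: Callable[[str, str], float],
) -> str:
    """Pick the program with the highest utility averaged over possible worlds.

    `utility(program, world)` is the explicitly given dependence of the
    world model on the agent's program (hypothetical interface).
    """
    def expected_utility(program: str) -> float:
        return sum(weight * utility(program, world)
                   for world, weight in worlds.items())

    return max(programs, key=expected_utility)


def newcomb_utility(program: str, world: str) -> float:
    """Toy Newcomb dependence: the predictor reads the program itself."""
    one_boxer = program == "one_box"
    # In the "predictor_wrong" world the prediction is simply inverted.
    predicted_one_box = one_boxer if world == "predictor_right" else not one_boxer
    opaque_box = 1_000_000 if predicted_one_box else 0
    transparent_box = 0 if one_boxer else 1_000
    return opaque_box + transparent_box


# Assumed weights over possible worlds (a 99%-accurate predictor).
worlds = {"predictor_right": 0.99, "predictor_wrong": 0.01}
best = choose_program(["one_box", "two_box"], worlds, newcomb_utility)
print(best)
```

Under these assumed weights the one-boxing program wins, because the prediction is a function of the program rather than of the action taken at decision time. Nothing in the sketch says where `newcomb_utility` comes from, which is exactly the part UDT/ADT try to address.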

It seems (partially based on examples; the description is not completely clear to me) that the dependence being considered takes the agent's hypothetically-decided-upon program as an input, puts it at the beginning of time in the agent's implementation, and allows its propagation in the world (model), including as other agents' knowledge about the agent. It's unclear what should happen in thought experiments that assume there is an accidentally identical agent in the world. Our agent's program won't have a causal effect on the accidental copy, so it seems that the dependence won't treat the "copy" as identical with an agent that's counterfactually assumed to have various possible programs. This is unlike TDT, where the agent is fixed and the dependence should take such similarities into account.

It's unclear whether the actual decision procedure that chooses a program is part of the world model. The discussion of Newcomb's problem seems to state that it is, but then it's unclear in what sense the agent's program is being hypothetically replaced. Since the program is chosen based on the consequences of the agent following it from the beginning of time without exception, the agent may be able to make past and future precommitments (as in TDT, and unlike in CDT). Unlike in UDT (but as in CDT and TDT), it's stipulated that the consequences of a decision are evaluated based only on the possible worlds that are consistent with the agent's current state of knowledge.

It's also stipulated that the possible worlds are to be weighted by the current probability distribution, but the sense in which this distribution stays the same is unclear, since the current possibilities, as well as their probabilities, typically depend on which program the agent is hypothetically following. In CDT and EDT, the decision is taken at present, so this doesn't pose a problem, and they rely on the dependence of the future probability distribution on the hypothetical decision, each in its own sense. But here, the decision influences the past, so the present will differ depending on the decision.

(The agent doesn't just take its action based on the chosen program's judgment in the current situation, but is replaced by the chosen program at the time of evaluation. This seems somewhat strange, since it's unclear how such an agent would remain unreplaced by a chosen program for any span of time, so as to take advantage of optimizing over possible past precommitments. This condition seems to lead to the agent immediately replacing itself with the program that would be chosen at the very start, without waiting for the conditions of thought experiments to obtain, but this possibility is not discussed.)

The description of how the decision theory works on a list of thought experiments is rather informal; I'm not able to tell whether it reflects a specific way of making decisions or follows less systematic common-sense reasoning in each case (the description in the case of the Prisoner's Dilemma is very short, and the argument in Newcomb's problem is unclear to me).

The discussion of Counterfactual Mugging seems to dodge the main difficulty by assuming that optimization would take the counterfactual into account, which seems to contradict the stipulation that the agent would only evaluate its decision over the possible worlds consistent with its knowledge. From my understanding of its specification, Goertzel's agent would act as TDT does, not giving up the money, as a result of only considering the possible worlds that are consistent with its current knowledge.
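A small sketch of why the restriction to known-consistent worlds matters here. The setup is the usual one and is assumed rather than taken from the paper: a fair coin, a $100 payment requested on tails, and a $10,000 reward on heads only if the agent is the kind that pays on tails. The helper names `policy_value` and `best_policy` are hypothetical.

```python
from typing import Dict


def policy_value(pays: bool, worlds: Dict[str, float]) -> float:
    """Average payoff of a policy over the given (renormalized) worlds."""
    payoff = {
        "heads": 10_000 if pays else 0,  # rewarded only if it would have paid
        "tails": -100 if pays else 0,    # asked to hand over the money
    }
    total_weight = sum(worlds.values())
    return sum(weight * payoff[name]
               for name, weight in worlds.items()) / total_weight


def best_policy(worlds: Dict[str, float]) -> bool:
    """Return True if paying beats refusing over these worlds."""
    return max([True, False], key=lambda pays: policy_value(pays, worlds))


# Evaluating over all possible worlds, counterfactual included.
all_worlds = {"heads": 0.5, "tails": 0.5}
# Evaluating only over worlds consistent with having already seen tails.
known_worlds = {"tails": 0.5}

print(policy_value(True, all_worlds), best_policy(all_worlds))
print(policy_value(True, known_worlds), best_policy(known_worlds))
```

Over all worlds the paying policy is worth 4950 on average and is chosen; over the worlds consistent with the agent's knowledge it is worth -100 and refusing is chosen. The stipulation about consistency with current knowledge selects the second evaluation, which is why I read the agent as keeping the money.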

Thanks for sharing your comments!


(These are new comments; I didn't comment in 2010.)

[This comment is no longer endorsed by its author]

Sounds like a rigorous (non-fantasy) version of "Do what I think God wants me to do."

Am I missing something? Why does simulating an entity more powerful than yourself not break down?

Which powerful entity, simulated by whom, are you referring to?

The basic idea is that one should ask, at each point in time: What would I do if the reprogrammable parts of my brain were reprogrammed by a superintelligent Master Programmer with the goal of supplying me with a program that would maximize my utility averaged over possible worlds?

If you knew how this entity would program you, wouldn't you already know the correct thing to do?

After you figure that out, yes, you would know what the correct thing to do is, so it's a good subgoal if doable. High optimality of an outcome doesn't necessarily imply that achieving it is impossible, so the argument from "too good to be true" is not very reliable.

In this particular case (as with things like UDT), the express aim is to more rigorously understand what optimality is, which is a problem that isn't concerned with the practical difficulties of actually achieving (getting closer to) it.