What is the point of this thought experiment? 

To demonstrate that thought experiments involving mind-reading agents (such as the AI in "Newcomb's Paradox" or Paul Ekman) can create scenarios in which any decision theory does worse than some other one. If we allow these agents to exist, then no decision theory will be strictly better than another.

The Paradox

Let's say we have two competing decision theories, ADT and BDT. These can be any two decision theories you like, so long as they are different. Here, "different" means that there exists a scenario where ADT leads to a different decision than BDT.

Consider a mind-reading agent like Paul Ekman or a superintelligent AI that we will call Dick Kick'em. Like the predictors in the thought experiments mentioned above, Dick Kick'em can read the minds of other agents and know their beliefs and future actions with near certainty.

Now imagine you are wandering the streets and you come across Dick Kick'em. He says, "I am going to read your mind, and if you believe in ADT I will leave you alone, but if you believe in BDT I will kick you in the dick." Let's consider the scenarios:

You believe in ADT: You get left alone, net zero utility.

You believe in BDT: You get kicked in the dick, negative utility.

Therefore, ADT is a superior decision theory compared to BDT.


Since ADT and BDT are arbitrary decision theories, we can use this scenario to show that CDT is better than UDT or even TDT. Either that, or the idea of mind-reading agents is flawed. Thought experiments like "Newcomb's Paradox" essentially boil down to the same thing as the "Dick Kick'em Paradox", except that instead of having the AI kick you directly, there is a small game that makes it look like actual decision theory is involved.

The setup violates a fairness condition that has been talked about previously.

From https://arxiv.org/pdf/1710.05060.pdf, section 9:

We grant that it is possible to punish agents for using a specific decision procedure, or to design one decision problem that punishes an agent for rational behavior in a different decision problem. In those cases, no decision theory is safe. CDT performs worse that FDT in the decision problem where agents are punished for using CDT, but that hardly tells us which theory is better for making decisions. [...]

Yet FDT does appear to be superior to CDT and EDT in all dilemmas where the agent’s beliefs are accurate and the outcome depends only on the agent’s behavior in the dilemma at hand. Informally, we call these sorts of problems “fair problems.” By this standard, Newcomb’s problem is fair; Newcomb’s predictor punishes and rewards agents only based on their actions. [...]

There is no perfect decision theory for all possible scenarios, but there may be a general-purpose decision theory that matches or outperforms all rivals in fair dilemmas, if a satisfactory notion of “fairness” can be formalized

Yet FDT does appear to be superior to CDT and EDT in all dilemmas where the agent’s beliefs are accurate and the outcome depends only on the agent’s behavior in the dilemma at hand


This is not true even in cases where mind-reading agents do not exist.

Consider the desert dilemma again with Paul Ekman, except that he is actually not capable of reading people's minds. Also assume your goal here is to be selfish and gain as much utility for yourself as possible. You offer him $50 in exchange for taking you out of the desert to the nearest village, where you will be able to withdraw the money and pay him. He can't read your mind, but he judges that the expected value is positive, given that most people in this scenario would be telling the truth. CDT says that you should simply not pay him when you reach the village, whereas FDT has you pay and leaves you $50 short. In this real-world scenario, which doesn't include magical mind-reading agents, CDT comes out $50 ahead of FDT.

The only times FDT wins against CDT is in strange mind-reading thought experiments that won't happen in the real world.

In your new dilemma, FDT does not say to pay the $50. It only says to pay when the driver's decision of whether or not to take you to the city depends on what you are planning to do when you get to the city, which isn't true in your setup, since you assume the driver can't read faces.

The agent in this scenario doesn't necessarily know whether the driver can read faces; in the original problem, the agent isn't aware of this information. Surely if FDT advises you to pay him on arrival in the face-reading scenario, you would do the same in the non-face-reading scenario, since the agent can't tell them apart.

No, the whole premise of the face-reading scenario is that the agent can tell that his face is being read, and that's why he pays the money. If the agent can't tell whether his face is being read, then his correct action (under FDT) is to pay the money if and only if (probability of being read) times (utility of returning to civilization) is greater than (utility of the money). Now, if this condition holds but in fact the driver can't read faces, then FDT does pay the $50, but this is just because it got unlucky, and we shouldn't hold that against it.
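The condition above can be checked by comparing the two policies directly. A minimal sketch, assuming (hypothetically) that being stranded in the desert is worth 0 utility, that a face-reading driver only gives rides to agents whose policy is to pay, and that a driver who can't read faces gives the ride regardless; the function names and utility numbers are illustrative only.

```python
# Hypothetical model of the desert dilemma when the agent is unsure whether
# the driver can read faces. All utilities are illustrative assumptions.
U_DESERT = 0.0  # assumed utility of being left in the desert


def policy_value(pays: bool, p_read: float, u_civ: float, u_money: float) -> float:
    """Expected utility of committing to 'pay on arrival' or 'don't pay'."""
    if pays:
        # Both kinds of driver take you to town, and you hand over the money.
        return u_civ - u_money
    # A face reader sees you won't pay and leaves you behind;
    # a driver who can't read faces still takes you to town.
    return p_read * U_DESERT + (1.0 - p_read) * u_civ


def best_policy_pays(p_read: float, u_civ: float, u_money: float) -> bool:
    """True if committing to pay has the higher expected utility."""
    return policy_value(True, p_read, u_civ, u_money) > policy_value(
        False, p_read, u_civ, u_money
    )


def comment_condition(p_read: float, u_civ: float, u_money: float) -> bool:
    """The rule stated above: p(read) * U(civilization) > U(money)."""
    return p_read * u_civ > u_money
```

With U_DESERT = 0 the two functions agree, since u_civ − u_money > (1 − p_read)·u_civ rearranges to p_read·u_civ > u_money. For example, with u_civ = 1000 and u_money = 50, best_policy_pays(0.0, 1000, 50) is False (the non-reading driver, where not paying is worth exactly 50 more), while best_policy_pays(0.1, 1000, 50) is True.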

Then you violate the accurate-beliefs condition. (If the world is in fact a random mixture, in proportions that their beliefs track correctly, then FDT will do better when averaging over the mixture.)

Having true beliefs doesn't mean being omniscient. It is possible to have only true beliefs but still not know everything. In this case, the agent might not know whether the driver can read minds but still have accurate beliefs otherwise.

In Newcomb's paradox, the predictor only cares about what you do when presented with the boxes. It doesn't care about whether that's because you use ADT, BDT, or anything else. Whereas Dick Kick'em has to actually look at your source code and base his decision on that. He might as well be deciding based on any arbitrary property of the code, like whether it uses tabs or spaces, rather than what decision theory it implements. (Actually, the tabs and spaces thing sounds more plausible! Determining what kind of decision theory a given piece of code implements seems like it could be equivalent to solving the halting problem.) Agents that care about what you'll do in various situations seem much more common and relevant than agents that arbitrarily reward you for being designed a certain way.
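The contrast between a predictor that looks at what you do and one that looks at what you are can be made concrete. A minimal sketch, assuming a perfect predictor, the usual $1,000,000 and $1,000 Newcomb amounts, and a made-up disutility for the kick; the Agent class, the labels, and the payoff functions are hypothetical illustrations, not anyone's formal definition of fairness.

```python
from dataclasses import dataclass
from typing import Callable

BIG, SMALL = 1_000_000, 1_000  # assumed Newcomb amounts (opaque, transparent)
KICK = -100                    # hypothetical disutility of being kicked


@dataclass(frozen=True)
class Agent:
    theory: str                # the label Dick Kick'em inspects
    choose: Callable[[], str]  # returns "one" or "two" when facing the boxes


def newcomb_payoff(agent: Agent) -> int:
    # The predictor simulates the agent's choice (assumed perfect here);
    # it never looks at the theory label.
    predicted = agent.choose()
    opaque = BIG if predicted == "one" else 0
    actual = agent.choose()
    return opaque if actual == "one" else opaque + SMALL


def dick_kickem_payoff(agent: Agent, spared: str = "ADT") -> int:
    # Depends only on the label, never on any choice the agent makes.
    return 0 if agent.theory == spared else KICK
```

Two agents that both one-box but carry different labels, e.g. Agent("ADT", lambda: "one") and Agent("BDT", lambda: "one"), receive the same Newcomb payoff (1,000,000) but different Dick Kick'em payoffs (0 and -100). This is roughly the property the quoted FDT paper calls a "fair problem": the outcome depends only on the agent's behavior in the dilemma at hand.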

Newcomb's problem is analogous to many real-world situations. (For example: I want to trade with a person, but not if they're going to rip me off. But as soon as I agree to trade with them, that grants them the opportunity to rip me off with no consequences. Thus, unless they're predictably not going to do that, I had better not trade with them.) It's not clear what real-world situations (if any) Dick Kick'em is analogous to. Religious wars?

Why can't I use the strategy "pick one or two boxes in a way affected by what decision theories the predictor approves of"? Or more likely, something which is mathematically equivalent to doing that, but doesn't do an explicit comparison?

I believe that the AI does care about your beliefs, just not specific beliefs. The AI only cares about whether your decision theory falls into the class of decision theories that will pick two boxes, and if it does, then it punishes you. Sure, unlike Dick Kick'em, the AI isn't looking for specific theories, just any theory within a specific class, but it is still the same thing. The AI is punishing the agent by putting in less money based solely on your beliefs. In Newcomb's paradox, the AI scans your brain BEFORE you take any action whatsoever, so the punishment cannot be based on your actions; the punishment from the AI is based only on your beliefs. This is exactly the same as the Dick Kick'em Paradox: Dick will punish you purely on your beliefs, not any action. The only difference is that in Newcomb's paradox you get to play a little game after the AI has punished you.

Eh, Omega only cares about your beliefs insofar as they affect your actions (past, present, or future, it's all just a different coordinate). I still think that seems way more natural and common than caring about beliefs in general.

Example: Agent A goes around making death threats, saying to people: "Give me $200 or I'm going to kill you." Agent B goes around handing out brochures that criticize the government. If the police arrest agent A, that's probably a reasonable decision. If the police arrest agent B, that's bad and authoritarian, since it goes against freedom of expression. This is true even though all either agent has done is say things. Agent A hasn't actually killed anyone yet. But the police still arrest agent A because they care about agent A's future actions.

"How dare you infer my future actions from what I merely say," cries agent A as they're being handcuffed, "you're arbitrarily punishing me for what I believe. This is a crass violation of my right to freedom of speech." The door of the police car slams shut and further commentary is inaudible.

Omega only cares about your beliefs insofar as they affect your actions 


So does Dick Kick'em, since he only cares about distinct decision theories that a particular agent believes in, and that in turn decides the agent's actions.

What if you believe in DKRUDT, the "Dick Kick'em rewards you" decision theory?

Seriously though, Newcomb's setup is not adversarial in the same way, the predictor rewards or punishes you for actions, not beliefs. Your internal reasoning does not matter, as long as you end up one-boxing you walk away with more money.

Seriously though, Newcomb's setup is not adversarial in the same way, the predictor rewards or punishes you for actions, not beliefs. 


This cannot be true because it would violate cause and effect. The predictor will decide to reward/punish you with the amount of money put in the boxes. This reward/punishment is done BEFORE any action is taken, and so it is based purely on the beliefs of the agent. If it were based on the actions of the agent, that would mean that the cause of the reward/punishment happens AFTER the decision was made, which violates cause and effect. The cause must come BEFORE.

Let me clarify what I said. Any decision theory or no decision theory at all that results in someone one-boxing is rewarded. Examples: Someone hates touching transparent boxes. Someone likes a mystery of an opaque box. Someone thinking that they don't deserve a guaranteed payout and hoping for an empty box. Someone who is a gambler. Etc. What matters is the outcome, not the thought process.

That just means the AI cares about a particular class of decision theories rather than a specific one like Dick Kick'em. I could re-run the same thought experiment but instead Dick Kick'em says:

"I am going to read your mind, and if you believe in a decision theory that one-boxes in Newcomb's Paradox I will leave you alone, but if you believe in any other decision theory I will kick you in the dick."

In this variation, Dick Kick'em would be judging the agent based on exactly the same criteria that the AI in Newcomb's problem uses. All I have done is remove the game afterwards, but that is somewhat irrelevant, since the AI doesn't judge you on your actions, just on what you would do if you were in a Newcomb-type scenario.

"I am going to read your mind, and if you believe in a decision theory that one-boxes in Newcomb's Paradox I will leave you alone, but if you believe in any other decision theory I will kick you in the dick."

Sure, that's possible. Assuming there are no Newcomb's predictors in that universe, but only DK, rational agents believe in two-boxing. I am lost as to how it is related to your original point.

Either that, or the idea of mind-reading agents is flawed.

We shouldn't conclude that, since to various degrees mindreading agents already happen in real life.

If we tighten our standard to "games where the mindreading agent is only allowed to predict actions you'd choose in the game, which is played with you already knowing about the mindreading agent", then many decision theories that are different in other situations might all choose to respond to "pick B or I'll kick you in the dick" by picking B.

Mindreading agents do happen in real life but they are often wrong and can be fooled. Most decision theories on this website don't entertain either of these possibilities. If we allow "fooling a predictor" as a possible action then the solution to Newcomb's problem is easy: simply fool the predictor and then take both boxes.

In Newcomb's scenario, an agent that believes they have a probability of 99.9% of being able to fool Omega should two-box. They're wrong and will only get $1000 instead of $1000000, but that's a cost of having wildly inaccurate beliefs about the world they're in, not a criticism of any particular decision theory.

Setting up a scenario in which the agent has true beliefs about the world isolates the effect of the decision theory for analysis, without mixing in a bunch of extraneous factors. Likewise for the fairness assumption that says that the payoff distribution is correlated only with the agents' strategies and not the process by which they arrive at those strategies.

Violating those assumptions does allow a broader range of scenarios, but doesn't appear to help in the evaluation of decision theories. It's already a difficult enough field of study without throwing in stuff like that.

To entertain that possibility, suppose you're X% confident that your best "fool the predictor into thinking I'll one-box, and then two-box" plan will work, and Y% confident that your "actually do one-box, in a way the predictor can predict" plan will work. If X=Y or X>Y, you've got no incentive to actually one-box, only to try to pretend you will, but above some threshold of belief that the predictor might beat your deception, it makes sense to actually be honest.
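The threshold can be written down explicitly. A minimal sketch, assuming the $1,000,000 / $1,000 amounts from the earlier comment, X and Y expressed as probabilities between 0 and 1, and that a failed honest attempt means the opaque box is empty; the function names are hypothetical.

```python
BIG, SMALL = 1_000_000, 1_000  # assumed Newcomb amounts (opaque, transparent)


def ev_deceive(x: float) -> float:
    """Try to fool the predictor into expecting one-boxing, then take both boxes.
    With probability x the opaque box is full; otherwise it is empty."""
    return x * (BIG + SMALL) + (1 - x) * SMALL


def ev_honest(y: float) -> float:
    """Genuinely one-box. With probability y the predictor gets it right and
    fills the opaque box; otherwise you open an empty box and get nothing."""
    return y * BIG


def should_be_honest(x: float, y: float) -> bool:
    """True if honestly one-boxing has the higher expected value."""
    return ev_honest(y) > ev_deceive(x)
```

Honesty wins exactly when Y·1,000,000 > X·1,000,000 + 1,000, i.e. when Y exceeds X by more than 0.001 under these amounts. When X ≥ Y, deception comes out at least $1,000 ahead, which matches the claim that there is no incentive to actually one-box in that case.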