Decision Theory Paradox: Answer Key

orthonormal

As promised, I'm posting the answers to the exercises I wrote in the decision theory post.

Exercise 1: Prove that if the population consists of TDT agents and DefectBots, then a TDT agent will cooperate precisely when at least one of the other agents is also TDT. (Difficulty: 1 star.)

With the utility function specified (i.e. the shortsighted one that only cares about immediate children), the TDT agent's decision can be deduced from simple superrationality concerns; that is, if any other TDT agents are present with analogous utility functions, then their decisions will be identical. Thus if a TDT faces off against 2 DefectBots, it chooses between 0 children (for C) and 2 children (for D), and thus it defects. If it faces off against a TDT and a DefectBot, it chooses between 3 children (for C) and 2 children (for D), so it cooperates. And if it faces off against two other TDTs, it chooses between 6 children (for C) and 2 children (for D), so it cooperates. (EDIT: It's actually not that simple after all; see Douglas Knight's comment and the ensuing discussion.)
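For anyone who wants to check the arithmetic, here is a minimal sketch of the simple calculation in Python. The payoff rule in it (a defector gets 2 children of its own, and a cooperator gives 3 children to each of the other two players) is an assumption inferred from the figures quoted above, and the function names are made up for illustration. It only reproduces the naive comparison, not the subtleties raised in the EDIT.

```python
def children(moves):
    """Children for each of the three players, given moves 'C' or 'D'.

    Assumed rule (inferred from the numbers in the answer above):
    defecting yields 2 children for yourself; cooperating yields
    3 children for each of the other two players.
    """
    payoffs = []
    for i, own_move in enumerate(moves):
        own = 2 if own_move == "D" else 0
        gifts = 3 * sum(1 for j, m in enumerate(moves) if j != i and m == "C")
        payoffs.append(own + gifts)
    return payoffs


def tdt_payoff(num_tdt, move):
    """Payoff to one TDT when all num_tdt TDTs play `move` and the
    remaining players are DefectBots (which always defect)."""
    moves = [move] * num_tdt + ["D"] * (3 - num_tdt)
    return children(moves)[0]


for num_tdt in (1, 2, 3):
    c = tdt_payoff(num_tdt, "C")
    d = tdt_payoff(num_tdt, "D")
    print(num_tdt, "TDT(s): C gives", c, "and D gives", d, "so play", "C" if c > d else "D")
# 1 TDT(s): C gives 0 and D gives 2 so play D
# 2 TDT(s): C gives 3 and D gives 2 so play C
# 3 TDT(s): C gives 6 and D gives 2 so play C
```

Since all TDTs in a trio are assumed to make the same move, only the two "everyone like me plays C" and "everyone like me plays D" rows need comparing.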

Exercise 2: Prove that if a very large population starts with equal numbers of TDTs and DefectBots, then the expected population growth in TDTs and DefectBots is practically equal. (If Omega samples with replacement, assuming that the agents don't care about their exact copy's children, then the expected population growth is precisely equal.) (Difficulty: 2 stars.)

For the sake of simplicity, we'll consider only the parenthetical case. (The interested reader can see that if sampled without replacement, the figures will differ by a factor on the order of one divided by the population.) There are four cases to consider when Omega picks a trio: it includes 0, 1, 2 or 3 TDT agents, with probability 1/8, 3/8, 3/8 and 1/8 respectively. The first case results in 6 DefectBots being returned; the second results in 4 DefectBots and 2 TDTs; the third results in 8 DefectBots and 6 TDTs; the last results in 18 TDTs. Weighting and adding the cases, each "side" has expected population growth of 5.25 agents in that round.
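The weighting can be checked with a short Python sketch. The table of children per case is copied from the paragraph above; the function name and the use of exact fractions are my own choices for illustration, and sampling with replacement is assumed, as in the parenthetical case.

```python
from fractions import Fraction
from math import comb


def expected_growth(x):
    """Expected new (DefectBots, TDTs) in one round.

    x is the DefectBot fraction; trios are sampled with replacement.
    """
    # Number of TDTs in the trio -> (DefectBot children, TDT children),
    # taken from the four cases listed in the text.
    outcomes = {0: (6, 0), 1: (4, 2), 2: (8, 6), 3: (0, 18)}
    defect_kids = 0
    tdt_kids = 0
    for k, (d, t) in outcomes.items():
        prob = comb(3, k) * (1 - x) ** k * x ** (3 - k)
        defect_kids += prob * d
        tdt_kids += prob * t
    return defect_kids, tdt_kids


print(expected_growth(Fraction(1, 2)))
# (Fraction(21, 4), Fraction(21, 4)), i.e. 5.25 for each side
```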

Exercise 3: Prove that if the initial population consists of TDTs and DefectBots, then the ratio of the two will (with probability 1) tend to 1 over time. (Difficulty: 3 stars.)

This is a bit tricky; note that the expected population growth is higher for the more populous side! However, the expected fertility of each agent is higher on the less populous side, and thus its share grows proportionally. (Think of the demographics of a small minority with high fertility: while they won't have as many total children as the rest of the population, their proportion of the population will increase.)

For very large populations, we can model the fraction of DefectBots with a differential equation, and we will show that this equation has a single stable equilibrium at 1/2. Let N be the total population at a given moment, and x (in (0,1)) the fraction of that population consisting of DefectBots. Then we let

P(x)=6x³+12x²(1-x)+24x(1-x)²
Q(x)=6x²(1-x)+18x(1-x)²+18(1-x)³

denote the expected population growth for DefectBots and TDTs, respectively (these numbers are arrived at in the same way as in Exercise 2, which is the special case x=1/2). After one round the DefectBot fraction is (xN+P(x))/(N+P(x)+Q(x)), so the "difference quotient" between the old and new fractions of the population comes out to ((1-x)P(x)-xQ(x))/(N+P(x)+Q(x)). If we consider this to be x' and study the differential equation (with an extra parameter N for the current population), we see that x has an equilibrium where the expected fertilities are equal (that is, P(x)/x = Q(x)/(1-x)), which happens at x=1/2. Indeed, P(x)/x - Q(x)/(1-x) simplifies to 6(1-x)(1-2x), which is positive for x<1/2 and negative for x>1/2, so x steadily increases for x<1/2 and steadily decreases for x>1/2, and the equilibrium is stable.
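As a sanity check on the sign of x', here is a small Python sketch that evaluates the difference quotient at a few values of x. P and Q are the polynomials given above; the helper names `drift` and `next_fraction` and the choice N=1000 are assumptions made only for illustration. It checks the direction of the drift and does not attempt to simulate the (slow) convergence.

```python
from fractions import Fraction


def P(x):
    """Expected DefectBot growth per round at DefectBot fraction x."""
    return 6 * x**3 + 12 * x**2 * (1 - x) + 24 * x * (1 - x) ** 2


def Q(x):
    """Expected TDT growth per round at DefectBot fraction x."""
    return 6 * x**2 * (1 - x) + 18 * x * (1 - x) ** 2 + 18 * (1 - x) ** 3


def drift(x, n):
    """Expected change in the DefectBot fraction after one round."""
    return ((1 - x) * P(x) - x * Q(x)) / (n + P(x) + Q(x))


def next_fraction(x, n):
    """Expected DefectBot fraction after one round, computed directly."""
    return (x * n + P(x)) / (n + P(x) + Q(x))


n = 1000  # hypothetical current population
for x in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    step = drift(x, n)
    # The closed form should agree with "new fraction minus old fraction".
    assert step == next_fraction(x, n) - x
    sign = "up" if step > 0 else "down" if step < 0 else "flat"
    print(f"x = {x}: drift is {sign}")
# x = 1/4: drift is up
# x = 1/2: drift is flat
# x = 3/4: drift is down
```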

I'll admit that this isn't a rigorous proof, but it's the correct heuristic calculation; increased formalism only makes it more difficult to communicate.

Exercise 4: If the initial population consists of CliqueBots, DefectBots and TDTs in any proportion, then the ratio of both others to CliqueBots approaches 0 (with probability 1). (Difficulty: 4 stars.)

We can model this as a differential equation in the two-dimensional region {x+y+z=1: 0<x,y,z<1}, and as in Exercise 3 a stable equilibrium is a point at which the three expected fertilities are equal. At this point, it's worthwhile to note that you can calculate expected fertilities more simply by counting, for a given agent, only its individual fertility across the possible pairs of partners in the PD. If we let x, y and z denote the proportions of DefectBots, TDTs and CliqueBots, respectively, and let P, Q, and R denote their respective fertilities as functions of x, y and z, then we get

P(x,y,z)=2x²+4xy+8y²+4xz+4yz+2z²
Q(x,y,z)=2x²+6xy+6y²+4xz+6yz+2z²
R(x,y,z)=2x²+4xy+8y²+4xz+4yz+6z²

It is simple to show that if we set x=0, then R>Q for all {(y,z):y+z=1,0<y,z<1}; that is, CliqueBots beat TDTs completely when DefectBots are not present. It is even simpler to show that they beat DefectBots entirely when TDTs are not present. It is a bit more complicated when all three are together: there is a tiny unstable region near (3/4,1/4,0) where Q>R, but the proportions drift out of this region as y gains against x, and they do not return; the stable equilibrium is at (0,0,1) as claimed.
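The fertility formulas and the claims about them can be spot-checked with a Python sketch. Several things in it are assumptions rather than statements from the post: the payoff rule (2 children for defecting, 3 to each other player for cooperating), the rule that a TDT cooperates exactly when at least two TDTs are in the trio (the simple Exercise 1 analysis), and the rule that a CliqueBot cooperates only in an all-CliqueBot trio. These were chosen because they reproduce the coefficients of P, Q and R above. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def move(kind, trio):
    """Assumed behaviour of each agent type in a given trio."""
    if kind == "TDT":
        return "C" if trio.count("TDT") >= 2 else "D"
    if kind == "Clique":
        return "C" if trio.count("Clique") == 3 else "D"
    return "D"  # DefectBot


def fertility(kind, shares):
    """Expected children of one agent, averaging over ordered partner pairs."""
    total = 0
    for a, b in product(shares, repeat=2):
        trio = [kind, a, b]
        moves = [move(k, trio) for k in trio]
        kids = (2 if moves[0] == "D" else 0) + 3 * moves[1:].count("C")
        total += shares[a] * shares[b] * kids
    return total


def closed_forms(x, y, z):
    """P, Q, R exactly as written in the text."""
    p = 2*x*x + 4*x*y + 8*y*y + 4*x*z + 4*y*z + 2*z*z
    q = 2*x*x + 6*x*y + 6*y*y + 4*x*z + 6*y*z + 2*z*z
    r = 2*x*x + 4*x*y + 8*y*y + 4*x*z + 4*y*z + 6*z*z
    return p, q, r


# A point close to (3/4, 1/4, 0), where the text says TDTs out-breed CliqueBots.
x, y, z = Fraction(74, 100), Fraction(25, 100), Fraction(1, 100)
shares = {"Defect": x, "TDT": y, "Clique": z}
p, q, r = closed_forms(x, y, z)
assert (p, q, r) == tuple(fertility(k, shares) for k in shares)
print("near (3/4,1/4,0): Q > R is", q > r, "and R > P is", r > p)

# On the edge x=0, CliqueBots should beat TDTs everywhere.
edge_ok = all(
    closed_forms(0, Fraction(i, 100), 1 - Fraction(i, 100))[2]
    > closed_forms(0, Fraction(i, 100), 1 - Fraction(i, 100))[1]
    for i in range(1, 100)
)
print("R > Q on the x=0 edge:", edge_ok)
```

Note that R-P is 4z², which is why CliqueBots always out-breed DefectBots whenever any CliqueBots are present.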

Finally,

Problem: The setup looks perfectly fair for TDT agents. So why do they lose? (Difficulty: 2+3i stars.)

As explained in the consequentialism post, we've handicapped TDT by giving our agents shortsighted utility functions. If they instead care about distant descendants (let's say that Omega is only running the experiment finitely many times, either for a fixed number of tournaments or until the population reaches a certain number), then (unless it's known that the experiment is soon to end) the gains in population made by dominating the demographics will overwhelm the immediate gains of letting a DefectBot or CliqueBot take advantage of the agent. Growing almost 6-fold each time one's selected (or occupying a larger fraction of the fixed final population) will justify the correct decision, essentially via the more complicated calculations we've done above.

As explained in the consequentialism post, we've handicapped TDT by giving our agents shortsighted utility functions.

This is a perfect illustration of the 'consequentialism isn't nearsighted' moral, but shortsightedness just isn't a complete answer here. Sure, telling the agents to MAX(number of descendants in the long term) is sufficient to give them a combination of input->behaviour pairs that will make them win the meta-game, but it isn't their only mistake, and giving it exclusive emphasis distracts somewhat from the general problem.

From the perspective of the meta-game, the utility function given to the TDTs is not just shortsighted, it is also naive. That is, when we come to "Problem: " we are not really looking at the absolute number of descendants the TDTs had. We're looking at the ratio TDTs : !TDTs. Suppose there are two outcomes: one in which the agents dominate the population and produce X offspring, and another in which they produce X+1 offspring but end up a minority. The reasoning we have done here with Exercises 2 to 4 and the Problem would call the X+1 outcome the 'loser', even though it had more success, and even if that may well be the best the agents could possibly do (according to MAX(descendants)) in certain instances of these 'chaotic' situations!

The not-naive utility function is simply "maximise the proportion of copies of yourself in the next generation".

It so happens that in the specific meta-game we're considering we only have to give the TDTs a utility function that is either not shortsighted or not naive. Either fix happens to win this specific overall meta-game, because both lead to the same actions. But there are simple variants of the game that require that naivety and shortsightedness are both eliminated, neither hack being sufficient alone. We should focus on the underlying problem: Lost Purpose, that is, any difference between the utility function given to the agent and what it actually means to 'win'.

Again, I don't think Exercise 1 is that simple. Also, if you return the agent to the pool after playing the game ("sample with replacement") then everyone has infinitely many children, so it is not enough to say that they maximize children.

See, I don't think I underspecified. Omega doesn't do something to every agent every time; in round N, Omega picks three at random and plays the game with them. Then in round N+1, it picks three at random (from the pool including the children of round N) and plays the game with them, et cetera.

I agree that if you allow the agents (and not just their children) back into the game, the conditions for folly aren't met. The point is that you really need a delicately defined setup for TDT to be completely shortsighted, even with a shortsighted utility function.

OK, so now that you've pinned it down, my main complaint applies: the distribution of partners that the agent will eventually have for the game is a function of the agent's strategy. You can't treat them separately and conclude simply that it cooperates against 1 CDT and 1 TDT. Thus, in doing exercise 1, choosing between strategies, the TDT must do exercises 2-4 and more to determine which strategy has the best expected value. And, since we're now talking about expected value, the calculation must involve the utility as a function of the number of children. You can set it to be the number of children, but you have to use that somewhere, and not just monotonicity.

the calculation must involve the utility function of the children.

Why?

That was badly phrased. I meant: the calculation must involve the utility function, the function that converts the number of children into utiles. (original corrected)

Huh- you may be right. Let me ponder this when I'm less tired.

So if I understand you right, even with the short-sighted utility function, there's an echo of Parfit's Hitchhiker here: what TDT decides on these problems actually controls which situations the agent finds itself in, and thus its possible payoff matrices. Since TDT is supposed to get Parfit's Hitchhiker right, therefore, it should give the long-term-winning answer even in this case.

Well, there are some more caveats (it's not clear that agents in the first round would do this, since TDT doesn't win the Counterfactual Mugging, and if agents in round N don't think that way, then what about round N+1...), but you're right that the simple calculation doesn't actually suffice. Drat, and thanks.

Since TDT is supposed to get Parfit's Hitchhiker right, therefore, it should give the long-term-winning answer even in this case.

Its goal is still different though (if we restore some missing pieces): it wants to game the probabilities of encountering certain opponents so that a single round that contains TDT delivers the most reward. It just so happens that getting rid of DefectBots serves this purpose, but if the opponents were CooperateBots, it looks like TDTs would drive themselves to extinction (or farm the opponents) to maximize the number of expected cooperating opponents that they can defect against (for each instance where there's a TDT agent in the round). (I didn't check this example carefully, so could be wrong, but the principle it exemplifies seems to hold.)

That's a seriously sick idea, but there doesn't seem to be a way to both set up such a favorable matchup and exploit it, is there?
