A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents which two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave like a CDT agent will wish it had pre-committed itself to behaving before being faced with the problem.
However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:
Discrimination: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.
So what are some fair "problematic problems"?
Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."
Analysis: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.
However, any CDT agent can just take both boxes and win $1,001,000. In fact, any other agent who is not running TDT (e.g. an EDT agent) will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the $1 million. So any other agent can safely two-box as well.
Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
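The payoffs in Problem 1 can be tabulated with a small sketch. This is only an illustration under stated assumptions: the function names are hypothetical, and TDT's "linked decision" is modelled crudely by letting the agent's choice and the simulation's choice be one and the same value. It is not a formalisation of TDT.

```python
# Hypothetical model of Problem 1; all names here are illustrative.
BOX_A = 1_000
BOX_B_PRIZE = 1_000_000


def fill_box_b(simulated_tdt_one_boxes: bool) -> int:
    """Omega fills Box B based only on the simulated TDT agent's choice."""
    return BOX_B_PRIZE if simulated_tdt_one_boxes else 0


def payoff(takes_both: bool, box_b: int) -> int:
    """Winnings of an agent facing an already-filled Box B."""
    return box_b + (BOX_A if takes_both else 0)


def tdt_payoff(one_box: bool) -> int:
    # Assumption: a TDT agent treats its own choice and the simulation's
    # choice as the same logical output, so Box B tracks its own decision.
    box_b = fill_box_b(simulated_tdt_one_boxes=one_box)
    return payoff(takes_both=not one_box, box_b=box_b)


# The TDT agent compares the two linked outcomes and one-boxes.
tdt_one_boxes = tdt_payoff(True) > tdt_payoff(False)

# Box B is now fixed; a non-TDT agent simply takes both boxes.
box_b = fill_box_b(tdt_one_boxes)
tdt_winnings = payoff(takes_both=False, box_b=box_b)
cdt_winnings = payoff(takes_both=True, box_b=box_b)

print(tdt_winnings, cdt_winnings)
```

Under these assumptions the gap between the two agents is exactly the contents of Box A, which is why raising Box A towards $1 million widens it.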
Problem 2: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."
Analysis: A TDT agent will reason that whatever it does, it cannot have more than a 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million under box number 1 and the TDT agent has exactly a 10% chance of winning it.
But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning $1 million. By increasing the number of boxes, we can ensure that TDT has an arbitrarily low chance of winning, compared to CDT which always wins.
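Omega's rule in Problem 2 is simple enough to sketch directly. The code below assumes Omega knows the simulated agent's exact mixed strategy (rather than estimating it from repeated runs), and the helper names are hypothetical. It shows that the uniform strategy gives a 10% chance, that a skewed strategy does worse, and that an outside agent can read off the prize box.

```python
from fractions import Fraction


def omega_prize_box(probs):
    """Return the 1-based number of the least likely box.

    list.index returns the first match, which implements Omega's
    lowest-number tie-break.
    """
    return probs.index(min(probs)) + 1


def win_probability(probs):
    """Chance that an agent playing `probs` picks the box Omega filled,
    assuming Omega simulated exactly this strategy."""
    return probs[omega_prize_box(probs) - 1]


n = 10
uniform = [Fraction(1, n)] * n
# An arbitrary skewed strategy: half the weight on box 1, rest spread evenly.
skewed = [Fraction(1, 2)] + [Fraction(1, 18)] * (n - 1)

print(omega_prize_box(uniform), win_probability(uniform))
print(omega_prize_box(skewed), win_probability(skewed))

# A non-TDT agent that can reproduce Omega's calculation picks this box.
outsider_pick = omega_prize_box(uniform)
```

Since the probabilities sum to 1, the smallest of them can never exceed 1/n, so no strategy for the simulated agent beats the uniform one under this rule.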
1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.
2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?
3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.
4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)
5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.
6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?
Edit: I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).
You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you $1000 if you are, and tortures you for a year if you aren't.
That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.
You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.
Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards e.g. ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about this is that this is a "fair" test by that definition, yet it acts like an unfair test.
Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.
I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.
I think we could generalise problem 2 to be problematic for any decision theory XDT:
There are 10 boxes, numbered 1 to 10. You may only take one. Omega has (several times) run a simulated XDT agent on this problem. It then put a prize in the box which it determined was least likely to be taken by such an agent - or, in the case of a tie, in the box with the lowest index.
If agent X follows XDT, it has at best a 10% chance of winning. Any sufficiently resourceful YDT agent, however, could run a simulated XDT agent themselves, and figure out what Omega's choice was without getting into an infinite loop.
Therefore, YDT performs better than XDT on this problem.
If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Gödelian incompleteness). If I'm wrong, what have I missed?
You're right about problem 2 being a fully general counterargument, but your philosophical conclusion seems to be stopping too early. For example, can we define a class of "fair" problems that excludes problem 2?
Consider Problem 3: Omega presents you with two boxes, one of which contains $100, and says that it just ran a simulation of you in the present situation and put the money in the box the simulation didn't choose.
This is a standard diagonal construction, where the environment is set up so that you are punished for the actions you choose, and rewarded for those you don't choose, irrespective of the actions. This doesn't depend on the decision algorithm you're implementing. A possible escape strategy is to make yourself unpredictable to the environment. The difficulty would also go away if the thing being predicted wasn't you, but something else you could predict as well (like a different agent that doesn't simulate you).
The correct solution to this problem is to choose each box with equal probability; this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money. Or I try to guess what escape route you'll use and post all the guards there.
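The case for randomizing in Problem 3 can be checked with a short calculation. This sketch rests on an assumption the problem statement does not settle: the simulated run and the real run draw their random choices independently. If Omega could also reproduce the agent's random bits, no mixed strategy would help. The function name is hypothetical.

```python
def problem3_win_probability(p: float) -> float:
    """Chance of picking the box with the money in Problem 3.

    p is the probability of choosing box 1. Assumption: the simulated
    run and the real run randomize independently, so the agent wins
    exactly when the two runs choose different boxes.
    """
    return p * (1 - p) + (1 - p) * p


# Scan a grid of mixed strategies for the best one.
best_p = max((i / 100 for i in range(101)), key=problem3_win_probability)
print(best_p, problem3_win_probability(best_p))
```

A deterministic agent (p equal to 0 or 1) never wins under this model, while the even mix wins half the time.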
What's interesting about Problem 2 is that it makes what would be the normal game-theoretic strategy unstable by choosing deterministically where the probabilities are exactly equal.
My sense is that question 6 is a better question to ask than 5. That is, what's important isn't drawing some theoretical distinction between fair and unfair problems, but finding out what problems we and/or our agents will actually face. To the extent that we are ignorant of this now but may know more in the future when we are smarter and more powerful, it argues for not fixing a formal decision theory to determine our future decisions, but instead making sure that we and/or our agents can continue to reason about decision theory the same way we currently can (i.e., via philosophy).
BTW, general question about decision theory. There appears to have been an academic study of decision theory for over a century, and causal and evidential decision theory were set out in 1981. Newcomb's paradox was set out in 1969. Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT, and it seems as if there is a 100% disconnect between the community exploring new theories (which is centered around LW) and the academic decision theory community. This seems really, really odd - what's going on?
This is simply not true. Robert Nozick (who introduced Newcomb's problem to philosophers) compared/contrasted EDT and CDT at least as far back as 1993. Even back then, he noted their inadequacy on several decision-theoretic problems and proposed some alternatives.
I haven't read The Nature of Rationality in quite a long time, so I won't be of much help. For a very simple and short introduction to Nozick's work on decision theory, you should read this (PDF).
There were plenty of previous theories trying to go beyond CDT or EDT, they just weren't satisfactory.
Problem 2 reminds me strongly of playing GOPS.
For those who aren't familiar with it, here's a description of the game. Each player receives a complete suit of standard playing cards, ranked Ace low through King high. Another complete suit, the diamonds, is shuffled (or not, if you want a game of complete information) and put face down on the table; these diamonds have point values Ace=1 through King=13. In each trick, one diamond is flipped face-up. Each player then chooses one card from their own hand to bid for the face-up diamonds, and all bids are revealed simultaneously. Whoever bids highest wins the face-up diamonds, but if there is a tie for the highest bid (even when other players did not tie), then no one wins them and they remain on the table to be won along with the next trick. All bids are discarded after every trick.
Especially when the King comes up early, you can see everyone looking at each other trying to figure out how many levels deep to evaluate "What will the other players do?".
(1) Play my King to be likely to win. (2) Everyone else is likely to do (1) also, which will waste their Kings. So instead play low while they throw away their Kings. (...
The problems look like a kind of an anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.
The more I think about it, the more interesting these problems get! Problem 1 seems to re-introduce all the issues that CDT has on Newcomb's Problem, but for TDT. I first thought to introduce the ability to 'break' with past selves, but that doesn't actually help with the simulation problem.
It did lead to a cute observation, though. Given that TDT cares about all sufficiently accurate simulations of itself, it's actually winning.
It doesn't seem very relevant, but I think if we explored Richa...
This needs some serious mathematics underneath it. Omega is supposed to run a simulation of how an agent of a certain sort handled a certain problem, the result of that simulation being a part of the problem itself. I don't think it's possible to tell, just from these English words, that there is a solution to this fixed-point formulation. And TDT itself hasn't been formalised, although I assume there are people (Eliezer? Marcello? Wei Dai?) working on that.
Cf. the construction of Gödel sentences: you can't just assume that a proof-system can talk about itself, you have to explicitly construct a way for it to talk about itself and show precisely what "talking about itself" means, before you can do all the cool stuff about undecidable sentences, Löb's theorem, and so on.
This seems well-specified to me: Since the agent is not told its own output in advance, it is possible to run the "simulation" and the "real version" in finite time. If you hand me a computer program that is the agent, I will hand you a computer program that is Omega and the environment.
But TDT already has this problem - TDT is all about finding a fixed point decision.
I think it's right to say that these aren't really "fair" problems, but they are unfair in a very interesting new way that Eliezer's definition of fairness doesn't cover, and it's not at all clear that it's possible to come up with a nice new definition that avoids this class of problem. They remind me of "Lucas cannot consistently assert this sentence".
If he's always truthful, then he didn't lie to the simulation either and this means that he did infinitely many simulations before that. So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present t...
Thanks for the post! Your problems look a little similar to Wei's 2TDT-1CDT, but much simpler. Not sure about the other decision theory folks, but I'm quite puzzled by these problems and don't see any good answer yet.
Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?
There's a different version of these problems for each decision theory, depending on what Omega simulates. For CDT, all agents two-box and all agents get $1000. However, on problem 2, it seems like CDT doesn't have a well-defined decision at all; the effort to work out what Omega's simulator will say won't terminate.
(I'm spamming this post with comments - sorry!)
Someone may already have mentioned this, but doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow? I.e., as soon as we let decision theories become self-referencing, it is impossible for a "best" decision theory to exist at all.
Intuitively this doesn't feel like a 'fair' problem. A UDT agent would ace the TDT formulation and vice versa. Any TDT agent that found a way of distinguishing between 'themselves' and Omega's TDT agent would also ace the problem. It feels like an acausal version of something like:
"I get agents A and B to choose one or two boxes. I then determine the contents of the boxes based on my best guess of A's choice. Surprisingly, B succeeds much better than A at this."
Still an intriguing problem, though.
Problems 1 and 2 both look - to me - like fancy versions of the Discrimination problem. edit: I am much less sure of this. That is, Omega changes the world based on whether the agent implements TDT. This bit I am still sure of, but it might be the case that TDT can overcome this anyway.
Discrimination problem: Money Omega puts in room if you're TDT = $1,000. Money Omega puts in room if you're not = $1,001,000.
Problem 1: Money Omega puts in room if you're TDT = $1,000 or $1,001,000. Edit: made a mistake. The error in this problem may be subtler than I first...
I think we need a 'non-problematic problems for CDT' thread.
For example, it is not problematic for a CDT-based robot controller to have the control values in the action A represent multiple servos in its world model, as if you wired multiple robot arms to 1 controller in parallel. You may want to do this if you want the robot arms to move in unison and pass along the balls in the real world imitation of http://blueballmachine2.ytmnd.com/
It is likewise not problematic if you ran out of wire and decided to make the '1 controller' be physically 2 controllers run...
Let's say that TDT agents can be divided into two categories, TDT-A and TDT-B, based on a single random bit added to their source code in advance. Then TDT-A can take the strategy of always picking the first box in Problem 2, and TDT-B can always pick the second box.
Now, if you're a TDT agent being offered the problem, then with the aforementioned strategy there's a 50% chance that the simulated agent is different from you, netting you $1 million. This also narrows down the advantage of the CDT agent - now they only have a 50% chance of winning the money, which is equal to yours.
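The 50% figures can be checked by enumerating the cases. This is a sketch under the comment's own assumptions plus two more that are labelled here: the variant Omega simulates and the variant actually playing are chosen independently with equal probability, and an outside agent cannot tell which variant Omega simulated. All names are hypothetical.

```python
from itertools import product

N_BOXES = 10
# Fixed pick of each hypothetical TDT variant, as described above.
VARIANT_PICK = {"TDT-A": 1, "TDT-B": 2}


def omega_prize_box(simulated_pick: int) -> int:
    """Prize box when the simulated agent is deterministic.

    Every box other than the simulated pick has probability 0, so Omega
    takes the lowest-numbered of those.
    """
    return next(b for b in range(1, N_BOXES + 1) if b != simulated_pick)


# Does the real agent win, for each (simulated variant, real variant) pair?
outcomes = {
    (sim, real): VARIANT_PICK[real] == omega_prize_box(VARIANT_PICK[sim])
    for sim, real in product(VARIANT_PICK, repeat=2)
}
tdt_win_rate = sum(outcomes.values()) / len(outcomes)

# An outside agent that always guesses box 1, not knowing the variant.
outsider_wins = [omega_prize_box(pick) == 1 for pick in VARIANT_PICK.values()]
outsider_win_rate = sum(outsider_wins) / len(outsider_wins)

print(tdt_win_rate, outsider_win_rate)
```

The TDT agent wins exactly when the simulated variant differs from its own, which is two of the four equally weighted cases.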
These questions seem decidedly UNfair to me.
No, they don't depend on the agent's decision-making algorithm; just on another agent's specific decision-making algorithm skewing results against an agent with an identical algorithm and letting all others reap the benefits of an otherwise non-advantageous situation.
So, a couple of things:
While I have not mathematically formulated this, I suspect that absolutely any decision theory can have a similar scenario constructed for it, using another agent / simulation with that specific decision theory as the basis f
The interaction between this simulated TDT and you is so complicated that I don't think many of the commenters here actually did the math to see how they should expect the simulated TDT agent to react in these situations. I know I didn't. I tried, and failed.
I wonder if there is a mathematician in this forum willing to present the issue in the form of a theorem and a proof for it, in a reasonable mathematical framework. So far all I can see is a bunch of ostensibly plausible informal arguments from different points of view.
Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.
I'm not sure I can add much by elaboration.
My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.
Within that context, responding to a comment with a request to formalize it is easy to read as a polite way of expressing "what you just said is uselessly vague. If you are capable of saying something useful, do so, otherwise shut up and leave this subject to the grownups."
And since you aren't consistent about wanting everything to be expressed as a formalism, I assume this is a function of the topic of discussion, because that's the most charitable assumption I can think of.
That said, I reiterate that I have no special knowledge of why you're being downvoted; please don't take me as definitive.
(1) This might be an unfair impression, as I no longer remember what it was that led me to form it.
Why do you assume agents cannot randomize?
1) Not to my knowledge. 2) No, you reasoned TDT's decisions correctly. 3) A TDT agent would not self-modify to CDT, because if it did, its simulation would also self-modify to CDT and then two-box, yielding only $1000 for the real TDT agent. 4) TDT does seem to be a single algorithm, albeit a recursive one in the presence of other TDT agents or simulations. TDT doesn't have to look into its own code, nor does it change its mind upon seeing it, for it decides as if deciding what the code outputs. 5) This is a bit of a tricky one. You could say it's fair if ...
In Newcomb's Problem, Omega determines ahead of time what decision theory you use. In these problems, it selects an arbitrary decision theory ahead of time. As such, for any agent using this preselected d...
Generalization of Newcomb's Problem: Omega predicts your behavior with accuracy p.
This one could actually be experimentally tested, at least for certain values of p; so for instance we could run undergrads (with $10 and $100 instead of $1,000 and $1,000,000; don't bankrupt the university) and use their behavior from the pilot experiment to predict their behavior in later experiments.
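For a predictor with accuracy p, the expected payoffs and the break-even accuracy can be worked out as below. This is a sketch of the evidential expected-value calculation only, and it assumes the predictor is equally accurate for one-boxers and two-boxers; a causal calculation would weigh things differently. The function names are hypothetical.

```python
def expected_payoffs(p: float, small: float, large: float):
    """Evidential expected payoffs for one-boxing and two-boxing.

    Assumption: the predictor is right with probability p whichever
    choice the subject makes.
    """
    one_box = p * large                    # large prize only if predicted correctly
    two_box = small + (1 - p) * large      # large prize only if mispredicted
    return one_box, two_box


def break_even_accuracy(small: float, large: float) -> float:
    """Accuracy at which both choices have equal expected payoff.

    Solves p * large == small + (1 - p) * large for p.
    """
    return (small + large) / (2 * large)


# Original stakes, then the scaled-down experimental stakes.
print(break_even_accuracy(1_000, 1_000_000))
print(break_even_accuracy(10, 100))
print(expected_payoffs(0.9, 10, 100))
```

With the $10 and $100 stakes the small box is a larger fraction of the large one, so under this calculation the predictor has to be right noticeably more than half the time before one-boxing comes out ahead.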
Why is the discrimination problem "unfair"? It seems like in any situation where decision theories are actually put into practice, that type of reasoning is likely to be popular. In fact I thought the whole point of advanced decision theories was to deal with that sort of self-referencing reasoning. Am I misunderstanding something?
Is the trick with problem 1 that what you are really doing, by using a simulation, is having an agent use timeless decision theory in a context where they can't use timeless decision theory? The simulated agent doesn't know about the external agent. Or, you could say, it's impossible for it to be timeless; the directionality of time (simulation first, external agent moves second) is enforced in a way that makes it impossible for the simulated agent to reason across that time barrier. Therefore it's not fair to call what it decides "timeless decision theory".
Either problem 1 and 2 are hitting an infinite regress issue, or I don't see why an ordinary TDT agent wouldn't 2box, and choose the first box, respectively. There's a difference between the following problems:
Now, in problem 1 and 2, are the simulated problem and the...
This is indeed a problem - and...
For problem 1, in the language of the blackmail posts, because the tactic Omega uses to fill box 2 depends on TDT-sim's decision, because Omega has already decided, and because Omega didn't make its decision known, a TDT agent presented with this problem is at an epistemic disadvantage relative to Omega: TDT can't react to Omega's actual decision, because it won't know Omega's actual decision until it knows its own actual decision, at which point TDT can't further react. This epistemic ...
Will they? Surely it's clear that it's now possible to take $1,001,000, because the circumstances are slightly different.
In the standard Newcomb problem, where Omega predicts your behaviour, it's not possible to trick it or act other than its expectation. Here, it is.
Is there some basic part of decision theory I'm not accounting for here?
In both your problems, the seeming paradox comes from failure to recognize that the two agents (one that Omega has simulated and one making the decision) are facing entirely different prior information. Then, nothing requires them to make identical decisions. The second agent can simulate itself having prior information I1 (that the simulated agent has been facing), then infer Omega's actions, and arrive at the new prior information I2 that is relevant for the decision. And I2 now is independent of which decision the agent would make given I2.
There seems to be a contradiction here. If Omega said this to me, I would have to believe that Omega had just presented evidence of being untruthful some of the time.
If Omega simulated the problem at hand then in said simulation Omega must have said: "Before you entered the room, I ran a simulation of this problem as presented to ...
I don't understand the special role of box 1 in Problem 2. It seems to me that if Omega just makes different choices for the box in which to put the money, all decision theories will say "pick one at random" and will be equal.
In fact, the only reason I can see why Omega picks box 1 seems to be that the "pick at random" process of your TDT is exactly "pick the first one". Just replace it with something dependent on its internal clock (or any parameter not known at the time when Omega asks its question) and the problem disappears.