I believe the discussion of UDT is spot on, and a very good summary placing various thought experiments in its context (though reframing Smoking Lesion to get the correct answer seems like cheating).
I have trouble understanding your second point about Sleeping Beauty (and DT-independent probabilities).
I was really hoping that a post called "What is UDT?" would explain what UDT is. You've got pages of discussion and diagrams, but only one sentence describing what UDT is.
Yeah, but the thing is: I don't think there is very much to it. (And in fact about 25% of my motivation for writing this is to see whether others will 'correct me' on that.)
If I say "it's extremely simple, it's blurgh" that means among other things "it is blurgh". Not "it is blurgh and whole bunch of other stuff which I'm never going to get round to."
To be fair, one thing I haven't mentioned at all is the concept of logical uncertainty, which plays a critical role in TDT (which was after all the motivation for UDT) and in a number of past threads where UDT was under discussion. But again, I personally don't think we need to go into this to explain what UDT is.
I wish I could say this in a nicer way, but here it is: this post has not clarified UDT for me one bit, and I seem to be more confused now about the topic than I previously was.
The diagrams and lack of explanation of how I'm supposed to interpret them are a large part of why this isn't helpful to me at all.
Thanks for writing this post. I think it's an excellent step toward popularizing UDT/TDT (though I'm still not convinced subjective probabilities have a canonical meaning independent of specific decision problems). I recently downloaded some vector graphics software to write a post with much the same content, and much the same motive, so it's a relief to see some of it already written!
This post clarified some concepts for me but also created some confusion:
Hello, thank you for the post!
All images on this post are no longer available. I'm wondering if you're able to embed them directly into the rich text :)
Can someone explain this to me in a simpler way? It looks really interesting, and I generally accept the premises you've stated for UDT, so it may be a useful tool.
I like the idea of the graphs, but the way they're drawn is often unenlightening. (E.g. it requires lots of disclaimers on the side, and a paragraph to distinguish Newcomb and the Smoking Lesion.)
Unfortunately, I don't have a better suggestion to offer.
Please show your work on your thirder Sleeping Beauty answer, given an observed random bit (α or β upon waking) that's independent of the coin toss. I realize the bit is flipped on Tuesday, which is only observed by Beauty if tails, but a conditionally flipped independent random bit is still independent.
Why is the availability of a random bit necessary and/or sufficient to end the Beauty controversy?
I found this to be a very helpful article, but a bit more elaboration on the Smoking Lesion problem would be helpful; I looked it up on the wiki, but still don't really understand it, or how it is solved.
Not to nitpick, but if the colors correspond to days, shouldn't the left half of the left orange box be blue and the left half of the right blue box be orange? I believe this may affect the answer UDT gives.
Hey, Neil - sorry to bug you here but I didn't know how else to get in touch. Would you be willing to email me at my LJ address? I would like to ask you a question but here is probably not an appropriate place to do it.
Thanks!
-MM
When you say "uniform random number in the set [0,1]", you mean something isomorphic to a countably infinite sequence of (independent) random bits (each equally likely 0 or 1) . The probability of any particular sequence is then 0. I guess you alluded to this with "probability (or rather density)".
But what's your larger point? Why is it necessary to fix some particular random bits? Couldn't you just as well say I have the ability to generate them if I want them? I guess by having them exist before I ask to see them, you can think of a Player decision node as existing frozen in time (interactions with the world happen only as results of decisions).
What if, in addition to freely-willed 'Player' nodes and random 'Nature' nodes, there is a third kind of node where the branch followed depends on the Player's strategy for a particular information state, regardless of whether that strategy has yet been executed. In other words, what if the universe contains 'telepathic robots' (whose behaviour is totally mechanical - they're not trying to maximize a utility function) that can see inside the Player's mind before they have acted?
Why is this worth considering?
Re: "The above procedure is extremely limited. Taking it exactly as stated, it only applies to games with a single player and a single opportunity to act at some stage in the game."
I don't really see what you mean. Your "naive" decision theory updated on sensory input - and then maximised expected utility. That seems like standard decision theory to me - and surely it works fine with multiple actors and iterated interactions.
As a newcomer to LessWrong, I quite often see references to 'UDT' or 'updateless decision theory'. The very name is like crack - I'm irresistibly compelled to find out what the fuss is about.
Wei Dai's post is certainly interesting, but it seemed to me (as a naive observer) that a fairly small 'mathematical signal' was in danger of being lost in a lot of AI-noise. Or to put it less confrontationally: I saw a simple 'lesson' on how to attack many of the problems that frequently get discussed here, which can easily be detached from the rest of the theory. Hence this short note, the purpose of which is to present and motivate UDT in the context of 'naive decision theory' (NDT), and to pre-empt what I think is a possible misunderstanding.
First, a quick review of the basic Bayesian decision-making recipe.
What is Naïve Decision Theory?
You take the prior and some empirical data and calculate a posterior by (i) working out the 'likelihood function' of the data and (ii) calculating prior times likelihood and renormalising. Then you calculate expected utilities for every possible action (with respect to this posterior) and maximize.
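For concreteness, here is the whole recipe as code for a finite problem (a sketch only; the representation of states, data and utilities here is mine, nothing standard):

```python
# A minimal sketch of the NDT recipe, for finite sets of states and actions.

def ndt_action(prior, likelihood, utility, data, actions):
    """prior:      dict mapping state -> prior probability
    likelihood: function (data, state) -> P(data | state)
    utility:    function (action, state) -> payoff
    Returns the action maximizing posterior expected utility."""
    # (i) likelihood of the data, times the prior...
    posterior = {s: p * likelihood(data, s) for s, p in prior.items()}
    # (ii) ...renormalised to give the posterior
    z = sum(posterior.values())
    posterior = {s: w / z for s, w in posterior.items()}
    # Then maximize expected utility with respect to the posterior
    return max(actions, key=lambda a: sum(posterior[s] * utility(a, s)
                                          for s in posterior))
```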
Of course there's a lot more to conventional decision theory than this, but I think one can best get a handle on UDT by considering it as an alternative to the above procedure, in order to handle situations where some of its presuppositions fail.
(Note: NDT is especially 'naïve' in that it takes the existence of a 'likelihood function' for granted. Therefore, in decision problems where EDT and CDT diverge, one must 'dogmatically' choose between them at the outset just to obtain a problem that NDT regards as being well-defined.)
When does NDT fail?
The above procedure is extremely limited. Taking it exactly as stated, it only applies to games with a single player and a single opportunity to act at some stage in the game. The following diagram illustrates the kind of situation for which NDT is adequate:
This is a tree diagram (as opposed to a causal graph). The blue and orange boxes show 'information states', so that any player-instance within the blue box sees exactly the same 'data'. Hence, their strategy (whether pure or mixed) must be the same throughout the box. The branches on the right have been greyed out to depict the Bayesian 'updating' that the player following NDT would do upon seeing 'blue' rather than 'orange' - a branch is greyed out if and only if it fails to pass through a blue 'Player' node. Of course, the correct strategy will depend on the probabilities of each of Nature's possible actions, and on the utilities of each outcome, which have been omitted from the diagram. The probabilities of the outward branches from any given 'Nature' node are to be regarded as 'fixed at the outset'.
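To restate the greying-out rule in symbols (the notation here is mine): writing b for a complete branch and p_n(b) for the probability Nature assigns to b's outward edge at Nature node n, the NDT player who sees 'blue' works with

$$\Pr(b \mid \text{blue}) \;=\; \frac{\prod_{n \in b} p_n(b)}{\sum_{b' \text{ through blue}} \prod_{n \in b'} p_n(b')},$$

where the sum runs over the branches that pass through a blue 'Player' node; the greyed-out branches simply get probability zero.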
Now let's consider two generalisations:

1. The Player may have (i) several opportunities to act and, worse, (ii) may be forgetful, so that player-instances at different stages of the game can find themselves in the same information state.
2. In addition to freely-willed 'Player' nodes and random 'Nature' nodes, the game may contain a third kind of node - call it an 'Omega' node, or a 'telepathic robot' - where the branch followed depends on the Player's strategy for a particular information state, regardless of whether that strategy has yet been executed.
It may be worth remarking that we haven't even considered the most obvious generalisation: The one where the game includes several 'freely-willed' Players, each with their own utility functions. However, UDT doesn't say much about this - UDT is intended purely as an approach to solving decision problems for a single 'Player', and to the extent that other 'Players' are included, they must be regarded as 'robots' (of the non-telepathic type) rather than intentional agents. In other words, when we consider other Players, we try to do the best we can from the 'Physical Stance' (i.e. try to divine what they will do from their 'source code' alone) rather than rising to the 'Intentional Stance' (i.e. put ourselves in their place with their goals and see what we think is rational).
Note: If a non-forgetful player has several opportunities to act then, as long as the game only contains Player and Nature nodes, the Player is able to calculate the relevant likelihood function (up to a constant of proportionality) from within any of their possible information states. Therefore, they can solve the decision problem recursively using NDT, working backwards from the end (as long as the game is guaranteed to end after a finite number of moves). If, in addition to this, the utility function is 'separable' (e.g. a sum of utilities 'earned' at each move) then things are even easier: each information state gives us a separate NDT problem, which can be solved independently of the others. Therefore, unless the player is forgetful, the 'naïve' approach is capable of dealing with generalisation 1.
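As a sketch of that recursive procedure (assuming a finite tree of Player and Nature nodes only, and a non-forgetful player, so that each Player node may be treated as its own information state; the node encoding is mine, for illustration):

```python
# Nodes (illustrative encoding): ('leaf', utility),
# ('nature', [(prob, child), ...]), or ('player', [(action, child), ...]).

def solve(node):
    """Expected utility of the subtree, working backwards from the end."""
    kind, body = node
    if kind == 'leaf':
        return body
    if kind == 'nature':
        # Nature's branch probabilities are fixed at the outset
        return sum(p * solve(child) for p, child in body)
    # A non-forgetful player faces a separate NDT problem at each node:
    # simply take the action with the best continuation value.
    return max(solve(child) for _action, child in body)
```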
Here are two familiar examples of generalisation 1 (ii):
Note: The Sleeping Beauty problem is usually presented as a question about probabilities ("what is the Player's subjective probability that the coin toss was heads?") rather than utilities, although for no particularly good reason the above diagram depicts a decision problem. Another point of interest is that the Absent-Minded Driver contains an extra ingredient not present in the SB problem: the player's actions affect how many player-instances there are in a branch.
Now a trio of notorious problems exemplifying generalisation 2:
How Does UDT Deal With These Problems?
The essence of UDT is extremely simple: We give up the idea of 'conditioning on the blue box' (doing Bayesian reasoning to obtain a posterior distribution etc) and instead just choose the action (or more generally, the probability distribution over actions) that will maximize the unconditional expected utility.
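As a sketch (restricted to pure strategies for brevity; the representations and function names here are mine, for illustration), the whole of UDT is this loop, applied below to Counterfactual Mugging with the usual $100 / $10,000 stakes:

```python
from itertools import product

def udt_policy(info_states, actions, expected_utility):
    """Choose the strategy (map: information state -> action) maximizing
    the UNCONDITIONAL expected utility - no Bayesian updating anywhere."""
    best_eu, best_policy = None, None
    for choice in product(actions, repeat=len(info_states)):
        policy = dict(zip(info_states, choice))
        eu = expected_utility(policy)
        if best_eu is None or eu > best_eu:
            best_eu, best_policy = eu, policy
    return best_policy

# Counterfactual Mugging: a fair coin; on tails Omega asks you for $100,
# on heads Omega pays you $10,000 iff you WOULD have paid on tails.
def eu(policy):
    pays = (policy['asked to pay'] == 'pay')
    return 0.5 * (10000 if pays else 0) + 0.5 * (-100 if pays else 0)

print(udt_policy(['asked to pay'], ['pay', 'refuse'], eu))
# -> {'asked to pay': 'pay'}   (unconditional EU 4950 versus 0)
```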
So, UDT: one-boxes on Newcomb's Problem and pays up in Counterfactual Mugging.
Is that it? (Doesn't that give the wrong answer to the Smoking Lesion problem?)
Yes, that's all there is to it.
Prima facie, the tree diagram for the Smoking Lesion would seem to be identical to my diagram of Newcomb's Problem (except that the connection between Omega's action and the Player's action would have to be probabilistic), but let's look a little closer:
Wei Dai imagines the Player's action to be computed by a subroutine called S, and although other subroutines are free to inspect the source code of S, and try to 'simulate' it, ultimately 'we' the decision-maker have control over S's source code. In Newcomb's problem, Omega's activities are not supposed to have any influence on the Player's source code. However, in the Smoking Lesion problem, the presence of a 'lesion' is somehow supposed to cause Players to choose to smoke (without altering their utility function), which can only mean that in some sense the Player's source code is 'partially written' before the Player can exercise any control over it. However, UDT wants to 'wipe the slate clean' and delete whatever half-written nonsense is there before deciding what code to write.
Ultimately this means that when UDT encounters the Smoking Lesion, it simply throws away the supposed correlation between the lesion and the decision and acts as though that were never a part of the problem. So the appropriate tree diagram for the Smoking Lesion problem would have a Nature node at the bottom rather than an Omega node, and so UDT would advise smoking.
Why Is It Rational To Act In The Way UDT Prescribes?
UDT arises from the philosophical viewpoint that says things like "an accurate simulation of you is you" and "you cannot tell 'from the inside' whether you are the Player or merely Omega's simulation of the Player".
If you take the above seriously then you're forced to conclude that a game containing an Omega node 'linked' to a Player node in the manner above is isomorphic (for the purposes of decision theory) to the game in which that Omega node is really a Player node belonging to the same information state. In other words, 'Counterfactual Mugging' is actually isomorphic to:
This latter version is much less of a headache to think about! Similarly, we can simplify and solve The Absent-Minded Driver by noting that it is isomorphic to the following, which can easily be solved:
Even more interesting is the fact that the Absent-Minded Driver turns out to be isomorphic to (a probabilistic variant of) Parfit's Hitchhiker (if we interchange the Omega and Player nodes in the above diagram).
Addendum: Do Questions About Subjective Probability Have Answers Irrespective Of One's Decision Theory And Utility Function?
In the short time I've been here, I have seen several people arguing that the answer is 'no'. I want to say that the answer is 'yes' but with a caveat:
We have puzzles like the Absent-Minded Driver (original version) where the player's strategy for a particular information state affects the probability of that information state recurring. It's clear that in such cases, we may be unable to assign a probability to a particular event until the player settles on a particular strategy. However, once the player's strategy is 'set in stone', then I want to argue that regardless of the utility function, questions about the probability of being a given player-instance do in fact have canonical answers:
Let's suppose that each player-instance is granted a uniform random number in the set [0,1]. In a sense this was already implicit, given that we had no qualms about considering the possibility of a mixed strategy. However, let's suppose that each player-instance's random number is now regarded as part of its 'information'. When a player sees (i) that she is somewhere within the 'blue rectangle', and (ii) that her random number is α, then for all player-instances P within the rectangle, she can calculate the probability (or rather density) of the event "P's random number is α" and thereby obtain a conditional probability distribution over player-instances within the rectangle.
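To spell out the resulting formula (the notation is mine): each instance's number is uniform, so the density of "P's random number is α" equals 1 for every instance P, the α-dependence cancels, and

$$\Pr(\text{she is } P \mid \text{blue}, \alpha) \;=\; \frac{\Pr(\text{branch through } P)}{\sum_{P' \in \text{blue}} \Pr(\text{branch through } P')}.$$

That is, each player-instance in the rectangle is simply weighted by the prior probability of the branch on which it sits.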
Notice that this procedure is entirely independent of decision theory (again, provided that the Player's strategy has been fixed).
In the context of the Sleeping-Beauty problem (much discussed of late) the above recipe is equivalent to asserting that (a) whenever Sleeping Beauty is woken, this takes place at a uniformly distributed time between 8am and 9am and (b) there is a clock on the wall. So whenever SB awakes at time α, she learns the information "α is one of the times at which I have been woken". A short exercise in probability theory suffices to show that SB must now calculate 1/3 probabilities for each of (Heads, Monday), (Tails, Monday) and (Tails, Tuesday) [which I think is fairly interesting given that the latter two are, as far as the prior is concerned, the very same event].
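Here is one way the short exercise goes, rescaling the hour to [0,1]: given Heads there is a single awakening at a uniform time, so the density of the event "α is one of my waking times" is 1; given Tails there are two awakenings at independent uniform times, so the density is 1 + 1 = 2. Hence

$$\Pr(\text{Heads} \mid \alpha) \;=\; \frac{\tfrac12 \cdot 1}{\tfrac12 \cdot 1 + \tfrac12 \cdot 2} \;=\; \frac13,$$

and the remaining 2/3 is split evenly between (Tails, Monday) and (Tails, Tuesday) by symmetry.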
One can get a flavour of it by considering a much simpler variation: Let α and β be 'random names' for Monday and Tuesday, in the sense that with probability 1/2, (α, β) = ("Monday", "Tuesday") and with probability 1/2, (α, β) = ("Tuesday", "Monday"). Suppose that SB's room lacks a clock but includes a special 'calendar' showing either α or β, and that SB doesn't know which symbol refers to which day.
Then we obtain the following diagram:
Nature's first decision determines the meaning of α and β, and its second is the 'coin flip' that inaugurates the Sleeping Beauty problem we know and love. There are now two information states, corresponding to SB's perception of α or β upon waking. Thus, if SB sees the calendar showing α (the orange state, let's say) it is clear that the conditional probabilities for the three possible awakenings must be split (1/3, 1/3, 1/3) as above (note that the two orange 'Tuesday' nodes correspond to the same awakening).
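Those conditional probabilities can be checked by brute force, enumerating every (symbol assignment, coin flip, awakening) triple and conditioning on the calendar showing α (a sketch only; the counting of awakenings follows the thirder recipe above):

```python
from fractions import Fraction

half = Fraction(1, 2)
posterior = {}
for alpha_day in ['Mon', 'Tue']:                      # the meaning of α
    for coin, days in [('Heads', ['Mon']), ('Tails', ['Mon', 'Tue'])]:
        for day in days:                              # one entry per awakening
            if day == alpha_day:                      # calendar shows α
                key = (coin, day)
                posterior[key] = posterior.get(key, 0) + half * half

total = sum(posterior.values())
print({k: v / total for k, v in posterior.items()})
# -> ('Heads','Mon'), ('Tails','Mon'), ('Tails','Tue'), each with probability 1/3
```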