[video] Paul Christiano's impromptu tutorial on AIXI and TDT

by lukeprog1 min read19th Mar 201213 comments


Personal Blog

Paul Christiano was about to give a tutorial on AIXI and TDT, so I whipped out my iPhone and recorded it. His tutorial wasn't carefully planned or executed, but it may still be useful to some. Note that when Paul writes "UDT" on a piece of paper he really meant "TDT." :)


HD video download links: 1, 2.

13 comments, sorted by Highlighting new comments since Today at 10:37 PM
New Comment

Please, please, someone put it on YouTube, and someone transcribe it.

Here's a transcript of the first half, with my changes to the original in brackets:

OK, so AIXI is a really simply algorithm. The setting is: we have some sequence of input bits to an algorithm A. It's a stream: i1, i2, i3, etc. And you have an output, a1, a2, a3, etc. And this works in the obvious way: if the input's i0, the output's a0, etc. So the algorithm's going to be, 'A maintains distribution over possible worlds W" So it's going to model the world W as another step, that takes its actions and then returns the next input.


So the world produces this first input [i1], then A produces this first output [a1], and then the world produces the next input [i2]. And so we can imagine this as being, like A is a computer, and W is everything else in the world. The input stream i1, i2, i3... is the input wire from the world, and the output stream a1, a2, a3... is the output back to the world. Physics [takes the output and the prior world-state and generates the next world-state, which provides the next input]. So, 'A maintains a distribution over possible worlds W' Do you know much about algorithmic information theory? Nothing too deep there, just the definitions. Listener: Such as Solomonoff Induction, Kolmogorov Complexity? So, the universal prior is just this [prior probability that's equal to the sum of all 2^-l(p), where is l(p) is the length of any program p that outputs the given bits]


It starts with a description [indecipherable] a "uniform" distribution over programs. Obviously, in a uniform distribution, like in some prefix-free encoding, you want the shorter program to be more likely. So we start with a uniform distribution over programs and condition on agreement with observations, amongst all possible programs for the world that would have produced this sequence of inputs [i1, i2, i3...] in response to the outputs from AIXI. So it's conditioned based on this uniform distribution [the universal prior]. Let's call this step k...I've completely diverged from my original notation, but it's OK.


So at step k, the machine has a distribution over possible worlds Wk, and it's deciding what action to take. Along with the input, we're gonna receive a reward signal r1, r2, r3... The goal is to maximize the reward signal. Now, once we have this reward signal and the model of the world, it's just going to chose the action which maximizes the reward.


So, "Select ak = argmax(Expected Value over possible worlds Wk of r(k+1)|ak=a)" So, this notation is really lying. A lot. This [gesturing to Wk] is choosing a program to take the place of the world; it takes the output and gives out the input and reward. So when I'm conditioning here [|ak=a], I'm not really conditioning.


I actually mean, for each program, if you set the kth action to be a, and you run that program to see what the world would do. So |ak=a means set the kth action to be a, temporarily, then take the average over all possible worlds and see what your reward would be. And then you get this quantity for each different action the machine could choose. This is kind of the simplest algorithm- this is kind of silly, this algorithm, because it's just very greedily maximizing reward, only doing it for the next step. And so basically, all AIXI does is it looks ahead for some horizon h, like it looks ahead h steps and tries to maximize its reward over those h steps.


So, just, instead it uses the algorithm: ""Select ak = argmax(Expected Value over possible worlds Mk of r(k+1)+r(k+2)+r(k+3)+...+r(k+h)|ak=a,a(k+1)=argmax(...), a(k+2)=argmax(...), ...a(k+h-1)=argmax(...)) This starts off the same, up to |ak=a), but to look ahead h steps we also need to model our own behavior over this period. So what I mean is that we can use this same algorithm and [use it to find a(k+1)], our own action in subsequent rounds.


We're taking into account the fact that in future rounds our model will be different, because we will have observed one more piece of information, which will allow our future selves to update their model. That's sort of like the most naive possible thing, in some sense. I've glossed over a small number of technical details, but I think that was pretty good. And then you can prove optimality results, and that sort of thing.


If you actually run this program against the universe, [indeciherable], it will choose optimal results, aside from the fact that there's a horizon. But over just the first h steps, the result will be optimal. OK, that's all! Now, I guess, one issue with this model is that it artificially splits the world into the agent and the non-agent. This sounds reasonable, because it's how we reasona about the world intuitively- there's stuff going in inside your brain and outside your brain. But this is not really cutting reality at a joint. [indecipherable] Your brain is just another thing that exists in the world.


This is what I mean when I say that decision theories are Cartesian dualists, which is a poetic way to put it. The goal is now to have an agent that's aware of the fact that it's just more stuff happening. To motivate this, we can see what happens to AIXI if we build an approximate version and put it in the world. I'm going to gloss over the details of the approximation of Solomonoff Induction that we use. (In real life, Solomonoff Induction takes far too long to ever be usable, but again, we're going to gloss over this.)


Drawing: A box labeled 'A', with an input wire with attached sensors, and an output wire with attached actuator. Note also that in the AIXI paper they define this thing, I don't even know what they call it, something like AIXI_ti, which is a time and space bounded version of AIXI. We're not going to deal with that, because this should still make clear what's wrong with dualism. So this, the box A, is an approximate version of AIXI, approximate because you can't actually compute these maxes. We imagine, when we design this algorithm, that we're putting A in a box and calling everything else W, for World.


What we want is for A to learn to model W. Physics supplies values on in, and takes values on out. Normal physics operates outside of A; it tells you what voltages come on the "in" wire, given what happens in the sensors, and it also tells you what the actuators do, once you've supplied what happens on this output wire. (We're applying this model where A actually models itself, given these processes- [as noted before, AIXI models future versions of itself to determine its likely actions, and therefore to determine the optimum action at current time].) So the question is, do you actually learn this model? If you have a bunch of observations, what sort of model of the world do you acquire?


This is obviously a very complicated model, because physics is pretty complicated, you know? But you still have to learn it [physics]. So let's call this Model 1. Here's model 2; Model 2 is just physics. In this model, physics supplies values on in, and that's all. We were imagining that physics does something [in the world to supply an input], and the magic of A supplies the output. But we can now imagine that physics will apply on A, too, which is true, given a reasonable model of physics. So, this model produces the exact same output as Model 1, because both are in agreeement with reality, as long as the characterization of what A's doing remains valid.


So, Models 1 and 2 make the same predictions. One observation is the Model 1 is a really odd model. The "takes values of output" is kind of a really arbitrary addendum. If you were a human trying to model the world, then you could be like "Well, my mind is behind a curtain, and there's this magical, non-physical law where my mind directly controls what happens in my muscles."


Or you could just get rid of that. And once you have a good enough understanding of physics, this [magical description of A] doesn't have any explanatory value.

Second half of the transcript:

You can cast this out formally, mathematically, and say that if you start with a uniform prior, [Model 2] seems to be simpler. There are some issues: you could say that A can't really model itself, tautologically, because it only has so much power. But if there are other agents in the world similar to A, lets call them B, C, etc, which are about as hard to model as A. A is still going to need a good approximate model using physics of B and C, and therefore of its own behaviour. So unless you do something special and go out of your way, it's going to be similar to model A.


And even if it can't predict exactly what it's going to do, you can still say the reductionist model it knows is accurate. You believe, as a mathematical fact, that the predictions of the model are in accordance with reality. So any AI that is sophisticated to come up with Model 2 will adopt it over Model 1. We could talk about the sort of philosophical difficulties that would stop A from being able to realize that, but for now let's just assume that Model 2 wins. So, what does A do if it believes Model 2? This is like asking, "What would you do if you believe that your choices have no effect on the world and you're just bringing in physics to see what you do?"


In the expectation [in the previously given algorithm A], it's setting the kth action equal to a. But in Model 2, reality doesn't care what it's equal to- it just going to keep running physics. It's like it's ignoring these outputs, instead of putting them on the output wire. So in Model 2, it doesn't matter what A does, physics is just going to keep running ahead. And A is thinking about its interventions in this very specific way- it's taking the world model its learned, the mapping from outputs to inputs, and seeing what would happen if you fed it a different output. But in Model 2, we're ignoring A's outputs entirely. So if you're only using this particular strategy of looking at what different possible worlds, the worlds aren't different at all, because reality is ignoring what you say and just using physics to generate inputs. So, A's decisions have no consequences.


So that means that, for decision-making purposes, A can ignore Model 2, because in Model 2 it doesn't matter what A does. So, if you were uncertain about whether reductionism was true or not, and you were using this sort of algorithm to choose between actions, you could just ignore the possibility that reductionism was true, and so I can concentrate on the non-reducationism possibility. The problem is that there's these other models: eg Model 3, which is just like Model 2, except for outputs "1001010101"- some random, hardwired exception to Model 2. So in this model it does matter what A does. And there are perhaps more natural forms of this, where reductionism is true, and everything is physical, but I'm still going to be rewarded if I make the right decisions.


So in this exception, perhaps something good happens to A, perhaps something bad happens. So, we've discarded Model 2 as irrelevant, but both Model 3 and Model 1 are still relevant. The evidence we have so far can't distinguish between any of these three models. In Model 1, your decision matters in the intended way; in Model 2, your decision doesn't matter; and in Model 3, your decision matters in this completely arbitrary way. (There's actually a whole, huge class of Model 3-like models.)


So we'd like you to believe Model 1 instead of Model 3, but the problem is that the part of Model 1 where it says "takes values on out" magically does surgery on the world to cause the output wire to agree with it. That model's actually really complicated. Given that physics is already producing the charges on the wire, this clause isn't supported by evidence, but is also really mechanically complicated. It's complicated to write down the program that looks in the universe, finds the output wire, and changes its value.


Whereas Model 3 isn't that complicated- if you're using a normal human programming language, stuff like this is quite straight forward- you just have one if statement. So Model 3 just requires about one extra line of code, while Model 1 requires a whole lot of extra code. So, since Model 3 is simpler, A winds up believing that reductionism is true, such that its actions are meaningless, and then it starts behaving completely erratically. So, this is just a simple description of what's wrong with trying to adapt AIXI to the world- it's not really compatible with Occam's Razor and normal inductive reasoning.


So, then we get to decision theory. So, what do you do, as a human? I don't know about your philosophical orientation, but I assume that you are onboard with reductionism. [Listener: Yeah.] And yet you still decide that you should make choices in a certain way; you desire outcomes, and you make choices that systematically lead to those outcomes. So, the question is, what is a good algorithmic solution? In some sense you understand that your actions have no consequences, that everything that you're going to do is already fixed. This would be even more severe is you were an AI and you knew your own source code, because there's a fact of the matter as to whether you would output zero or one, and no matter what you do, you couldn't affect this, it's just frozen. So how do you reason in this context?


There's a very simple answer. So, in UDT [Updateless Decision Theory] you have uncertainty. You are A, and you know your own source code. (Or, at least, you have a distribution over what your brain looks like; you know what it is, or you know how would would learn.) But even once you know this, you are still uncertain about what you're going to do. You don't know whether you'll output zero or one- and you can't know, in some strong sense. So, you have uncertainty about your own output, even though it's a mathematical fact.


So, in light of this, you have some distribution over all of these unknown facts about the world- a joint distribution over a bunch of things, including what you end up doing, and the utility you get.


Like A, you use a function that you're aware of. And there's a known mathematical fact of the matter about how the universe evolves. So there's also a mathematical fact of the matter about what your utility is. It is, in some sense, immutable, in the same way that your actions are immutable. So you have this joint distribution over what you actually do, and what your utility is, and other stuff. You have uncertainty; your uncertainty is correlated.


So you could know that if a=0, U=0, and if a=1, U=1. And you could know this even if you didn't know what a was. A reasonable way to approach this is to take the action such that, once you condition on learning that you took that action, your expected utility is as high as possible. (This is essentially Evidential Decision Theory [EDT].) The only distinction to Updateless Decision Theory is that you treat [the actions and utilities] as mathematical facts. So instead of saying "the action that you took," where you is some vague indexical, we look at your code and condition on that mathematical algorithm that you're instantiating outputs this.


And conditioned on that mathematical facts being true, you can reason about these other mathematical facts. I guess that was really EDT, and I'll need some more discussion to explain why EDT has some problems, and why the best way to get around those problems is to talk about this algorithm that's outputting the action. About the vague "you," in EDT. I guess there's two natural ways to go. One is to clarify why we've defined things the way we have, and another is to try to formalize it- because this isn't formalized yet, and both EDT and UDT are going to run into a lot of problems when we try to formalize them.


So, the common place that EDT fails is something like the Smoking Lesion Problem. Suppose there are two classes of agents, one of which likes candy, and always dies early, and the other of which likes candy less. So then you get situations where, if you learn that you've eaten some candy, you can infer that you're the sort of agent that likes candy, and therefore the sort of agent that will die early. This seems incorrect, in the sense that your decision to take the candy doesn't really have any influence on whether or not you die early.


The point is that by passing to this context, where you look at the algorithm we're using directly, you don't get any more information.

[sentence fragment]

This is a good explanation of UDT, its relationship to EDT, and why UDT solves the Smoking Lesion Problem when EDT doesn't. The Smoking Lesion Problem is actually a central motivation for UDT. (The story is that I was reading about the Smoking Lesion Problem, and thought something like "EDT would solve this correctly if the agent knew its own source code and conditioned on that first before deciding its action", which doesn't quite work, but helped me to later formulate UDT). I've been lazy about writing a post on this, so I'm glad Paul has figured it out and Randaly has put it in written form. (None of the previous discussions of the Smoking Lesion Problem I've seen on LW seems to have really "gotten it".)

Thanks a lot for doing this!

Thanks, I've embedded it in the post.

The embedded YouTube video seems to end rather abruptly. Did the iPhone battery run out?

Note that when Paul writes "UDT" on a piece of paper he really meant "TDT." :)

The description fits both UDT and TDT, so there seems to be no need to correct that (it doesn't fit ADT).

For the longest while I have been trying to figure what AIXI is about. Tell me if I got it correct:

  1. We are in an unknown world that has a utility function to maximise. For instance, we are in a pacman game and we are trying to gain the maximum score possible.

  2. Based on the previous observation and rewards, AIXI forms different model for predicting which action will maximise future rewards. It chooses the model with the greats rewards with a small program size.

I want to see what he's writing, preferably as he's writing it, but just having the document while I'm watching would be very useful.

About a half of the tutorial is about the ideas from AIXI and Existential Despair (which are used to motivate UDT, as compared to AIXI).