UDT agents as deontologists

One way (the usual way?) to think of an agent running Updateless Decision Theory is to imagine that the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code[see added footnote 1 below].  In particular, the agent never develops any additional concern for whatever turns out to be the actual world[2].  This is what puts the "U" in "UDT".

I suggest an alternative conception of a UDT agent, without changing the UDT formalism. According to this view, the agent cares about only the actual world.  In fact, at any time, the agent cares about only one small facet of the actual world — namely, whether the agent's act at that time maximizes a certain fixed act-evaluating function.  In effect, a UDT agent is the ultimate deontologist:  It doesn't care at all about the actual consequences that result from its action.  One implication of this conception is that a UDT agent cannot be truly counterfactually mugged.

[ETA: For completeness, I give a description of UDT here (pdf).]

Vladimir Nesov's Counterfactual Mugging presents us with the following scenario:

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it \$100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your \$100. But see, the Omega tells you that if the coin came up heads instead of tails, it'd give you \$10000, but only if you'd agree to give it \$100 if the coin came up tails.

Omega can predict your decision in case it asked you to give it \$100, even if that hasn't actually happened, it can compute the counterfactual truth. The Omega is also known to be absolutely honest and trustworthy, no word-twisting, so the facts are really as it says, it really tossed a coin and really would've given you \$10000.

An agent following UDT will give the \$100.  Imagine that we were building an agent, and that we will receive whatever utility follows from the agent's actions.  Then it's easy to see why we should build our agent to give Omega the money in this scenario.  After all, at the time we build our agent, we know that Omega might one day flip a fair coin with the intentions Nesov describes.  Whatever probability this has of happening, our expected earnings are greater if we program our agent to give Omega the \$100 on tails.
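The builder's arithmetic can be spelled out in a minimal sketch. Only the fair coin and the \$100 and \$10000 figures come from the scenario; the function name `expected_earnings` and the value chosen for `p_encounter` (the chance that Omega ever runs the scenario) are illustrative assumptions.

```python
# Builder's-eye expected earnings for the two candidate policies.
# Assumption: p_encounter is an illustrative number, not part of the scenario.

def expected_earnings(gives_on_tails: bool, p_encounter: float) -> float:
    """Expected dollars earned by an agent built with the given policy."""
    p_heads = p_tails = 0.5  # fair coin, as in the scenario
    # On heads, Omega pays 10000 only to agents that would give on tails.
    heads_payoff = 10_000 if gives_on_tails else 0
    # On tails, a giving agent loses 100; a refusing agent loses nothing.
    tails_payoff = -100 if gives_on_tails else 0
    return p_encounter * (p_heads * heads_payoff + p_tails * tails_payoff)

p = 0.01  # hypothetical chance that the scenario ever arises
give = expected_earnings(True, p)
refuse = expected_earnings(False, p)
print(give > refuse)  # the giving policy comes out ahead
```

Because `p_encounter` only scales both results, the comparison comes out the same way for any positive value, which is the sense in which "whatever probability this has of happening" does not matter.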

More generally, if we suppose that we get whatever utility will follow from our agent's actions, then we can do no better than to program the agent to follow UDT.  But since we have to program the UDT agent now, the act-evaluating function that determines how the agent will act needs to be fixed with the probabilities that we know now.  This will suffice to maximize our expected utility given our best knowledge at the time when we build the agent.

So, it makes sense for a builder to program an agent to follow UDT on expected-utility grounds.  We can understand the builder's motivations.  We can get inside the builder's head, so to speak.

But what about the agent's head?  The brilliance of Nesov's scenario is that it is so hard, on first hearing it, to imagine why a reasonable agent would give Omega the money knowing that the only result will be that they gave up \$100.  It's easy enough to follow the UDT formalism.  But what on earth could the UDT agent itself be thinking?  Yes, trying to figure this out is an exercise in anthropomorphization.  Nonetheless, I think that it is worthwhile if we are going to use UDT to try to understand what we ought to do.

Here are three ways to conceive of the agent's thinking when it gives Omega the \$100.  They form a sort of spectrum.

1. One extreme view:  The agent considers all the possible worlds to be on equal ontological footing.  There is no sense in which any one of them is distinguished as "actual" by the agent.  It conceives of itself as acting simultaneously in all the possible worlds so as to maximize utility over all of them.  Sometimes this entails acting in one world so as to make things worse in that world.  But, no matter which world this is, there is nothing special about it.  The only property of the world that has any ontological significance is the probability weight given to that world at the time that the agent was built. (I believe that this is roughly the view that Wei Dai himself takes, but I may be wrong.)
2. An intermediate view:  The agent thinks that there is only one actual world.  That is, there is an ontological fact of the matter about which world is actual.  However, the other possible worlds continue to exist in some sense, although they are merely possible, not actual.  Nonetheless, the agent continues to care about all of the possible worlds, and this amount of care never changes.  After being counterfactually mugged, the agent is happy to know that, in some merely-possible world, Omega gave the agent \$10000.
3. The other extreme:  As in (2), the agent thinks that there is only one actual world.  Contrary to (2), the agent cares about only this world.  However, the agent is a deontologist.  When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."

View (3) is the one that I wanted to develop in this post.  On this view, the "probability distribution" in the act-evaluating function no longer has any epistemic meaning for the agent.  The act-evaluating function is just a particular computation which, for the agent, constitutes the essence of rightness.  Yes, the computation involves considering some counterfactuals, but to consider those counterfactuals does not entail any ontological commitment.

Thus, when the agent has been counterfactually mugged, it's not (as in (1)) happy because it cares about expected utility over all possible worlds.  It's not (as in (2)) happy because, in some merely-possible world, Omega gave it \$10000.  On this view, the agent considers all those "possible worlds" to have been rendered impossible by what it has learned since it was built.  The reason the agent is happy is that it did the right thing.  Merely doing the right thing has given the agent all the utility it could hope for.  More to the point, the agent got that utility in the actual world.  The agent knows that it did the right thing, so it genuinely does not care about what actual consequences will follow from its action.

In other words, although the agent lost \$100, it really gained from the interaction with Omega.  This suggests that we try to consider a "true" analog of the Counterfactual Mugging.  In The True Prisoner's Dilemma, Eliezer Yudkowsky presents a version of the Prisoner's Dilemma in which it's viscerally clear that the payoffs at stake capture everything that we care about, not just our selfish values.  The point is to make the problem about utilons, and not about some stand-in, such as years in prison or dollars.

In a True Counterfactual Mugging, Omega would ask the agent to give up utility.  Here we see that the UDT agent cannot possibly do as Omega asks.  Whatever it chooses to do will turn out to have in fact maximized its utility.  Not just expected utility, but actual utility. In the original Counterfactual Mugging, the agent looks like something of a chump who gave up \$100 for nothing.  But in the True Counterfactual Mugging, our deontological agent lives with the satisfaction that, no matter what it does, it lives in the best of all possible worlds.
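One way to picture view (3) is the toy sketch below. It is only an illustration of the stance, under the assumption that the agent's own utility is a 0-or-1 "rightness" score; the names `evaluate`, `choose`, and `rightness` and the payoff table are hypothetical, not part of the UDT formalism.

```python
# Toy model of view (3): the agent cares only about whether its act maximizes
# a fixed act-evaluating function. All names and numbers are illustrative.

ACTS = ("give", "refuse")

# Weights fixed by the builders. They have no epistemic meaning for the agent.
BUILT_IN_WEIGHTS = {"heads": 0.5, "tails": 0.5}

# Dollar payoff in each once-possible world, given what the agent does on tails.
PAYOFF = {
    ("heads", "give"): 10_000, ("heads", "refuse"): 0,
    ("tails", "give"): -100,   ("tails", "refuse"): 0,
}

def evaluate(act: str) -> float:
    """The fixed act-evaluating function. It takes no observation as input."""
    return sum(w * PAYOFF[(world, act)] for world, w in BUILT_IN_WEIGHTS.items())

def choose(observation: str) -> str:
    """The agent learns which world is actual, but evaluate() ignores that."""
    return max(ACTS, key=evaluate)

def rightness(act: str) -> int:
    """The agent's own utility on view (3): 1 if the act was right, else 0."""
    return 1 if evaluate(act) == max(evaluate(a) for a in ACTS) else 0

act = choose("tails")
print(act, PAYOFF[("tails", act)], rightness(act))  # give -100 1
```

In this toy, the agent ends up \$100 poorer in the actual world yet scores the maximum on the only quantity it cares about. A mugging denominated in `rightness` rather than dollars has nothing to take, since whatever `choose` returns is by construction the act that scores 1.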

[1] ETA: Under UDT, the agent assigns a utility to having all of the possible worlds P1, P2, . . . undergo respective execution histories E1, E2, . . ..  (The way that a world evolves may depend in part on the agent's action).  That is, for each vector <E1, E2, . . .> of ways that these worlds could respectively evolve, the agent assigns a utility U(<E1, E2, . . .>).  Due to criticisms by Vladimir Nesov (beginning here), I have realized that this post only applies to instances of UDT in which the utility function U takes the form that it has in standard decision theories.  In this case, each world Pi has its own probability pr(Pi) and its own utility function ui that takes an execution history of Pi alone as input, and the function U takes the form

U(<E1, E2, . . .>) = Σi pr(Pi) ui(Ei).

The probabilities pr(Pi) are what I'm talking about when I mention probabilities in this post.  Wei Dai is interested in instances of UDT with more general utility functions U.  However, to my knowledge, this special kind of utility function is the only one in terms of which he's talked about the meanings of probabilities of possible worlds in UDT.  See in particular this quote from the original UDT post:

If your preferences for what happens in one such program is independent of what happens in another, then we can represent them by a probability distribution on the set of programs plus a utility function on the execution of each individual program.

(A "program" is what Wei Dai calls a possible world in that post.)  The utility function U is "baked in" to the UDT agent at the time it's created.  Therefore, so too are the probabilities pr(Pi).
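As a minimal sketch of this special form of U, the code below treats each possible world as a small program that consults the agent's input-output map, and picks the map that maximizes the weighted sum. The two world functions, the string-valued "execution histories", and the payoffs are illustrative assumptions modeled on the Counterfactual Mugging, not Wei Dai's formalism itself.

```python
from itertools import product

OBSERVATIONS = ("asked",)      # inputs the agent might receive
ACTIONS = ("give", "refuse")   # outputs it might return

def heads_world(policy):
    # Omega never asks here, but predicts what the policy would do if asked.
    return "paid" if policy["asked"] == "give" else "unpaid"

def tails_world(policy):
    # Omega asks, and the policy's output determines the history.
    return "gave" if policy["asked"] == "give" else "kept"

WORLDS = (heads_world, tails_world)  # P1, P2
PR = (0.5, 0.5)                      # pr(P1), pr(P2), baked in at build time
U_I = (
    lambda e: 10_000 if e == "paid" else 0,  # u1, a function of E1 alone
    lambda e: -100 if e == "gave" else 0,    # u2, a function of E2 alone
)

def U(histories):
    """U(<E1, E2, ...>) = sum over i of pr(Pi) * ui(Ei)."""
    return sum(p * u(e) for p, u, e in zip(PR, U_I, histories))

def all_policies():
    """Enumerate every map from observations to actions."""
    for outputs in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, outputs))

best = max(all_policies(), key=lambda pol: U([w(pol) for w in WORLDS]))
print(best)  # {'asked': 'give'}
```

Nothing in `U` changes after the agent receives an input, which is the sense in which the probabilities are fixed along with the utility function.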

[2] By "the actual world", I do not mean one of the worlds in the many-worlds interpretation (MWI) of quantum mechanics.  I mean something more like the entire path traversed by the quantum state vector of the universe through its corresponding Hilbert space.  Distinct possible worlds are distinct paths that the state of the universe might (for all we know) be traversing in this Hilbert space.  All the "many worlds" of the MWI together constitute a single world in the sense used here.

ETA: This post was originally titled "UDT agents are deontologists".  I changed the title to "UDT agents as deontologists" to emphasize that I am describing a way to view UDT agents.  That is, I am describing an interpretive framework for understanding the agent's thinking.  My proposal is analogous to Dennett's "intentional stance".  To take the intentional stance is not to make a claim about what a conscious organism is doing.  Rather, it is to make use of a framework for organizing our understanding of the organism's behavior.  Similarly, I am not suggesting that UDT somehow gets things wrong.  I am saying that it might be more natural for us if we think of the UDT agent as a deontologist, instead of as an agent that never changes its belief about which possible worlds will actually happen.  I say a little bit more about this in this comment.


I don't understand the motivation for developing view (3). It seems like any possible agent could be interpreted that way:

When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."

How does it help us to understand UDT specifically?

I don't claim that it helps us to understand UDT as a decision theory. It is a way to anthropomorphize UDT agents to get an intuitively-graspable sense of what it might "feel like to be" a UDT agent with a particular utility function U over worlds.

It seems like any possible agent could be interpreted that way:

When deciding how to act, all that it cares about is whether its act in this world is "right", where "right" means "maximizes the fixed act-evaluating function that was built into me."

How does it help us to understand UDT specifically?

I think that I probably missed your point in my first reply. I see now that you were probably asking why it's any more useful to view UDT agents this way than it would be to view any arbitrary agent as a deontologist.

The reason is that the UDT agent appears, from the outside, to be taking into account what happens in possible worlds that it should know will never happen, at least according to conventional epistemology. Unlike with a conventional consequentialist, you cannot interpret its behavior as a function of what it thinks might happen in the actual world (with what probabilities and with what payoffs). You can interpret its behavior as a function of what its builders thought might happen in the actual world, but you can't do this for the agent itself.

One response to this is to treat the UDT agent as a consequentialist who cares about the consequences of its actions even in possible worlds that it knows aren't actual. This is perfectly fine, except that it makes it hard to conceive of the agent as learning anything. The agent continues to take into account the evolution-histories of world-programs that would call it as a subroutine if they were run, even after it learns that they won't be run. (Obviously this is not a problem if you think that the notion of an un-run program is incoherent.)

The alternative approach that I offer allows us to think of the agent as learning which once-possible worlds are actual. This is a more natural way to conceive of epistemic agents in my opinion. The cost is that the UDT agent is now a deontologist, for whom the rightness of an action doesn't depend on just the effects that it will have in the actual world. "Rightness" doesn't depend on actual consequences, at least not exclusively. However, the additional factors that figure into the "rightness" of an act require no further justification as far as the agent is concerned.

This is not to turn those additional factors into a "black box". They were designed by the agent's builders on conventional consequentialist grounds.

I feel again as if I do not understand what Timeless Decision Theory or Updateless Decision Theory is (or what it's for; what it adds to ordinary decision theory). Can anyone help me? For example, by providing the simplest possible example of one of these "decision theories" in action?

Suppose we have an agent that cares about something extremely simple, like number of paperclips in the world. More paperclips is a better world. Can someone provide an example of how TDT or UDT would matter, or would make a difference, or would be applied, by an entity which made its decisions using that criterion?

This is my vague understanding.

Naive decision theory: "Choose the action that will cause the highest expected utility, given what I know now."

Timeless decision theory: "Choose the action that I wish I had precommitted to, given what I know now."

Updateless decision theory: "Choose a set of rules that will cause the highest expected utility given my priors, then stick to it no matter what happens."

If this is accurate, then I don't see how UDT can generally be better than TDT.

UDT would be better in circumstances where you suspect that your ability to update accurately is compromised.

I'm assuming that the priors for UDT were set at some past time.

UDT gives the money in the counterfactual mugging thought experiment, TDT doesn't.

There's nothing that prevents a UDT agent from behaving as if it were updating; that's what I surmise would happen in more normal situations where Omega isn't involved. But if ignoring information is the winning move, TDT can't do that.

TDT and UDT are intended to solve Newcomb's problem and the prisoner's dilemma and those are surely the simplest examples of their strengths. It is fairly widely believed that, say, causal decision theory two-boxes and defects, but I would rather say that CDT simply doesn't understand the statements of the problems. Either way, one-boxing and arranging mutual cooperation are improvements.

If it's any consolation, the last bit of understanding of Wei Dai's original post (the role of execution histories, prerequisite to being able to make this correction) dawned on me only last week, as a result of a long effort to develop a decision theory of my own that only in retrospect turned out to be along roughly the same lines as UDT.

A convergence like that makes both UDT and your decision theory more interesting to me. Is the process of your decision theory's genesis detailed on your personal blog? In retrospect, was your starting place and development process influenced heavily enough by LW/OB/Wei Dai to screen out the coincidence?

I call it "ambient control". This can work as an abstract:

You, as an agent, determine what you do, and so have the power to choose which statements about you are true. By making some statements true and not others, you influence the truth of other statements that logically depend on the statements about you. Thus, if you have preference about what should be true about the world, you can make some of those things true by choosing what to do. Theories of consequences (partially) investigate what becomes true if you make a particular decision. (Of course, you can't change what's true, but you do determine what's true, because some truths are about you.)

Longer description here. I'll likely post on some aspects of it in the future, as the idea gets further developed. There is a lot of trouble with logical strength of theories of consequences, for example. There is also some hope to unify logical and observational uncertainty here, at the same time making the decision algorithm computationally feasible (it's not part of the description linked above).

Here, so far as I can understand it, is UDT vs. ordinary DT for paper clips:

Ordinary DT ("ODT") says: at all times t, act so as to maximize the number of paper clips that will be observed at time (t + 1), where "1" is a long time and we don't have to worry about discount rates.

UDT says: in each situation s, take the action that returns the highest value on an internal lookup table that has been incorporated into me as part of my programming, which, incidentally, was programmed by people who loved paper clips.

Suppose ODT and UDT are fairly dumb, say, as smart as a cocker spaniel.

Suppose we put both agents on the set of the movie Office Space. ODT will scan the area, evaluate the situation, and simulate several different courses of action, one of which is bending staples into paper clips. Other models might include hiding, talking to accountants, and attempting to program a paper clip screensaver using Microsoft Office. The model that involves bending staples shows the highest number of paper clips in the future compared to other models, so the ODT will start bending staples. If the ODT is later surprised to discover that the boss has walked in and confiscated the staples, it will be "sad" because it did not get as much paper-clip utility as it expected to, and it will mentally adjust the utility of the "bend staples" model downward, especially when it detects boss-like objects. In the future, this may lead ODT to adopt different courses of behavior, such as "bend staples until you see boss, then hide." The reason for changing course and adopting these other behaviors is that they would have relatively higher utility in its modeling scheme.

UDT will scan the area, evaluate the situation, and categorize the situation as situation #7, which roughly corresponds to "metal available, no obvious threats, no other obvious resources," and look up the correct action for situation #7, which its programmers have specified is "bend staples into paper clips." Accordingly, UDT will bend staples. If UDT is later surprised to discover that the boss has wandered in and confiscated the staples, it will not care. The UDT will continue to be confident that it did the "right" thing by following its instructions for the given situation, and would behave exactly the same way if it encountered a similar situation.

UDT sounds stupider, and, at cocker-spaniel levels of intelligence, it undoubtedly is. That's why evolution designed cocker-spaniels to run on ODT, which is much more Pavlovian. However, UDT has the neat little virtue that it is immune to a Counterfactual Mugging. If we could somehow design a UDT that was arbitrarily intelligent, it would both achieve great results and win in a situation where ODT failed.

Here, so far as I can understand it, is Tyrell's UDT vs. ordinary DT for paper clips:

For god's sake, don't call it my UDT :D. My post already seems to be giving some people the impression that I was suggesting some amendment or improvement to Wei Dai's UDT.

Edited. [grin]

The act-evaluating function is just a particular computation which, for the agent, constitutes the essence of rightness.

This sounds almost like saying that the agent is running its own algorithm because running this particular algorithm constitutes the essence of rightness. This perspective doesn't improve understanding of the process of decision-making, it just rounds up the whole agent in an opaque box and labels it an officially approved way to compute. The "rightness" and "actual world" properties you ascribe to this opaque box don't seem to be actually present.

The "rightness" and "actual world" properties you ascribe to this opaque box don't seem to be actually present.

They aren't present as part of what we must know to predict the agent's actions. They are part of a "stance" (like Dennett's intentional stance) that we can use to give a narrative framework within which to understand the agent's motivation. What you are calling a black box isn't supposed to be part of the "view" at all. Instead of a black box, there is a socket where a particular program vector and "preference vector", together with the UDT formalism, can be plugged in.

ETA: The reference to a "preference vector" was a misreading of Wei Dai's post on my part. What I (should have) meant was the utility function U over world-evolution vectors.

I don't understand this.

Edited

Previously, I attempted to disagree with this comment. My disagreement was tersely dismissed, and, when I protested, my protests were strongly downvoted. This suggests two possibilities:

(1) I fail to understand this topic in ways that I fail to understand or (2) I lack the status in this community for my disagreement with Vladimir_Nesov on this topic to be welcomed or taken seriously.

If I were certain that the problem were (2), then I would continue to press my point, and the karma loss be damned. However, I am still uncertain about what the problem is, and so I am deleting all my posts on the thread underneath this comment.

One commenter suggested that I was being combative myself; he may be right. If so, I apologize for my tone.

Saying that this decision is "right" has no explanatory power, gives no guidelines on the design of decision-making algorithms.

gives no guidelines on the design of decision-making algorithms.

I am nowhere purporting to be giving guidelines for the design of a decision-making algorithm. As I said, I am not suggesting any alteration of the UDT formalism. I was also explicit in the OP that there is no problem understanding at an intuitive level what the agent's builders were thinking when they decided to use UDT.

If all you care about is designing an agent that you can set loose to harvest utility for you, then my post is not meant to be interesting to you.

Beliefs should pay rent, not fly in the ether, unattached to what they are supposed to be about.

Beliefs should pay rent . . .

The whole Eliezer quote is that beliefs should "pay rent in future anticipations". Beliefs about which once-possible world is actual do this.

The beliefs in question are yours, and anticipation is about the agent's design or behavior.

The reason the agent is happy is that it did the right thing. Merely doing the right thing has given the agent all the utility it could hope for.

This seems to be tacking a lot of anthropomorphic emotional reactions onto the agent's decision theory.

Imagine an agent that follows the decision theory of "Always take the first option presented." but has humanlike reactions to the outcome.

It will one box or two box depending on how the situation is described to it, but it will be happy if it gets the million dollars.

The process used to make choices need not be connected to the process used to evaluate preference.

This seems to be tacking a lot of anthropomorphic emotional reactions onto the agent's decision theory.

It may in some cases be inappropriate to anthropomorphize an agent. But anthropomorphization can be useful in other cases. My suggestion in the OP is to be used in the case where anthropomorphization seems useful.

Imagine an agent that follows the decision theory of "Always take the first option presented." but has humanlike reactions to the outcome.

This is a great example. Maybe I should have started with something like that to motivate the post.

Suppose that someone you cared about were acting like this. Let's suppose that, according to your decision theory, you should try to change the person to follow a different decision algorithm. One option is to consider them to be a baffling alien, whose actions you can predict, but whose thinking you cannot at all sympathize with.

However, if you care about them, you might want to view them in a way that encourages sympathy. You also probably want to interpret their psychology in a way that seems as human as possible, so that you can bring to bear the tools of psychology. Psychology, at this time, depends heavily on using our own human brains as almost-opaque boxes to model other neurologically similar humans. So your only hope of helping this person is to conceive of them in a way that seems more like a normal human. You need to anthropomorphize them.

In this case, I would probably first try to think of the person as a normal person who is being parasitized by an alien agent with this weird decision theory. I would focus on trying to remove the parasitic agent. The hope would be that the human has normal human decision-making mechanisms that were being overridden by the parasite.

Let me see if I understand your argument correctly: UDT works by converting all beliefs about facts into their equivalent value expressions (due to fact/value equivalence), and chooses the optimal program for maximizing expected utility according to those values.

So, if you were to program a robot such that it adheres to the decisions output by UDT, then this robot, when acting, can be viewed as simply adhering to a programmer-fed ruleset. That ruleset does not explicitly use desirability of any consequence as a desideratum when deciding what action to output, and the ruleset can be regarded as the robot's judgment of "what is right". Because it does "what is right" irrespective of the consequences (esp. in its particular location in time/space/world), its moral judgments match those of a deontologist.

Does that about get it right?

Does that about get it right?

I think that's about right. Your next question might be, "How does this make a UDT agent different from any other?" I address that question in this reply to Wei Dai.

Thanks! Turns out I correctly guessed your answer to that question too! (I noticed the distinction between the programmer's goals and [what the agent regards as] the agent's goals, but hadn't mentioned that explicitly in my summary.)

Doesn't sound too unreasonable to me... I'll think about it some more.

Edit: Do you think it would be a good idea to put (a modified version of) my summary at the top of your article?

Voted up for, among other things, actually explaining UDT in a way I could understand. Thanks! :-)

In a True Counterfactual Mugging, Omega would ask the agent to give up utility.

Doesn't this, like, trivially define what should be the correct decision? What's the point?

What's the point?

The point is, "the UDT agent cannot possibly satisfy this request." So I think we agree here (?).

You'd need to represent your problem statement in terms UDT understands, with the world program and strategy-controlled probabilities for its possible execution histories, and fixed utilities for each possible execution history. If you do that properly, you'll find that UDT acts correctly (otherwise, you haven't managed to correctly represent your problem statement...).

If you do that properly, you'll find that UDT acts correctly

Are you under the impression that I am saying that UDT acts incorrectly? I was explicit that I was suggesting no change to the UDT formalism. I was explicit that I was suggesting a way to anthropomorphize what the agent is thinking. Are you familiar with Dennett's notion of an intentional stance? This is like that. To suggest that we view the agent from a different stance is not to suggest that the agent acts differently.

ETA: I'm gathering that I should have been clearer that the so-called "true counterfactual mugging" is trivial or senseless when posed to a UDT agent. I'm a little surprised that I failed to make this clear, because it was the original thought that motivated the post. It's not immediately obvious to me how to make this clearer, so I will give it some thought.

You've got this in the post:

In a True Counterfactual Mugging, Omega would ask the agent to give up utility. Here we see that the UDT agent cannot possibly satisfy this request.

I'm not sure what you intended to say by that, but it sounds like "UDT agent will make the wrong decision", together with an opaque proposition that Omega offers "actual utility and not even expected utility", which it's not at all clear how to represent formally.

I'm not sure what you intended to say by that, but it sounds like "UDT agent will make the wrong decision",

No, that is not at all what I meant. That interpretation never occurred to me. I meant that the UDT agent cannot possibly give up the utility that Omega asks for in the previous sentence. Now that I understand how you misunderstood that part, I will edit it.

Well, isn't it a good thing that UDT won't give up utility to Omega? You can't take away utility on one side of the coin, and return it on the other, utility is global.

Well, isn't it a good thing that UDT won't give up utility to Omega?

Yes, of course it is. I'm afraid that I don't yet understand why you thought that I suggest otherwise.

You can't take away utility on one side of the coin, and return it on the other, utility is global.

Yes, that is why I said that the agent couldn't possibly satisfy Omega's request to give it utility.

You are attacking a position that I don't hold. But I'm not sure what position you're thinking of, so I don't know how to address the misunderstanding. You haven't made any claim that I disagree with in response to that paragraph.

It seems to me that you're looking for a way to model a deontologist.

And a necessary condition is that you follow a function that does not depend on states of the world. If you don't have any fixed principles, we can't call you a deontologist. You can call that UDT. (I think I've seen the same thing called rule-utilitarianism.)

Is there a more complicated insight than that here?

It seems to me that you're looking for a way to model a deontologist.

I don't think so. I'm supposing that I'm reasonably comfortable with human deontologists, and I'm trying to use that familiarity to make intuitive sense of the behavior of a UDT agent.

Well, that's the way the post was phrased ("a UDT agent is a deontologist.")

But you could construct a UDT agent that doesn't behave anything like a human deontologist, who acts based upon a function that has nothing to do with rights or virtues or moral laws. That's why I think it's better understood as "All deontologists are UDT" instead of vice versa.

It's easier for me to understand an agent who acts on weird principles (such as those having nothing to do with rights or virtues or moral laws) than an agent who either

• thinks that all possible worlds are equally actual, or

• doesn't care more for what happens in the actual world than what happens in possible worlds.

So, if I were to think of deontologists as UDT agents, I would be moving them further away from comprehensibility.

What is the difference between (1) and (2)? Just an XML tag that the agent doesn't care about, but sticks onto one of the worlds it considers possible? (Why would it continue spending cycles to compute which world is actual, if it doesn't care?)

What is the difference between (1) and (2)? Just an XML tag that the agent doesn't care about, but sticks onto one of the worlds it considers possible?

Basically, yes. (2) is not a view that I advocate.

According to this view, the agent cares about only the actual world.

A decision-making algorithm can only care about things accessible in its mind. The "actual world" is not one of them.

Although how does it connect with a phrase later in the paragraph?

It doesn't care at all about the actual consequences that result from its action.

A decision-making algorithm can only care about things accessible in its mind. The "actual world" is not one of them.

The purpose of this post is not to defend realism, and I think that it would take me far afield to do so now. For example, on my view, the agent is not identical to its decision-making algorithm, if that is to be construed as saying that the agent is purely an abstract mathematical entity. Rather, the agent is the actual implementation of that algorithm. The universe is not purely an algorithm. It is an implementation of that algorithm. Not all algorithms are in fact run.

I haven't given any reasons for the position that I just stated. But I hope that you can recognize it as a familiar position, however incoherent it seems to you. Do you need any more explanation to understand the viewpoint that I'm coming from in the post?

The actual world is not epistemically accessible to the agent. It's a useless concept for its decision-making algorithm. An ontology (logic of actions and observations) that describes possible worlds and in which you can interpret observations, is useful, but not the actual world.

An ontology is not a "logic of actions and observations" as I am using the term. I am using it in the sense described in the Stanford Encyclopedia of Philosophy.

At any rate, what I'm calling the ontology is not part of the decision theory. I consider different ontologies that the agent might think in terms of, but I am explicit that I am not trying to change how the UDT itself works when I write, "I suggest an alternative conception of a UDT agent, without changing the UDT formalism."

One way (the usual way?) to think of an agent running Updateless Decision Theory is to imagine that the agent always cares about all possible worlds according to how probable those worlds seemed when the agent's source code was originally written.

Seemed to who? And what about the part where the probabilities are controlled by agent's decisions (as estimated by mathematical intuition)?

Seemed to who?

To the agent's builders.

ETA: I make that clear later in the post, but I'll add it to the intro paragraph.

And what about the part where the probabilities are controlled by agent's decisions?

I'm not sure what you mean. What I'm describing as coded into the agent "from birth" is Wei Dai's function P, which takes an output string Y as its argument (using subscript notation in his post).

ETA: Sorry, that is not right. To be more careful, I mean the "mathematical intuition" that takes in an input X and returns such a function P. But P isn't controlled by the agent's decisions.

ETA2: Gah. I misremembered how Wei Dai used his notation. And when I went back to the post to answer your question, I skimmed too quickly and misread.

So, final answer, when I say that "the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code", I'm talking about the "preference vector" that Wei Dai denotes by "<E1, E2, E3, …>" and which he says "defines its preferences on how those programs should run."

I took him to be thinking of these entries Ei as corresponding to probabilities because of his post What Are Probabilities, Anyway?, where he suggests that "probabilities represent how much I care about each world".

ETA3: Nope, this was another misreading on my part. Wei Dai does not say that <E1, E2, E3, …> is a vector of preferences, or anything like that. He says that it is an input to a utility function U, and that utility function is what "defines [the agent's] preferences on how those programs should run". So, what I gather very tentatively at this point is that the probability of each possible world is baked into the utility function U.

I took him to be thinking of these entries Ei as corresponding to probabilities because of his post What Are Probabilities, Anyway?, where he suggests that "probabilities represent how much I care about each world".

Do you see that these E's are not intended to be interpreted as probabilities here, and so "probabilities of possible worlds are fixed at the start" remark at the beginning of your post is wrong?

Do you see that these E's are not intended to be interpreted as probabilities here,

Yes.

and so "probabilities of possible worlds are fixed at the start" remark at the beginning of your post is wrong?

I realize that my post applies only to the kind of UDT agent that Wei Dai talks about when he discusses what probabilities of possible worlds are. See the added footnote.

I realize that my post applies only to the kind of UDT agent that Wei Dai talks about when he discusses what probabilities of possible worlds are. See the added footnote.

It's still a misinterpretation of Wei Dai's discussion of probability. What you described is not UDT, and not even a decision theory: say, what is U() for? It's not the utility of the agent's decision. When Wei Dai discusses probability in the post you linked, he still means it in the same sense as is used in decision theories, but makes informal remarks about what those values, say, P_Y(...), seem to denote. From the beginning of the post:

I wrote that probabilities can be thought of as weights that we assign to possible world-histories.

Weights assigned to world-histories, not worlds. Totally different. (Although Wei Dai doesn't seem to consistently follow the distinction in terminology himself, it begins to matter when you try to express things formally.)

Edit: this comment is wrong, see correction here.

It's still a misinterpretation of Wei Dai's discussion of probability. What you described is not UDT, and not even a decision theory

I have added a link (pdf) to a complete description of what a UDT algorithm is. I am confident that there are no "misinterpretations" there, but I would be grateful if you pointed out any that you perceive.

I believe it is an accurate description of UDT as presented in the original post, although incomplete knowledge about P_i can be accommodated without changing the formalism, by including all alternatives (completely described this time) enabled by available knowledge about the corresponding world programs, in the list {P_i} (which is the usual reading of "possible world"). Also note that in this post Wei Dai corrected the format of the decisions from individual input/output instances to global strategy-selection.

incomplete knowledge about P_i can be accommodated without changing the formalism, by including all alternatives (completely described this time) enabled by available knowledge about the corresponding world programs, in the list {P_i}

How important is it that the list {P_i} be finite? If P_i is one of the programs in our initial list that we're uncertain about, couldn't there be infinitely many alternative programs P_i1, P_i2, . . . behind whatever we know about P_i?

I was thinking that incomplete knowledge about the P_i could be captured (within the formalism) with the mathematical intuition function. (Though it would then make less sense to call it a specifically mathematical intuition.)

Also note that in this post Wei Dai corrected the format of the decisions from individual input/output instances to global strategy-selection.

I've added a description of UDT1.1 to my pdf.

In principle, it doesn't matter, because you can represent a countable list of programs as a single program that takes an extra parameter (but then you'll need to be more careful about the notion of "execution histories"), and more generally you can just include all possible programs in the list and express the level to which you care about the specific programs in the way mathematical intuition ranks their probability and the way utility function ranks their possible semantics.

On execution histories: note that a program is a nice finite inductive definition of how that program behaves, while it's unclear what an "execution history" is, since it's an infinite object and so it needs to be somehow finitely described. Also, if, as in the example above you have the world program taking parameters (e.g. a universal machine that takes a Goedel number of a world program as parameter), you'll have different executions depending on parameter. But if you see a program as a set of axioms for a logical theory defining the program's behavior, then execution histories can just be different sets of axioms defining program's behavior in a different way. These different sets of axioms could describe the same theories, or different theories, and can include specific facts about what happens during program execution on so and so parameters. Equivalence of such theories will depend on what you assume about the agent (i.e. if you add different assumptions about the agent to the theories, you get different theories, and so different equivalences), which is what mathematical intuition is trying to estimate.

I've added a description of UDT1.1 to my pdf.

It's not accurate to describe strategies as mappings f: X->Y. A strategy can be interactive: it takes input, produces an output, and then environment can prepare another input depending on this output, and so on. Think normalization in lambda calculus. So, the agent's strategy is specified by a program, but generally speaking this program is untyped.

Let's assume that there is a single world program, as described here. Then, if A is the agent's program known to the agent, B is one possible strategy for that program, given in form of a program, X is the world program known to the agent, and Y is one of the possible world execution histories of X given that A behaves like B, again given in form of a program, then mathematical intuition M(B,Y) returns the probability that the statement (A~B => X~Y) is true, where A~B stands for "A behaves like B", and similarly for X and Y. (This taps into the ambient control analysis of decision theory.)

It's not accurate to describe strategies as mappings f: X->Y.

I'm following this paragraph from Wei Dai's post on UDT1.1:

[U]pon receiving input X, [the agent] would put that input aside and first iterate through all possible input/output mappings that it could implement and determine the logical consequence of choosing each one upon the executions of the world programs that it cares about. After determining the optimal S that best satisfies its preferences, it then outputs S(X).

So, "input/output mappings" is Wei Dai's language. Does he not mean mappings between the set of possible inputs and the set of possible outputs?
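To make the quoted procedure concrete, here is a minimal sketch of "iterate through all possible input/output mappings, pick the optimal S, then output S(X)". The toy world, the payoff of 20, and the names `world_utility`, `best_strategy`, and `act` are all my own assumptions for illustration (loosely modelled on the two-run coordination problem discussed in this thread); this is not Wei Dai's code, and it skips the mathematical intuition entirely by assuming the consequences of each mapping are known exactly.

```python
from itertools import product

INPUTS = (1, 2)        # assumed input set X
OUTPUTS = ("A", "B")   # assumed output set Y


def world_utility(strategy):
    """Hypothetical toy world: the algorithm runs once per input, and the
    runs are rewarded only when their outputs differ."""
    return 20 if strategy[1] != strategy[2] else 0


def best_strategy():
    # Enumerate every mapping from inputs to outputs (|Y| ** |X| of them).
    candidates = [
        dict(zip(INPUTS, outs))
        for outs in product(OUTPUTS, repeat=len(INPUTS))
    ]
    # max() returns the first maximizer, which serves as an arbitrary tie-breaker.
    return max(candidates, key=world_utility)


def act(x):
    # Put the input aside, choose the optimal mapping S, then output S(x).
    return best_strategy()[x]


if __name__ == "__main__":
    print(act(1), act(2))
```

Because every run computes the same S before looking at its own input, the two runs land on complementary outputs without needing to coordinate any further.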

A strategy can be interactive: it takes input, produces an output, and then environment can prepare another input depending on this output, and so on.

It seems to me that this could be captured by the right function f: X -> Y. The set I of input-output mappings could be a big collection of GLUTs. Why wouldn't that suffice for Wei Dai's purposes?

ETA: And it feels weird typing out "Wei Dai" in full all the time. But the name looks like it might be Asian to me, so I don't know which part is the surname and which is the given name.

And it feels weird typing out "Wei Dai" in full all the time. But the name looks like it might be Asian to me, so I don't know which part is the surname and which is the given name.

I've been wondering why people keep using my full name around here. Yes, the name is Chinese, but since I live in the US I follow the given-name-first convention. Feel free to call me "Wei".

No, you can't represent an interactive strategy by a single input to output mapping. That post made a step in the right direction, but stopped short of victory :-). But I must admit, I forgot about that detail in the second post, so you've correctly rendered Wei's algorithm, although using untyped strategies would further improve on that.

No, you can't represent an interactive strategy by a single input to output mapping.

Why not?

BTW, in UDT1.1 (as well as UDT1), "input" consists of the agent's entire memory of the past as well as its current perceptions. Thought I'd mention that in case there's a misunderstanding there.
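One way to picture the "input is the entire memory" reading is a lookup table keyed on the whole observation history, so that a multi-round exchange is still driven by a single mapping. The sketch below only illustrates that reading; it does not settle whether such a mapping is enough for the untyped, interactive strategies Nesov has in mind. The table contents, the `environment` function, and all names are hypothetical.

```python
def make_glut_strategy(table, default="pass"):
    """Wrap a lookup table whose keys are complete observation histories."""
    def strategy(memory):
        return table.get(tuple(memory), default)
    return strategy


def run_interaction(strategy, environment, first_observation, rounds):
    """Alternate between agent and environment, growing the agent's memory."""
    memory = [first_observation]
    outputs = []
    for _ in range(rounds):
        y = strategy(memory)           # one application of the mapping
        outputs.append(y)
        memory.append(environment(y))  # the next input depends on this output
    return outputs


# Hypothetical table: each key is everything the agent has observed so far.
TABLE = {
    ("hello",): "ask",
    ("hello", "answer:ask"): "thank",
}


def environment(y):
    return "answer:" + y


if __name__ == "__main__":
    print(run_interaction(make_glut_strategy(TABLE), environment, "hello", 2))
```

The same function is consulted on every round; what changes between rounds is only the length of the memory it is applied to.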

... okay, this question allowed me to make a bit of progress. Taking as a starting point the setting of this comment (that we are estimating the probability of (A~B => X~Y) being true, where A and X are respectively agent's and environment's programs, B and Y programs representing agent's strategy and outcome for environment), and the observations made here and here, we get a scheme for local decision-making.

Instead of trying to decide the whole strategy, we can just decide the local action. Then, the agent program, and "input" consisting of observations and memories, together make up the description of where the agent is in the environment, and thus where its control will be applied. The action that the agent considers can then be local, just something the agent does at this very moment, and the alternatives for this action are alternative statements about the agent: thus, instead of considering a statement A~B for agent's program A and various whole strategies B, we consider just predicates like action1(A) and action2(A) which assert that A chooses action 1 or action 2 in this particular situation, and which don't assert anything else about its behavior in other situations or on other counterfactuals. Taking into account other actions that the agent might have to make in the past or in the future happens automatically, because the agent works with a complete description of the environment, even if under severe logical uncertainty. Thus, decision-making happens "one bit at a time", and the agent's strategy mostly exists in the environment, not under any direct control by the agent, but still controlled in the same sense everything in the environment is.

Thus, in the simplest case of a binary local decision, mathematical intuition would only take as explicit argument a single bit, which indicates what assertion is being made about [agent's program together with memory and observations], and that is all. No maps, no untyped strategies.

This solution was unavailable to me when I thought about explicit control, because the agent has to coordinate with itself, rely on what it can in fact decide in other situations and not what it should optimally decide, but it's a natural step in the setting of ambient control, because the incorrect counterfactuals are completely banished out of consideration, and environment describes what the agent will actually do on other occasions.

Going back to the post explicit optimization of global strategy, the agent doesn't need to figure out the global strategy! Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment (in fact, it's the same problem as "general coordination problem" I described on the DT list, back when I was clueless about this approach).

Each of the agent copies is allowed to make the decision locally, while observing the other copy as part of the environment

Well, that was my approach in UDT1, but then I found a problem that UDT1 apparently can't solve, so I switched to optimizing over the global strategy (and named that UDT1.1).

Can you re-read explicit optimization of global strategy and let me know what you think about it now? What I called "logical correlation" (using Eliezer's terminology) seems to be what you call "ambient control". The point of that post was that it seems an insufficiently powerful tool for even two agents with the same preferences to solve the general coordination problem amongst themselves, if they only explicitly optimize the local decision and depend on "logical correlation"/"ambient control" to implicitly optimize the global strategy.

If you think there is some way to get around that problem, I'm eager to hear it.

So far as I can see, your mistake was assuming "symmetry", and dropping probabilities. There is no symmetry, only one of the possibilities is what will actually happen, and the other (which I'm back to believing since the last post on DT list) is inconsistent, though you are unlikely to be able to actually prove any such inconsistency. You can't say that since (S(1)=A => S(2)=B) therefore (S(1)=B => S(2)=A). One of the counterfactuals is inconsistent, so if S(1) is in fact A, then S(1)=B implies anything. But what you are dealing with are probabilities of these statements (which possibly means proof search schemes trying to prove these statements and making a certain number of elementary assumptions, the number that works as the length of programs in universal probability distribution). These probabilities will paint a picture of what you expect the other copy to do, depending on what you do, and this doesn't at all have to be symmetric.

If there is to be no symmetry between "S(1)=A => S(2)=B" and "S(1)=B => S(2)=A", then something in the algorithm has to treat the two cases differently. In UDT1 there is no such thing to break the symmetry, as far as I can tell, so it would treat them symmetrically and fail on the problem one way or another. Probabilities don't seem to help since I don't see why UDT1 would assign them different probabilities.

If you have an idea how the symmetry might be broken, can you explain it in more detail?

I think that Vladimir is right if he is saying that UDT1 can handle the problem in your Explicit Optimization of Global Strategy post.

With your forbearance, I'll set up the problem in the notation of my write-up of UDT1.

There is only one world-program P in this problem. The world-program runs the UDT1 algorithm twice, feeding it input "1" on one run, and feeding it input "2" on the other run. I'll call these respective runs "Run1" and "Run2".

The set of inputs for the UDT1 algorithm is X = {1, 2}.

The set of outputs for the UDT1 algorithm is Y = {A, B}.

There are four possible execution histories for P:

• E, in which Run1 outputs A, Run2 outputs A, and each gets \$0.

• F, in which Run1 outputs A, Run2 outputs B, and each gets \$10.

• G, in which Run1 outputs B, Run2 outputs A, and each gets \$10.

• H, in which Run1 outputs B, Run2 outputs B, and each gets \$0.

The utility function U for the UDT1 algorithm is defined as follows:

• U(E) = 0.

• U(F) = 20.

• U(G) = 20.

• U(H) = 0.

Now we want to choose a mathematical intuition function M so that Run1 and Run2 don't give the same output. This mathematical intuition function does have to satisfy a couple of constraints:

• For each choice of input X and output Y, the function M(X, Y, –) must be a normalized probability distribution on {E, F, G, H}.

• The mathematical intuition needs to meet certain minimal standards to deserve its name. For example, we need to have M(1, B, E) = 0. The algorithm should know that P isn't going to execute according to E if the algorithm returns B on input 1.

But these constraints still leave us with enough freedom in how we set up the mathematical intuition. In particular, we can set

• M(1, A, F) = 1, and all other values of M(1, A, –) equal to zero;

• M(1, B, H) = 1, and all other values of M(1, B, –) equal to zero;

• M(2, A, E) = 1, and all other values of M(2, A, –) equal to zero;

• M(2, B, F) = 1, and all other values of M(2, B, –) equal to zero.

Thus, in Run1, the algorithm computes that, if it outputs A, then execution history F would transpire, so the agent would get utility U(F) = 20. But if Run1 were to output B, then H would transpire, yielding utility U(H) = 0. Therefore, Run1 outputs A.

Similarly, Run2 computes that its outputting A would result in E, with utility 0, while outputting B would result in F, with utility 20. Therefore, Run2 outputs B.

Hence, execution history F transpires, and the algorithm reaps \$20.

ETA: And, as a bonus, this mathematical intuition really makes sense. For, suppose that we held everything equal, except that we do some surgery so that Run1 outputs B. Since everything else is equal, Run2 is still going to output B. And that really would put us in history H, just as Run1 predicted when it evaluated M(1, B, H) = 1.
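The computation above can be checked mechanically. The following is a minimal sketch, assuming that the UDT1 step is "pick the output with the highest M-weighted utility" and that Python's `max` stands in for the tie-breaker (no ties occur here). The dictionary layout and function names are mine; the values of U and M are the ones given in this comment.

```python
U = {"E": 0, "F": 20, "G": 20, "H": 0}

# Execution history determined by (Run1 output, Run2 output).
HISTORY = {("A", "A"): "E", ("A", "B"): "F", ("B", "A"): "G", ("B", "B"): "H"}

# M[(x, y)] is the distribution over histories that the mathematical
# intuition assigns to "output y on input x"; omitted histories have weight 0.
M = {
    (1, "A"): {"F": 1.0},
    (1, "B"): {"H": 1.0},
    (2, "A"): {"E": 1.0},
    (2, "B"): {"F": 1.0},
}


def expected_utility(x, y):
    return sum(p * U[h] for h, p in M[(x, y)].items())


def udt1_output(x):
    return max(("A", "B"), key=lambda y: expected_utility(x, y))


run1, run2 = udt1_output(1), udt1_output(2)
actual = HISTORY[(run1, run2)]

# Counterfactual surgery: force Run1 to output B and hold Run2 fixed.
after_surgery = HISTORY[("B", run2)]

if __name__ == "__main__":
    print(run1, run2, actual, U[actual], after_surgery)
```

Run1 picks A, Run2 picks B, the history is F, and the surgically altered history is H, which is exactly the history that M(1, B, –) puts all its weight on.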

That's cheating, you haven't explained anything, you've just chosen the strategies and baptized them with mathematical intuition magically knowing them from the start.

I'm not sure what you mean by "cheating". Wei Dai doesn't claim to have explained where the mathematical intuition comes from, and I don't either. The point is, I could build a UDT1 agent with that mathematical intuition, and the agent would behave correctly if it were to encounter the scenario that Wei describes. How I came up with that mathematical intuition is an open problem. But the agent that I build with it falls under the scope of UDT1. It is not necessary to pass to UDT1.1 to find such an agent.

I'm giving an existence proof: There exist UDT1 agents that perform correctly in Wei's scenario. Furthermore, the mathematical intuition used by the agent that I exhibit evaluates counterfactuals in a reasonable way (see my edit to the comment).

Wei Dai doesn't claim to have explained where the mathematical intuition comes from, and I don't either.

There is a difference between not specifying the structure of an unknown phenomenon for which we still have no explanation, and assigning the phenomenon an arbitrary structure without giving an explanation. Even though you haven't violated the formalism, mathematical intuition is not supposed to magically rationalize your (or my) conclusions.

How I came up with that mathematical intuition is an open problem.

No it's not, you've chosen it so that it "proves" what we believe to be a correct conclusion.

I'm giving an existence proof: There exist UDT1 agents that perform correctly in Wei's scenario.

Since you can force the agent to pick any of the available actions by appropriately manipulating its mathematical intuition, you can "prove" that there is an agent that performs correctly in any given situation, so long as you can forge its mathematical intuition for every such situation. You can also "prove" that there is an agent that makes the worst possible choice, in exactly the same way.
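The objection can be illustrated on the same two-run problem. In the sketch below, `FORGED_GOOD` is the intuition proposed earlier in the thread, and `FORGED_BAD` is a hypothetical one that I made up: it still respects the definitions of the histories (it never claims E can follow from Run1 outputting B, and so on), yet it drives the agent into the worst outcome. The deterministic representation of M and all names are assumptions for illustration.

```python
U = {"E": 0, "F": 20, "G": 20, "H": 0}
HISTORY = {("A", "A"): "E", ("A", "B"): "F", ("B", "A"): "G", ("B", "B"): "H"}


def udt1_output(M, x):
    # M[(x, y)] names the single history the intuition is certain of.
    return max(("A", "B"), key=lambda y: U[M[(x, y)]])


def run_world(M):
    y1, y2 = udt1_output(M, 1), udt1_output(M, 2)
    return y1, y2, HISTORY[(y1, y2)]


# The intuition from the earlier comment: the runs coordinate on history F.
FORGED_GOOD = {(1, "A"): "F", (1, "B"): "H", (2, "A"): "E", (2, "B"): "F"}

# Hypothetical forged intuition: each run believes that outputting A wins.
FORGED_BAD = {(1, "A"): "F", (1, "B"): "H", (2, "A"): "G", (2, "B"): "H"}

if __name__ == "__main__":
    print(run_world(FORGED_GOOD))
    print(run_world(FORGED_BAD))
```

Under `FORGED_BAD` both runs output A and history E results, with utility 0, so the formalism alone does not distinguish a helpful intuition from a harmful one.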

How I came up with that mathematical intuition is an open problem.

No it's not, you've chosen it so that it "proves" what we believe to be a correct conclusion.

This is kind of interesting. In Wei's problem, I believe that I can force a winning mathematical intuition with just a few additional conditions, none of which assume that we know the correct conclusion. They seem like reasonable conditions to me, though maybe further reflection will reveal counterexamples.

Using my notation from this comment, we have to find right-hand values for the following 16 equations.

``````
M(1, A, E) = .   M(1, A, F) = .   M(1, A, G) = .   M(1, A, H) = .

M(1, B, E) = .   M(1, B, F) = .   M(1, B, G) = .   M(1, B, H) = .

M(2, A, E) = .   M(2, A, F) = .   M(2, A, G) = .   M(2, A, H) = .

M(2, B, E) = .   M(2, B, F) = .   M(2, B, G) = .   M(2, B, H) = .
``````

In addition to the conditions that I mentioned in that comment, I add the following,

• Binary: Each probability distribution M(X, Y, –) is binary. That is, the mathematical intuition is certain about which execution history would follow from a given output on a given input.

• Accuracy: The mathematical intuition, being certain, should be accurate. That is, if the agent expects a certain amount of utility when it produces its output, then it should really get that utility.

(Those both seem sorta plausible in such a simple problem.)

• Counterfactual Accuracy: The mathematical intuition should behave well under counterfactual surgery, in the sense that I used in the edit to the comment linked above. More precisely, suppose that the algorithm outputs Yi on input Xi for all i. Suppose that, for a single fixed value of j, we surgically interfered with the algorithm's execution to make it output Y'j instead of Yj on input Xj. Let E' be the execution history that would result from this. Then we ought to have that M(Xj, Y'j, E') = 1.

I suspect that the counterfactual accuracy condition needs to be replaced with something far more subtle to deal with other problems, even in the binary case.

Nonetheless, it seems interesting that, in this case, we don't need to use any prior knowledge about which mathematical intuitions win.

I'll proceed by filling in the array above entry-by-entry. We can fill in half the entries right away from the definitions of the execution histories:

``````
M(1, A, E) = .   M(1, A, F) = .   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = .   M(1, B, H) = .

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

Now we have to consider cases. Starting with the upper-left corner, the value of M(1, A, E) will be either 0 or 1.

Case I: Suppose that M(1, A, E) = 0. Normalization forces M(1, A, F) = 1:

``````
M(1, A, E) = 0   M(1, A, F) = 1   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = .   M(1, B, H) = .

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

Now, in the second row, the value of M(1, B, G) will be either 0 or 1.

Case I A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:

``````
M(1, A, E) = 0   M(1, A, F) = 1   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 0   M(1, B, H) = 1

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

We have filled in enough entries to see that Run1 will output A. (Recall that U(F) = 20 and U(H) = 0.) Thus, if Run2 outputs A, then E will happen, not G. Similarly, if Run2 outputs B, then F will happen, not H. This allows us to complete the mathematical intuition function:

``````
M(1, A, E) = 0   M(1, A, F) = 1   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 0   M(1, B, H) = 1

M(2, A, E) = 1   M(2, A, F) = 0   M(2, A, G) = 0   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = 1   M(2, B, G) = 0   M(2, B, H) = 0
``````

Under this mathematical intuition function, Run1 outputs A and Run2 outputs B. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.

Case I B: Suppose that M(1, B, G) = 1 in the second row. Normalization forces M(1, B, H) = 0:

``````
M(1, A, E) = 0   M(1, A, F) = 1   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 1   M(1, B, H) = 0

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

In this case, Run1 will need to use a tie-breaker, because it predicts utility 20 from both outputs. There are two cases, one for each possible tie-breaker.

Case I B i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:

``````
M(1, A, E) = 0   M(1, A, F) = 1   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 1   M(1, B, H) = 0

M(2, A, E) = 1   M(2, A, F) = 0   M(2, A, G) = 0   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = 1   M(2, B, G) = 0   M(2, B, H) = 0
``````

Hence, Run2 will output B. But this function fails the counterfactual accuracy condition. It predicts execution history G if Run1 were to output B, when in fact the execution history would be H. Thus we throw out this function.

Case I B ii: Suppose that the tie-breaker leads Run1 to output B. Then, similar to Case I B i, the resulting function fails the counterfactual accuracy test. (Run2 will output A. The resulting function predicts history F if Run1 were to output A, when in fact the history would be E.) Thus we throw out this function.

Therefore, in Case I, all functions either win or are ineligible.

Case II: Suppose that M(1, A, E) = 1. Normalization forces M(1, A, F) = 0, getting us to

``````
M(1, A, E) = 1   M(1, A, F) = 0   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = .   M(1, B, H) = .

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

Now, in the second row, the value of M(1, B, G) will be either 0 or 1.

Case II A: Suppose that M(1, B, G) = 0. Normalization forces M(1, B, H) = 1:

``````
M(1, A, E) = 1   M(1, A, F) = 0   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 0   M(1, B, H) = 1

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

In this case, Run1 will need to use a tie-breaker, because it predicts utility 0 from both outputs. There are two cases, one for each possible tie-breaker.

Case II A i: Suppose that the tie-breaker leads Run1 to output A. If Run2 outputs A, then E will happen, not G. And if Run2 outputs B, then F will happen, not H. This gives us a complete mathematical intuition function:

``````
M(1, A, E) = 1   M(1, A, F) = 0   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 0   M(1, B, H) = 1

M(2, A, E) = 1   M(2, A, F) = 0   M(2, A, G) = 0   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = 1   M(2, B, G) = 0   M(2, B, H) = 0
``````

Hence, Run2 will output B. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.

Case II A ii: Suppose that the tie-breaker leads Run1 to output B. If Run2 outputs A, then G will happen, not E. And if Run2 outputs B, then H will happen, not F. This gives us a complete mathematical intuition:

``````
M(1, A, E) = 1   M(1, A, F) = 0   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 0   M(1, B, H) = 1

M(2, A, E) = 0   M(2, A, F) = 0   M(2, A, G) = 1   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = 0   M(2, B, G) = 0   M(2, B, H) = 1
``````

Hence, Run2 will output A. But this function fails the accuracy condition. Run1 expects utility 0 for its output, when in fact it will get utility 20. Thus we throw out this function.

Case II B: Suppose that M(1, B, G) = 1. Normalization forces M(1, B, H) = 0:

``````
M(1, A, E) = 1   M(1, A, F) = 0   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 1   M(1, B, H) = 0

M(2, A, E) = .   M(2, A, F) = 0   M(2, A, G) = .   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = .   M(2, B, G) = 0   M(2, B, H) = .
``````

We have filled in enough entries to see that Run1 will output B. (Recall that U(E) = 0 and U(G) = 20.) Thus, if Run2 outputs A, then G will happen, not E. Similarly, if Run2 outputs B, then H will happen, not F. This allows us to complete the mathematical intuition function:

``````
M(1, A, E) = 1   M(1, A, F) = 0   M(1, A, G) = 0   M(1, A, H) = 0

M(1, B, E) = 0   M(1, B, F) = 0   M(1, B, G) = 1   M(1, B, H) = 0

M(2, A, E) = 0   M(2, A, F) = 0   M(2, A, G) = 1   M(2, A, H) = 0

M(2, B, E) = 0   M(2, B, F) = 0   M(2, B, G) = 0   M(2, B, H) = 1
``````

Under this mathematical intuition function, Run1 outputs B and Run2 outputs A. Moreover, this function meets the counterfactual accuracy condition. Note that this function wins.

Therefore, all cases lead to mathematical intuitions that either win or are ineligible.
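The case analysis above can be replayed mechanically. Here is a minimal Python sketch, not part of the original argument. It assumes that U(F) = 20 and U(H) = 0 (inferred from the cases above; only U(E) = 0 and U(G) = 20 are stated outright), that ties go to the first output listed in a hypothetical `tie_break` string, and it checks only a simplified accuracy condition: each run's expected utility for its chosen output must equal the utility actually obtained. All helper names are made up for illustration.

```python
# Utilities of the four execution histories. U(E) = 0 and U(G) = 20 are stated
# in the text; U(F) = 20 and U(H) = 0 are inferred from the case analysis.
U = {"E": 0, "F": 20, "G": 20, "H": 0}

# Assumed labelling: history reached for each (Run1 output, Run2 output) pair.
HISTORY = {("A", "A"): "E", ("A", "B"): "F", ("B", "A"): "G", ("B", "B"): "H"}


def make_M(certain):
    """Build a binary M from the (run, output, history) triples given probability 1."""
    return {(r, o, h): 1 if (r, o, h) in certain else 0
            for r in (1, 2) for o in "AB" for h in "EFGH"}


def expected_utility(M, run, output):
    return sum(M[(run, output, h)] * U[h] for h in "EFGH")


def choose(M, run, tie_break="AB"):
    # max() returns the first maximal element, so ties go to tie_break[0].
    return max(tie_break, key=lambda o: expected_utility(M, run, o))


def is_accurate(M, tie_break="AB"):
    """Simplified check: expected utility of each chosen output equals the actual utility."""
    out1, out2 = choose(M, 1, tie_break), choose(M, 2, tie_break)
    actual = U[HISTORY[(out1, out2)]]
    return all(expected_utility(M, run, out) == actual
               for run, out in ((1, out1), (2, out2)))


case_II_A_i = make_M({(1, "A", "E"), (1, "B", "H"), (2, "A", "E"), (2, "B", "F")})
case_II_A_ii = make_M({(1, "A", "E"), (1, "B", "H"), (2, "A", "G"), (2, "B", "H")})
case_II_B = make_M({(1, "A", "E"), (1, "B", "G"), (2, "A", "G"), (2, "B", "H")})
```

Under these assumptions, `is_accurate(case_II_A_i)` and `is_accurate(case_II_A_ii, tie_break="BA")` come out false, while `is_accurate(case_II_B)` comes out true, with Run1 choosing B and Run2 choosing A.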

ETA: And I just discovered that there's a length-limit on comments.

Do you think your idea is applicable to multi-player games, which is ultimately what we're after? (I don't see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:

In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn't do this by a logical correlation type reasoning alone.

A harder version of this problem involves two agents that have different preferences but are otherwise identical. For simplicity let's assume they both care only about what happens in one particular world program (and therefore have no uncertainty about each other's source code). This may not be the right way to frame the question, which is part of my confusion. But anyway, let the choices be C and D, and consider this payoff matrix (and suppose randomized strategies are not possible):

``````
0,0  4,5
5,4  0,0
``````

Here's the standard PD matrix for comparison:

``````
3,3  0,5
5,0  1,1
``````

Nesov's intuitions at http://lesswrong.com/lw/1vv/the_blackmail_equation/1qk9 make sense to me in this context. It seems that if these two agents are to achieve the 4,5 or 5,4 outcome, it has to be through some sort of "jumbles of wires" consideration, since there is no "principled" way to decide between the two, as far as I can tell. But what is that reasoning exactly? Does anyone understand acausal game theory (is this a good name?) well enough to walk me through how these two agents might arrive at one of the intuitively correct answers (and also show that the same type of reasoning gives an intuitively correct answer for PD)?

If my way of framing the question is not a good one, I'd like to see any kind of worked-out example in this vein.
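For reference, here is a small Python sketch that lists the pure-strategy Nash equilibria of the two matrices above. This is ordinary game theory, not the acausal reasoning being asked about. It only makes the symmetry concrete: the first game has two mirror-image equilibria with nothing to choose between them, and the classical analysis of PD lands on 1,1 rather than 3,3. The orientation (rows and columns ordered C, D; first number is the row player's payoff) is an assumption, and the names are hypothetical.

```python
from itertools import product

ACTIONS = ("C", "D")

# (row action, column action) -> (row payoff, column payoff).
# The orientation of the matrices is an assumption.
COORDINATION = {("C", "C"): (0, 0), ("C", "D"): (4, 5),
                ("D", "C"): (5, 4), ("D", "D"): (0, 0)}
PRISONERS_DILEMMA = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
                     ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def pure_equilibria(game):
    """Return the action pairs from which neither player gains by deviating alone."""
    result = []
    for row, col in product(ACTIONS, repeat=2):
        row_pay, col_pay = game[(row, col)]
        row_ok = all(game[(r, col)][0] <= row_pay for r in ACTIONS)
        col_ok = all(game[(row, c)][1] <= col_pay for c in ACTIONS)
        if row_ok and col_ok:
            result.append((row, col))
    return result
```

With this orientation, `pure_equilibria(COORDINATION)` returns the two off-diagonal cells and `pure_equilibria(PRISONERS_DILEMMA)` returns only mutual defection.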

It's tempting to take a step back and consider the coordination game from the point of view of the agent before-observation, as it gives a nice equivalence between the copies, control over the consequences for both copies from a common source. This comes with a simple algorithm, an actual explanation. But as I suspect you intended to communicate in this comment, this is not very interesting, because it's not the general case: in two-player games the other player is not your copy, and wasn't one at any previous time. But if we try to consider the actions of the agent after-observation, that is, of the two diverged copies, there seems to be no nice solution anymore.

It's clear how the agent before-observations controls the copies after, and so how its decisions about the strategy of reacting to future observations control both copies, coordinate them. It's far from clear how a copy that received one observation can control a copy that received the other observation. Parts control the whole, but not conversely. Yet the coordination problem could be posed about two agents that have nothing in common, and we'd expect there to be a solution to that as well. Thus I expect the coordination problem with two copies to have a local solution, apart from the solution of deciding in advance, as you describe in the post.

My comment to which you linked is clearly flawed in at least one respect: it assumes that to control a structure B with agent A, B has to be defined in terms of A. This is still an explicit control mindset, what I call acausal control, but it's wrong, not as general as ambient control, where you are allowed to discover new dependencies, or UDT, where the discovery of new dependencies is implicit in mathematical intuition.

It'll take a much better understanding of theories of consequences, of the process of their exploration, and of preference defined over them to give specific examples, and I don't expect these examples to be transparent (but maybe there is a simple proof that the decisions will be correct, one that doesn't point out the specific details of the decision-making process).

Do you think your idea is applicable to multi-player games, which is ultimately what we're after? (I don't see how to do it myself.) Take a look at this post, which I originally wrote for another mailing list:

In http://lesswrong.com/lw/1s5/explicit_optimization_of_global_strategy_fixing_a/ I gave an example of a coordination game for two identical agents with the same (non-indexical) preferences and different inputs. The two agents had to choose different outputs in order to maximize their preferences, and I tried to explain why it seemed to me that they couldn't do this by a logical correlation type reasoning alone.

I think that there may have been a communication failure here. The comment that you're replying to is specifically about that exact game, the one in your post Explicit Optimization of Global Strategy (Fixing a Bug in UDT1). The communication failure is my fault, because I had assumed that you had been following along with the conversation.

Here is the relevant context:

In this comment, I re-posed your game from the "explicit optimization" post in the notation of my write-up of UDT. In that comment, I gave an example of a mathematical intuition such that a UDT1 agent with that mathematical intuition would win the game.

In reply, Vladimir pointed out that the real problem is not to show that there exists a winning mathematical intuition. Rather, the problem is to give a general formal decision procedure that picks out a winning mathematical intuition. Cooking up a mathematical intuition that "proves" what I already believe to be the correct conclusion is "cheating".

The purpose of the comment that you're replying to was to answer Vladimir's criticism. I show that, for this particular game (the one in your "explicit optimization" post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria. The point is that these "reasonable criteria" do not involve any assumption about what the agent should do in the game.

Actually, I had been following your discussion with Nesov, but I'm not sure if your comment adequately answered his objection. So rather than commenting on that, I wanted to ask whether your approach of using "reasonable criteria" to narrow down mathematical intuitions can be generalized to deal with the harder problem of multi-player games. (If it can't, then perhaps the discussion is moot.)

I see. I misunderstood the grandparent to be saying that your "explicit optimization" LW post had originally appeared on another mailing list, and I thought that you were directing me to it to see what I had to say about the game there. I was confused because this whole conversation already centered around that very game :).

I show that, for this particular game (the one in your "explicit optimization" post), the winning mathematical intuitions are the only ones that meet certain reasonable criteria.

(1) Which one of them will actually be given? (2) If there is no sense in which some of these "reasonable" conclusions are better than each other, why do you single them out, rather than mathematical intuitions expressing uncertainty about the outcomes that would express the lack of priority of some of these outcomes over others?

I don't find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can't unambiguously decide which of the conclusions is the right one, and neither can the agent.

(1) Which one of them will actually be given?

I claim to be giving, at best, a subset of "reasonable criteria" for mathematical intuition functions. Any UDT1-builder who uses a superset of these criteria, and who has enough decision criteria to decide which UDT1 agent to write, will write an agent who wins Wei's game. In this case, it would suffice to have the criteria I mentioned plus a lexicographic tie-breaker (as in UDT1.1). I'm not optimistic that that will hold in general.

(I also wouldn't be surprised to see an example showing that my "counterfactual accuracy" condition, as stated, rules out all winning UDT1 algorithms in some other game. I find it pretty unlikely that it suffices to deal with mathematical counterfactuals in such a simple way, even given the binary certainty and accuracy conditions.)

My point was only that the criteria above already suffice to narrow the field of options for the builder down to winning options. Hence, whatever superset of these criteria the builder uses, this superset doesn't need to include any knowledge about which possible UDT1 agent would win.

(2) If there is no sense in which some of these "reasonable" conclusions are better than each other, why do you single them out, rather than mathematical intuitions expressing uncertainty about the outcomes that would express the lack of priority of some of these outcomes over others?

I don't follow. Are you suggesting that I could just as reasonably have made it a condition of any acceptable mathematical intuition function that M(1, A, E) = 0.5 ?

I don't find the certainty of conclusions a reasonable assumption, in particular because, as you can see, you can't unambiguously decide which of the conclusions is the right one, and neither can the agent.

If I (the builder/writer) really couldn't decide which mathematical intuition function to use, then the agent won't come to exist in the first place. If I can't choose among the two options that remain after I apply the described criteria, then I will be frozen in indecision, and no agent will get built or written. I take it that this is your point.

But if I do have enough additional criteria to decide (which in this case could be just a lexicographic tie-breaker), then I don't see what is unreasonable about the "certainty of conclusions" assumption for this game.

If I (the builder/writer) really couldn't decide which mathematical intuition function to use, then the agent won't come to exist in the first place.

You don't pick the output of mathematical intuition in a particular case; mathematical intuition is a general algorithm that works based on world programs, outcomes, and your proposed decisions. It's computationally intensive, and its results are not specified in advance based on intuition; on the contrary, the algorithm is what stands for intuition. With more resources, this algorithm will produce different probabilities, as it comes to understand the problem better. And you just pick the algorithm. What you can say about its outcome is a matter of understanding the requirements for such a general algorithm, and predicting what it must therefore compute. Absolute certainty of the algorithm, for example, would imply that the algorithm managed to logically infer that the outcome would be so and so, and I don't see how it's possible to do that, given the problem statement. If it's unclear how to infer what will happen, then mathematical intuition should be uncertain (but it can know something to tilt the balance one way a little bit, perhaps enough to decide the coordination problem!)
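To make the "tilt the balance a little bit" remark concrete, here is a minimal Python sketch of a non-binary mathematical intuition for the game under discussion. Everything in it is an assumption cooked up for illustration: the utilities U(F) = 20 and U(H) = 0 are inferred from the earlier case analysis, and the hypothetical `tilted_M` simply stipulates that each run expects the other to give the complementary answer with probability 0.5 + eps. It says nothing about where such a tilt would come from, which is the actual open question.

```python
# U(E) = 0 and U(G) = 20 are stated in the thread; U(F) and U(H) are inferred.
U = {"E": 0, "F": 20, "G": 20, "H": 0}


def tilted_M(eps):
    """Hypothetical non-binary M with a small tilt of size eps.

    E = (A, A), F = (A, B), G = (B, A), H = (B, B) as (Run1, Run2) outputs.
    """
    hi, lo = 0.5 + eps, 0.5 - eps
    return {
        # Run1's view: Run2 outputs B with probability hi.
        (1, "A", "E"): lo, (1, "A", "F"): hi,
        (1, "B", "G"): lo, (1, "B", "H"): hi,
        # Run2's view: Run1 outputs A with probability hi.
        (2, "A", "E"): hi, (2, "A", "G"): lo,
        (2, "B", "F"): hi, (2, "B", "H"): lo,
    }


def choose(M, run):
    def eu(output):
        # Entries missing from M are treated as probability 0.
        return sum(M.get((run, output, h), 0.0) * U[h] for h in "EFGH")
    # max() returns the first maximal element, so exact ties go to "A".
    return max("AB", key=eu)
```

With `eps = 0.01`, Run1 picks A and Run2 picks B, so the runs coordinate on F. With `eps = 0`, both expected utilities tie, both runs fall back to A, and the outcome is E.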

Okay, I understand you to be saying this:

There is a single ideal mathematical intuition, which, given a particular amount of resources, and a particular game, determines a unique function M mapping {inputs} x {outputs} x {execution histories} --> [0,1] for a UDT1 agent in that game. This ideal mathematical intuition (IMI) is defined by the very nature of logical or mathematical inference under computational limitation. So, in particular, it's not something that you can talk about choosing using some arbitrary tie-breaker like lexicographic order.

Now, maybe the IMI requires that the function M be binary in some particular game with some particular amount of resources. Or maybe the IMI requires a non-binary function M for all amounts of computational resources in that game. Unless you can explain exactly why the IMI requires a binary function M for this particular game, you haven't really made progress on the kinds of questions that we're interested in.

Is that right?

Is that right?

More or less. Of course there is no point in going for a "single" mathematical intuition, but the criteria for choosing one shouldn't be specific to a particular game. Mathematical intuition primarily works with the world program, trying to estimate how plausible it is that this world program will be equivalent to a given history definition, under the condition that the agent produces given output.

Let me see if I understand your point. Are you saying the following?

Some UDT1 agents perform correctly in the scenario, but some don't. To not be "cheating", you need to provide a formal decision theory (or at least make some substantial progress towards providing one) that explains why the agent's builder would choose to build one of the UDT1 agents that do perform correctly.

Not quite. UDT is not an engineering problem, it's a science problem. There is a mystery in what mathematical intuition is supposed to be, not just a question of tacking it on. The current understanding allows one to instantiate incorrect UDT agents, but that's a failure of understanding, not a problem with UDT agents. By studying the setting more, we'll learn more about what mathematical intuition is, which will show some of the old designs to be incorrect.

You say "Not quite", but this is still looking like what I tried to capture with my paraphrase. I was asking if you were saying the following:

A full solution that was a pure extension (not revision) of UDT1 [since I was trying to work within UDT1] would have to take the form of a formal DT such that a builder with that DT would have to choose to build a correct UDT1 agent.

Yeah, that works; though of course the revised decision theories will most certainly not be formal extensions of UDT1, they might give guidelines on designing good UDT1-compliant agents.

The symmetry is broken by "1" being different from "2". The probabilities express logical uncertainty, and so essentially depend on what happens to be provable given finite resources and epistemic state of the agent, for which implementation detail matters. The asymmetry is thus hidden in mathematical intuition, and is not visible in the parts of UDT explicitly described.

...but on the other hand, you don't need the "input" at all, if decision-making is about figuring out the strategy. You can just have a strategy that produces the output, with no explicit input. The history of input can remain implicit in the agent's program, which is available anyway.

BTW, in UDT1.1 (as well as UDT1), "input" consists of the agent's entire memory of the past as well as its current perceptions. Thought I'd mention that in case there's a misunderstanding there.

Good; that was my understanding.

BTW, in UDT1.1 (as well as UDT1), "input" consists of the agent's entire memory of the past as well as its current perceptions. Thought I'd mention that in case there's a misunderstanding there.

Yes, that works too. On second thought, extracting output in this exact manner, while pushing everything else to the "input", allows one to pose a problem specifically about the output in this particular situation, so as to optimize the activity for figuring out this output, rather than the whole strategy, of which right now you only need this aspect and no more.

Edit: Though, you don't need "input" to hold the rest of the strategy.

I was having trouble understanding what strategy couldn't be captured by a function X -> Y. After all, what could possibly determine the output of an algorithm other than its source code and whatever input it remembers getting on that particular run? Just to be clear, do you now agree that every strategy is captured by some function f: X -> Y mapping inputs to outputs?

One potential problem is that there are infinitely many input-output mappings. The agent can't assume a bound on the memory it will have, so it can't assume a bound on the lengths of inputs X that it will someday need to plug into an input-output mapping f.

Unlike the case where there are potentially infinitely many programs P1, P2, . . ., it's not clear to me that it's enough to wrap up an infinite set I of input-output mappings into some finite program that generates them. This is because the UDT1.1 agent needs to compute a sum for every element of I. So, if the set I is infinite, the number of sums to be computed will be infinite. Having a finite description of I won't help here, at least not with a brute-force UDT1.1 algorithm.

Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.

Any infinite thing in any given problem statement is already presented to you with a finite description. All you have to do is transform that finite description of an infinite object so as to get a finite description of a solution of your problem posed about the infinite object.

Right. I agree.

But, to make Wei's formal description of UDT1.1 work, there is a difference between

• dealing with a finite description of an infinite execution history Ei and

• dealing with a finite description of an infinite set I of input-output maps.

The difference is this: The execution histories only get fed into the utility function U and the mathematical intuition function (which I denote by M). These two functions are taken to be black boxes in Wei's description of UDT1.1. His purpose is not to explain how these functions work, so he isn't responsible for explaining how they deal with finite descriptions of infinite things. Therefore, the potential infinitude of the execution histories is not a problem for what he was trying to do.

In contrast, the part of the algorithm that he describes explicitly does require computing an expected utility for every input-output map and then selecting the input-output map that yielded the largest expected utility. Thus, if I is infinite, the brute-force version of UDT1.1 requires the agent to find a maximum from among infinitely many expected utilities. That means that the brute-force version just doesn't work in this case. Merely saying that you have a finite description of I is not enough to say in general how you are finding the maximum from among infinitely many expected utilities. In fact, it seems possible that there may be no maximum.

Actually, in both UDT1 and UDT1.1, there is a similar issue with the possibility of having infinitely many possible execution-history sequences <E1, E2, ...>. In both versions of UDT, you have to perform a sum over all such sequences. Even if you have a finite description of the set E of such sequences, a complete description of UDT still needs to explain how you are performing the sum over the infinitely many elements of the set. In particular, it's not obvious that this sum is always well-defined.
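Here is a rough Python sketch of the brute-force step in question, applied to the small game from this thread. It is only an illustration under stated assumptions: the utilities U(F) = 20 and U(H) = 0 are inferred from the earlier case analysis, the hypothetical `expected_utility` stands in for the sum of M times U by assuming M puts probability 1 on the history that the map produces, and the tie-breaker is whatever order the enumeration happens to use.

```python
from itertools import product

INPUTS = (1, 2)
OUTPUTS = ("A", "B")
# U(E) = 0 and U(G) = 20 are stated in the thread; U(F) and U(H) are inferred.
U = {"E": 0, "F": 20, "G": 20, "H": 0}
HISTORY = {("A", "A"): "E", ("A", "B"): "F", ("B", "A"): "G", ("B", "B"): "H"}


def all_maps(inputs, outputs):
    """Enumerate every input-output map as a dict: len(outputs) ** len(inputs) of them."""
    for choice in product(outputs, repeat=len(inputs)):
        yield dict(zip(inputs, choice))


def expected_utility(f):
    # Stand-in for the sum over histories of M(f, history) * U(history);
    # M is assumed to be certain of the history that f produces.
    return U[HISTORY[(f[1], f[2])]]


def brute_force_udt11(inputs, outputs):
    # max() keeps the first maximiser, which acts as a lexicographic tie-breaker.
    # This only terminates because all_maps() yields finitely many maps.
    return max(all_maps(inputs, outputs), key=expected_utility)
```

Calling `brute_force_udt11(INPUTS, OUTPUTS)` picks the map sending input 1 to A and input 2 to B. The count of candidate maps is `len(OUTPUTS) ** len(INPUTS)`, so even ten binary-output inputs already give 1024 maps. With unboundedly many inputs the `max` never returns, which is the worry raised above.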

...but the action could be a natural number, no? It's entirely OK if there is no maximum - the available computational resources then limit how good a strategy the agent manages to implement ("Define as big a natural number as you can!"). The "algorithm" is descriptive, it's really a definition of optimality of a decision, not specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.

The "algorithm" is descriptive, it's really a definition of optimality of a decision, not specification of how this decision is to be computed. You can sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity.

Okay. I didn't know that the specification of how to compute was explicitly understood to be incomplete in this way. Of course, the description could only be improved by being more specific about just when you can "sometimes optimize infinities away, and can almost always find a finite approximation that gets better with more resources and ingenuity."

What you described is not UDT, and not even a decision theory: say, what is U() for? It's not the utility of the agent's decision.

I gave an accurate definition of Wei Dai's utility function U. As you note, I did not say what U is for, because I was not giving a complete recapitulation of UDT. In particular, I did not imply that U() is the utility of the agent's decision.

(I understand that U() is the utility that the agent assigns to having program Pi undergo execution history Ei for all i. I understand that, here, Ei is a complete history of what the program Pi does. However, note that this does include the agent's chosen action if Pi calls the agent as a subroutine. But none of this was relevant to the point that I was making, which was to point out that my post only applies to UDT agents that use a particular kind of function U.)

(Although Wei Dai doesn't seem to consistently follow the distinction in terminology himself, it begins to matter when you try to express things formally.)

It's looking to me like I'm following one of Wei Dai's uses of the word "probability", and you're following another. You think that Wei Dai should abandon the use of his that I'm following. I am not seeing that this dispute is more than semantics at this point. That wasn't the case earlier, by the way, where I really did misunderstand where the probabilities of possible worlds show up in Wei Dai's formalism. I now maintain that these probabilities are the values I denoted by pr(Pi) when U has the form I describe in the footnote. Wei Dai is welcome to correct me if I'm wrong.

I agree with this description now. I apologize for this instance and a couple others; stayed up too late last night, and negative impression about your post from the other mistakes primed me to see mistakes where everything is correct.

It was a little confusing, because the probabilities here have nothing to do with the probabilities supplied by mathematical intuition, while the probabilities of mathematical intuition are still in play. In UDT, different world-programs correspond to observational and indexical uncertainty, while different execution strategies correspond to logical uncertainty about a specific world program. Only where there is essentially no indexical uncertainty does it make sense to introduce probabilities of possible worlds, factorizing the probabilities otherwise supplied by mathematical intuition together with those describing logical uncertainty.

I agree with this description now. I apologize for this instance and a couple others; stayed up too late last night, and negative impression about your post from the other mistakes primed me to see mistakes where everything is correct.

Thanks for the apology. I accept responsibility for priming you with my other mistakes.

In UDT, different world-programs correspond to observational and indexical uncertainty, while different execution strategies correspond to logical uncertainty about a specific world program. Only where there is essentially no indexical uncertainty does it make sense to introduce probabilities of possible worlds, factorizing the probabilities otherwise supplied by mathematical intuition together with those describing logical uncertainty.

I hadn't thought about the connection to indexical uncertainty. That is food for thought.

But P isn't controlled by the agent's decisions.

Very very wrong. The world program P (or what it does, anyway) is the only thing that's actually controlled in this control problem statement (more generally, a list of programs, which could equivalently be represented by one program parametrized by an integer).

Edit: I misinterpreted the way Tyrrell used "P", correction here.

Very very wrong.

Here is the relevant portion of Wei Dai's post:

These considerations lead to the following design for the decision algorithm S. S is coded with a vector <P1, P2, P3, ...> of programs that it cares about, and a utility function on vectors of the form <E1, E2, E3, ...> that defines its preferences on how those programs should run. When it receives an input X, it looks inside the programs P1, P2, P3, ..., and uses its "mathematical intuition" to form a probability distribution P_Y over the set of vectors <E1, E2, E3, ...> for each choice of output string Y. Finally, it outputs a string Y* that maximizes the expected utility Sum P_Y(<E1, E2, E3, ...>) U(<E1, E2, E3, ...>). (This specifically assumes that expected utility maximization is the right way to deal with mathematical uncertainty. Consider it a temporary placeholder until that problem is solved. Also, I'm describing the algorithm as a brute force search for simplicity. In reality, you'd probably want it to do something cleverer to find the optimal Y* more quickly.)

If I am reading him correctly, he uses the letter "P" in two different ways. In one use, he writes Pi, where i is an integer, to denote a program. In the other use, he writes P_Y, where Y is an output vector, to denote a probability distribution.

I was referring to the second use.

Okay, the characterization of P_Y seems right. For my reaction I blame the prior.

Returning to the original argument,

the agent always cares about all possible worlds according to how probable those worlds seemed to the agent's builders when they wrote the agent's source code.

P_Y is not a description of probabilities of possible worlds conceived by the agent's builder; it's something produced by the "mathematical intuition module" for a given output Y (or strategy Y, if you incorporate the later patch to UDT).

P_Y is not a description of probabilities of possible worlds conceived by the agent's builder; it's something produced by the "mathematical intuition module" for a given output Y (or strategy Y, if you incorporate the later patch to UDT).

You are right here. Like you, I misremembered Wei Dai's notation. See my last (I hope) edit to that comment.

I would appreciate it if you edited your comment where you say that I was "very very wrong" to say that P isn't controlled by the agent's decisions.

It's easier to have a linear discussion, rather than trying to patch everything by reediting it from the start (just saying, you are doing this for the third time to that poor top-level comment). You've got something wrong, then I've got something wrong, the errors were corrected as the discussion developed, moving on. The history doesn't need to be corrected. (I insert corrections to comments this way, without breaking the sequence.)

Thank you for the edit.

The second question (edited in later) is more pressing: you can't postulate fixed probabilities of possible worlds; how the agent controls these probabilities is essential.

The second question (edited in later) is more pressing

See my edit to my reply.