It is commonly acknowledged here that current decision theories have deficiencies that show up in the form of various paradoxes. Since there seems to be little hope that Eliezer will publish his Timeless Decision Theory any time soon, I decided to try to synthesize some of the ideas discussed in this forum, along with a few of my own, into a coherent alternative that is hopefully not so paradox-prone.

I'll start with a way of framing the question. Put yourself in the place of an AI, or more specifically, the decision algorithm of an AI. You have access to your own source code S, plus a bit string X representing all of your memories and sensory data. You have to choose an output string Y. That’s the decision. The question is, how? (The answer isn't “Run S,” because what we want to know is what S should be in the first place.)

Let’s proceed by asking the question, “What are the consequences of S, on input X, returning Y as the output, instead of Z?” To begin with, we'll consider just the consequences of that choice in the realm of abstract computations (i.e. computations considered as mathematical objects rather than as implemented in physical systems). The most immediate consequence is that any program that calls S as a subroutine with X as input will receive Y as output, instead of Z. What happens next is a bit harder to tell, but supposing that you know something about a program P that calls S as a subroutine, you can further deduce the effects of choosing Y versus Z by tracing the difference between the two choices in P’s subsequent execution. We could call these the computational consequences of Y. If you have preferences about the execution of a set of programs, some of which call S as a subroutine, then you can satisfy your preferences directly by choosing the output of S so that those programs will run the way you most prefer.
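As a rough illustration of tracing computational consequences, here is a minimal sketch. The caller `P`, the helper `make_S`, and the branch labels are all hypothetical toy stand-ins invented for this example; a real S would compute its output from X rather than return a fixed string.

```python
def make_S(output):
    """Build a toy decision algorithm that always returns `output`.
    Hypothetical stand-in: a real S would compute its output from X."""
    def S(X):
        return output
    return S


def P(S):
    """A toy program that calls S as a subroutine and then keeps running.
    Returns its execution history as a list of steps."""
    history = ["start"]
    choice = S("X")  # the call whose result is being chosen
    history.append("S returned " + choice)
    if choice == "Y":
        history.append("took branch A")
    else:
        history.append("took branch B")
    return history


def computational_consequences(P, outputs):
    """Map each candidate output to the execution history of P it produces."""
    return {out: P(make_S(out)) for out in outputs}


consequences = computational_consequences(P, ["Y", "Z"])
for out, hist in consequences.items():
    print(out, hist)
```

Comparing the two histories shows exactly where the runs diverge, which is what "tracing the difference between the two choices" amounts to in this toy setting.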

A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S' which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs.
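The two kinds of logical relationship mentioned above can be sketched as follows. This is only an illustration under strong assumptions: the chosen output of S on the shared input is modeled as a parameter `y`, and the program names `run_P_prime` and `run_P_first_bit` are hypothetical.

```python
def run_P_prime(y):
    """P' never calls S. It calls a subroutine assumed to be logically
    equivalent to S, so that subroutine's output must equal y."""
    s_equiv_output = y  # same input, same output as S
    return "P' saw " + s_equiv_output


def run_P_first_bit(y):
    """A program whose subroutine returns only the first bit of S's output."""
    first_bit = y[0]
    return "P'' saw " + first_bit


def logical_consequences(y):
    """Collect the effect of choosing y on programs that never call S."""
    return {"P_prime": run_P_prime(y), "P_first_bit": run_P_first_bit(y)}


for y in ["10", "11", "01"]:
    print(y, logical_consequences(y))
```

Choosing "10" rather than "11" changes the run of P' but not the run of the first-bit program, so different logical relationships transmit different amounts of the choice.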

In general, you can’t be certain about the consequences of a choice, because you’re not logically omniscient. How to handle logical/mathematical uncertainty is an open problem, so for now we'll just assume that you have access to a "mathematical intuition subroutine" that somehow allows you to form beliefs about the likely consequences of your choices.

At this point, you might ask, “That’s well and good, but what if my preferences extend beyond abstract computations? What about consequences on the physical universe?” The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it. (From now on I’ll just refer to programs for simplicity, with the understanding that the subsequent discussion can be generalized to non-computable universes.) Your preferences about the physical universe can be translated into preferences about such a program P and programmed into the AI. The AI, upon receiving an input X, will look into P, determine all the instances where it calls S with input X, and choose the output that optimizes its preferences about the execution of P. If the preferences were translated faithfully, the AI's decision should also optimize your preferences regarding the physical universe. This faithful translation is a second major open problem.

What if you have some uncertainty about which program our universe corresponds to? In that case, we have to specify preferences for the entire set of programs that our universe may correspond to. If your preferences for what happens in one such program are independent of what happens in another, then we can represent them by a probability distribution on the set of programs plus a utility function on the execution of each individual program. More generally, we can always represent your preferences as a utility function on vectors of the form <E1, E2, E3, …> where E1 is an execution history of P1, E2 is an execution history of P2, and so on.

These considerations lead to the following design for the decision algorithm S. S is coded with a vector <P1, P2, P3, ...> of programs that it cares about, and a utility function on vectors of the form <E1, E2, E3, …> that defines its preferences on how those programs should run. When it receives an input X, it looks inside the programs P1, P2, P3, ..., and uses its "mathematical intuition" to form a probability distribution P_Y over the set of vectors <E1, E2, E3, …> for each choice of output string Y. Finally, it outputs a string Y* that maximizes the expected utility Sum P_Y(<E1, E2, E3, …>) U(<E1, E2, E3, …>). (This specifically assumes that expected utility maximization is the right way to deal with mathematical uncertainty. Consider it a temporary placeholder until that problem is solved. Also, I'm describing the algorithm as a brute force search for simplicity. In reality, you'd probably want it to do something cleverer to find the optimal Y* more quickly.)
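The brute-force version of this design can be sketched in a few lines. Everything named here is an assumption made for illustration: `intuition` stands in for the unspecified "mathematical intuition" subroutine and is simply handed in as a function returning a distribution over history vectors, and the toy distributions and utilities are invented.

```python
def choose_output(X, candidate_outputs, intuition, utility):
    """Brute-force search for the Y* maximizing sum_E P_Y(E) * U(E).

    intuition(X, Y) -> dict mapping history vectors (E1, E2, ...) to
        probabilities (hypothetical stand-in for mathematical intuition).
    utility(E) -> float, the utility of a history vector.
    """
    def expected_utility(Y):
        dist = intuition(X, Y)
        return sum(p * utility(E) for E, p in dist.items())

    return max(candidate_outputs, key=expected_utility)


def toy_intuition(X, Y):
    # Two world programs P1 and P2; each history vector is (E1, E2).
    if Y == "a":
        return {("good", "bad"): 0.9, ("bad", "bad"): 0.1}
    return {("bad", "good"): 0.6, ("bad", "bad"): 0.4}


def toy_utility(E):
    E1, E2 = E
    # A good run of P2 is worth twice as much as a good run of P1.
    return (1.0 if E1 == "good" else 0.0) + (2.0 if E2 == "good" else 0.0)


best = choose_output("x", ["a", "b"], toy_intuition, toy_utility)
print(best)
```

In this toy case output "a" has expected utility 0.9 and output "b" has 1.2, so the search returns "b". Note that the dependence on Y enters only through the distribution P_Y; the utility function itself never sees Y.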

#### Example 1: Counterfactual Mugging

Note that Bayesian updating is not done explicitly in this decision theory. When the decision algorithm receives input X, it may determine that a subset of programs it has preferences about never calls it with X and are also logically independent of its output, and therefore it can safely ignore them when computing the consequences of a choice. There is no need to set the probabilities of those programs to 0 and renormalize.

So, with that in mind, we can model Counterfactual Mugging by the following Python program:

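    def P(coin):
        AI_balance = 100
        if coin == "heads":
            # Omega asks the AI to hand over $100.
            if S("heads") == "give $100":
                AI_balance -= 100
        if coin == "tails":
            # Omega pays out only if it predicts the AI would have paid on heads.
            if Omega_Predict(S, "heads") == "give $100":
                AI_balance += 10000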

The AI’s goal is to maximize expected utility = .5 * U(AI_balance after P("heads")) + .5 * U(AI_balance after P("tails")). Assuming U(AI_balance)=AI_balance, it’s easy to determine U(AI_balance after P("heads")) as a function of S’s output. It equals 0 if S(“heads”) == “give $100”, and 100 otherwise. To compute U(AI_balance after P("tails")), the AI needs to look inside the Omega_Predict function (not shown here), and try to figure out how accurate it is. If the mathematical intuition module says that choosing “give $100” as the output for S(“heads”) makes it more likely (by a sufficiently large margin) for Omega_Predict(S, "heads") to output “give $100”, then that choice maximizes expected utility.
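To make "a sufficiently large margin" concrete, here is a small worked calculation. It rests on one simplifying assumption that is not in the program above: Omega_Predict matches S's actual output with a fixed probability `accuracy`, whichever output S chooses. The function name and parameter are hypothetical.

```python
def expected_utility(action, accuracy):
    """Expected AI_balance over a fair coin, assuming Omega_Predict agrees
    with S's actual output with probability `accuracy` (an assumption)."""
    gives = (action == "give $100")
    # Heads world: the AI starts with 100 and loses it only if it gives.
    heads_balance = 0 if gives else 100
    # Tails world: the AI gains 10000 if Omega predicts "give $100".
    p_predict_give = accuracy if gives else 1 - accuracy
    tails_balance = 100 + 10000 * p_predict_give
    return 0.5 * heads_balance + 0.5 * tails_balance


for accuracy in (0.5, 0.6, 0.99):
    give = expected_utility("give $100", accuracy)
    keep = expected_utility("refuse", accuracy)
    print(accuracy, give, keep)
```

Under this assumption the expected utilities are 50 + 5000a for giving and 5100 - 5000a for refusing, where a is the accuracy, so giving wins whenever a exceeds 0.505. A predictor that is no better than chance (a = 0.5) leaves refusing slightly ahead.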

#### Example 2: Return of Bayes

This example is based on case 1 in Eliezer's post Priors as Mathematical Objects. An urn contains 5 red balls and 5 white balls. The AI is asked to predict the probability of each ball being red as it is drawn from the urn, its goal being to maximize the expected logarithmic score of its predictions. The main point of this example is that this decision theory can reproduce the effect of Bayesian reasoning when the situation calls for it. We can model the scenario using preferences on the following Python program:

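    import math

    def P(n):
        # n encodes the order in which the balls are drawn.
        urn = ['red', 'red', 'red', 'red', 'red', 'white', 'white', 'white', 'white', 'white']
        history = []
        score = 0
        while urn:
            i = n % len(urn)
            n = n // len(urn)  # floor division, so n stays an integer
            ball = urn[i]
            urn[i:i+1] = []  # remove the drawn ball from the urn
            prediction = S(history)  # S's probability that this ball is red
            if ball == 'red':
                score += math.log(prediction, 2)
            else:
                score += math.log(1 - prediction, 2)
            print(score, ball, prediction)
            history.append(ball)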

Here is a printout from a sample run, using n=1222222:

    -1.0 red 0.5
    -2.16992500144 red 0.444444444444
    -2.84799690655 white 0.375
    -3.65535182861 white 0.428571428571
    -4.65535182861 red 0.5
    -5.9772799235 red 0.4
    -7.9772799235 red 0.25
    -7.9772799235 white 0.0
    -7.9772799235 white 0.0
    -7.9772799235 white 0.0

S should use deductive reasoning to conclude that returning (number of red balls remaining / total balls remaining) maximizes the average score across the range of possible inputs to P, from n=1 to 10! (representing the possible orders in which the balls are drawn), and do that. Alternatively, S can approximate the correct predictions using brute force: generate a random function from histories to predictions, and compute what the average score would be if it were to implement that function. Repeat this a large number of times and it is likely to find a function that returns values close to the optimum predictions.
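The claim about the optimal predictions can be checked numerically on a smaller case. The sketch below is not the random-function search described above; it simply compares two fixed policies on an assumed urn of 2 red and 2 white balls, so that every draw order can be enumerated quickly. The policy names and the `average_score` helper are hypothetical.

```python
import math
from itertools import permutations


def fraction_remaining(history, n_red, n_white):
    """Predict (red balls remaining) / (total balls remaining)."""
    red_left = n_red - history.count("red")
    total_left = n_red + n_white - len(history)
    return red_left / total_left


def constant_half(history, n_red, n_white):
    """Baseline policy that ignores the history."""
    return 0.5


def average_score(policy, n_red=2, n_white=2):
    """Average base-2 log score over every draw order, equally weighted."""
    balls = ["red"] * n_red + ["white"] * n_white
    orders = list(permutations(balls))
    total = 0.0
    for order in orders:
        history = []
        for ball in order:
            p = policy(history, n_red, n_white)
            total += math.log2(p if ball == "red" else 1 - p)
            history.append(ball)
    return total / len(orders)


print(average_score(fraction_remaining))
print(average_score(constant_half))
```

With 2 red and 2 white balls, the fraction-remaining policy assigns probability 1/6 to every draw order, giving an average score of log2(1/6), about -2.585, against -4.0 for the constant policy. The history-sensitive predictions fall out of maximizing the average score across inputs to P, with no explicit updating step.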

#### Example 3: Level IV Multiverse

In Tegmark's Level 4 Multiverse, all structures that exist mathematically also exist physically. In this case, we'd need to program the AI with preferences over all mathematical structures, perhaps represented by an ordering or utility function over conjunctions of well-formed sentences in a formal set theory. The AI will then proceed to "optimize" all of mathematics, or at least the parts of math that (A) are logically dependent on its decisions and (B) it can reason or form intuitions about.

I suggest that the Level 4 Multiverse should be considered the default setting for a general decision theory, since we cannot rule out the possibility that all mathematical structures do indeed exist physically, or that we have direct preferences on mathematical structures (in which case there is no need for them to exist "physically"). Clearly, application of decision theory to the Level 4 Multiverse requires that the previously mentioned open problems be solved in their most general forms: how to handle logical uncertainty in any mathematical domain, and how to map fuzzy human preferences to well-defined preferences over the structures of mathematical objects.

**Added:** For further information and additional posts on this decision theory idea, which came to be called "Updateless Decision Theory", please see its entry in the LessWrong Wiki.

There's *lots* of mentions of Timeless Decision Theory (TDT) in this thread - as though it refers to something real. However, AFAICS, the reference is to unpublished material by Eliezer Yudkowsky.

I am not clear about how anyone is supposed to make sense of all these references before that material has been published. To those who use "TDT" as though they know what they are talking about - and who are *not* Eliezer Yudkowsky - what exactly is it that you think you are talking about?

Now that I have some idea what Eliezer and Nesov were talking about, I'm still a bit confused about AI cooperation. Consider the following scenario: Omega appears and asks two human players (who are at least as skilled as Eliezer and Nesov) to each design an AI. The AIs will each undergo some single-player challenges like Newcomb's Problem and Counterfactual Mugging, but there will be a one-shot PD between the two AIs at the end, with their source codes hidden from each other. Omega will grant each human player utility equal to the total score of his or he...

There are two parts to AGI: consequentialist reasoning and preference.

Humans have feeble consequentialist abilities, but can use computers to implement huge calculations, if the problem statement can be entered in the computer. For example, you can program the material and mechanical laws in an engineering application, enter a building plan, and have the computer predict what's going to happen to it, or what parameters should be used in the construction so that the outcome is as required. That's the power outside human mind, directed by the correct laws, and targeted at the formally specified problem.

When you consider AGI in isolation, it's like an engineering application with a random building plan: it can powerfully produce a solution, but it's not a solution to the problem you need solving. Nonetheless, this part is essential when you *do* have an ability to specify the problem. And that's the AI's algorithm, one aspect of which is decision-making. It's separate from the problem statement that comes from human nature.

For an engineering program, you can say that the computer is basically doing what a person would do if they had crazy amount of time and machine patience. But that's...

1) Congratulations: moving to logical uncertainty and considering your decision's consequences to be the consequence of *that logical program* outputting a particular decision, is what I would call the key insight in moving to (my version of) timeless decision theory. The rest of it (that is, the work I've done already) is showing that this answer is the only reflectively consistent one for a certain class of decision problems, and working through some of the mathematical inelegancies in mainstream decision theory that TDT seems to successfully clear up an...

Why didn't you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about.

Looking at my 2001 post, it seems that I already had the essential idea at that time, but didn't pursue it very far. I think it was because (A) I wasn't as interested in AI back then, and (B) I thought an AI ought to be able to come up with these ideas by itself.

I still think (B) is true, BTW. We should devote some time and resources to thinking about *how* we are solving these problems (and coming up with questions in the first place). Finding that algorithm is perhaps more important than finding a reflectively consistent decision algorithm, if we don't want an AI to be stuck with whatever mistakes we might make.

To celebrate, here are some pictures of Omega!

(except the models that are palette swaps of Ultima)

Although I still have not tried to decipher what "Timeless Decision Theory" or "Updateless Decision Theory" is actually about, I would like to observe that it is very unlikely that the "timeless" aspect, in the sense of an ontology which denies the reality of time, change, or process, is in any way essential to how it works.

If you have a Julian-Barbour-style timeless wavefunction of the universe, which associates an amplitude with every point in a configuration space of spacelike states of the universe, you can always constru...

Thanks for twisting my mind in the right direction with the S' stuff. I hereby submit the following ridiculous but rigorous theory of Newcomblike problems:

You submit a program that outputs a row number in a payoff matrix, and a "world program" simultaneously outputs a column number in the same matrix; together they determine your payoff. Your program receives the source code of the world program as an argument. The world program *doesn't* receive your source code, but it contains some opaque function calls to an "oracle" that's guaranteed...

2) The key problem in Drescher's(?) Counterfactual Mugging is that after you actually *see* the coinflip, your posterior probability of "coin comes up heads" is no longer 0.5 - so if you compute the answer after seeing the coin, the answer is not the reflectively consistent one. I still don't know how to handle this - it's not in the class of problems to which my TDT corresponds.

Please note that the problem persists if we deal in a non-quantum coin, like an unknown binary digit of pi.

I thought the answer Vladimir Nesov already posted solved Counterfactual Mugging for a quantum coin?

In this solution, there is no belief updating; there is just decision theory. (All probabilities are "timestamped" to the beliefs of the agent's creator when the agent was created.) This means that the use of Bayesian belief updating with expected utili...

My book discusses a similar scenario: the dual-simulation version of Newcomb's Problem (section 6.3), in the case where the large box is empty (no $1M) and (I argue) it's still rational to forfeit the $1K. Nesov's version nicely streamlines the scenario.

Just to elaborate a bit, Nesov's scenario and mine share the following features:

In both cases, we argue that an agent should forfeit a smaller sum for the sake of a larger reward that would have been obtained (counterfactually contingently on that forfeiture) if a random event had turned out differently than in fact it did (and than the agent knows it did).

We both argue for using the original coin-flip probability distribution (i.e., not-updating, if I've understood that idea correctly) for purposes of this decision, and indeed in general, even in mundane scenarios.

We both note that the forfeiture decision is easier to justify if the coin-toss was quantum under MWI, because then the original probability distribution corresponds to a real physical distribution of amplitude in configuration-space.

Nesov's scenario improves on mine in several ways. He eliminates some unnecessary complications (he uses one simulation instead of two, and just tells the agent what the coin-toss was, whereas my scenario requires the agent to deduce that). So he makes the point more clearly, succinctly and dramatically. Even more importantly, his analysis (along with Yudkowsky, Dai, and others here...

When describing your decision theory, you say, "Finally, [the agent] outputs a string Y* that maximizes the expected utility Sum P_Y(<E1, E2, E3, …>) U(<E1, E2, E3, …>)."

P_Y and U are both shown to be functions of <E1, E2, E3,...>, not Y*. Could you explain how assigning values to Y* would affect the sum?

Why do you insist on making life harder on yourself? If the problem isn't solved satisfactorily in a simple world model, e.g. a deterministic finite process with however good mathematical properties you'd like, it's not yet time to consider more complicated situations, with various poorly-understood kinds of uncertainty, platonic mathematical objects, and so on and so forth.

So...UDT dominates all known decision theories

Understanding check:

But does the Bayesian update occur if the input X affects the relative probabilities of the programs without s...

This is one of those posts where I think "I wish I could understand the post". Way too technical for me right now. I sometimes wish that someone would do a "Non-Technical" and non-mathematical version of posts like these (but I guess it would take too much time and effort). But then I get away with saying, I don't need to understand everything, do I?

PS again:

Don't forget to retract: http://www.weidai.com/smart-losers.txt

Smart agents win.