Explicit Optimization of Global Strategy (Fixing a Bug in UDT1)

8Tyrrell_McAllister

9timtyler

2Vladimir_Nesov

2Tyrrell_McAllister

0Vladimir_Nesov

1Tyrrell_McAllister

1timtyler

3Sniffnoy

1timtyler

3Sniffnoy

1RobinZ

-1timtyler

0RobinZ

0timtyler

0RobinZ

2timtyler

0Emile

0jimrandomh

2cousin_it

0Vladimir_Nesov

0cousin_it

2Vladimir_Nesov

1Wei Dai

0timtyler

8Wei Dai

-4timtyler

0Tyrrell_McAllister

0timtyler

1Mitchell_Porter

3Matt_Stevenson

1timtyler

1dclayh

0Vladimir_Nesov

0jimrandomh

0Sniffnoy

4pengvado

-2timtyler

1wedrifid

New Comment

38 comments, sorted by Click to highlight new comments since: Today at 6:38 PM

What about when there are agents with difference source codes and different preferences?

Set aside different preferences, because I have no idea how to deal with that. It seems like just having different source codes makes things very difficult.

You used "1" and "2" as your indexical labels, which come with a very obvious order. But suppose that Omega labeled one agent ♠ and the other agent ♣. If you and your opposite have different source codes, you have no idea how your opposite internally represents the symbol that they received. For example, you certainly have no guarantee that the two of you order these in lexicographically the same way. So how could you possibly coordinate on a tie-breaking strategy?

Set aside different preferences, because I have no idea how to deal with that. It seems like just having different source codes makes things very difficult.

Where do you think preference lives? It's but a property of the source code, or rather, a way of looking at the source code. If you change the source code, preference changes as well (to some extent).

Where do you think preference lives? It's but a property of the source code, or rather, a way of looking at the source code.

I agree that preferences live in the source code. Having different preferences implies having different source codes. I also agree that preferences are but a way to interpret source code, so that any two source codes might be said to have the same preferences with respect to *some* preference scheme.

So my comment was assuming that we already have an implicit way of mapping source codes to preference schemes, and that that mapping is not injective. That is, different source codes might have the same preference scheme. But these different source codes might still use different internal representation schemes for representing their input.

I think that my issue disappears if each agent has access to the other's source code.

I also agree that preferences are but a way to interpret source code, so that any two source codes might be said to have the same preferences with respect to

somepreference scheme.

I don't understand what the above means (and so can't readily be in agreement with it). ("Preference scheme"?)

So my comment was assuming that we already have an implicit way of mapping source codes to preference schemes, and that that mapping is not injective.

I expect preference mapping to be "essentially" injective (possibly disregarding only stuff that doesn't play any role in the algorithm, including the *way* it computes as well as the externally visible behavior). Preference is what the program *does*, and it's everything that the program does. (But what the program does isn't necessarily *preferable* according to that same program seen as preference.)

I don't understand what the above means (and so can't readily be in agreement with it). ("Preference scheme"?)

At this date almost four years later, I'm not exactly sure what I meant, which is a pretty strong indictment of my ability to write clearly.

I *think* that I meant the following: Take the set of all possible outcomes, partition this set of outcomes into equivalence classes, and then put an order on the resulting set of equivalence classes. The ordered set of equivalence classes is what I was calling a "preference scheme". (... I think.)

ETA: On further thought, I don't think that that's exactly what I meant, but I honestly don't know what I *did* mean. I'd like to think that I could have explained myself better at the time, but I didn't write enough so that even I-now could reconstruct what that explanation would have been.

*Oh,* I see the problem. I was talking about Tyrrell_McAllister's question upthread, in which the assumption of identical source code (i.e. copying) is dropped.

If you don't know much about the other agent - except that it is also trying to win - I figure you should also probably just do the best you can to pick the most mutually-obvious ordering, hoping that they will be doing much the same. Sometimes, it won't work out - but that is doing as well as you can.

That's assuming linear utility. If the most important thing is to consistently get at least a few points, then randomness may be a better strategy.

So how could you possibly coordinate on a tie-breaking strategy?

I'd use alphabetical ordering, and assume my opposite would too. So clubs would come before spades, black before white, etc.

Now, if your opponent doesn't use the same language (or doesn't use a language, only unicode symbols), it almost becomes a game of chance, like "here's an aarvark and a wapiti, the same animals have been shown to some (non-english speaking) Chinese guy, you each get to choose one, if the two of you choose different ones you get a hundred bucks".

For example, you certainly have no guarantee that the two of you order these in lexicographically the same way. So how could you possibly coordinate on a tie-breaking strategy?

You can't, *unless* you can communicate in some way, such as by talking to eachother or reading eachother's source code. In that case it's easy.

You want a general algorithm that plays arbitrary games against other copies of itself "fairly". If utilities are transferable, it's easy, but note that this solution *wasn't* derived from consideration of logical correlations alone. If utilities are non-transferable, I've given up on the problem because the literature offers bewilderingly many different solutions and no clear winner.

You could try a simpler problem as a stepping stone: blame assignment. (I'm currently trying to solve this myself for a future post but what the hell, let a million flowers bloom.) In a given play of a multiplayer game, how much credit/blame should we assign to player A for the payoff that player B got? Concrete example: how many people did one voter personally kill by voting for Hitler and how is this responsibility shared between them and Hitler? I believe this should be easier than NTU fairness but it's still too tricky for me.

It's misleading when you are calling efficient decision-making "fairness". It's not about giving the poor an equal share, it's about making sure you get the best deal there is. (And such an algorithm should be able to play arbitrary games not just against other copies of itself -- by the way, what exactly is a copy is we don't have a formal setting?)

Completely agreed on first point, I view it as something like "fairness according to bargaining power".

what exactly is a copy is we don't have a formal setting?

In a problem this hard, I still consider full source code visibility a reasonable starting point. I'll be very suspicious if a proposed solution works in some other formal setting but *doesn't* work under full source visibility.

This is essentially PD (in the aspects relevant to this post), but without magical identification of Player1.Cooperate and Player2.Cooperate by virtue of using the same label "Cooperate". Consider what happens with your thought experiment if the rule given by Omega is "You should give the same answer", given that they already have differentiating info (assigned number). This distinction shouldn't matter.

Recursion through decision-making of all relevant agents seems conceptually indispensable. When Player1 sees itself again through the eyes of Player2, symmetry or asymmetry between the players (as opposed to identity of the recursive copies of each player) becomes irrelevant.

Consider: if players are different, then Player1 knows about Player2 that knows about Player1 (recursion, a site of TDT-like acausal control); if players are the same, then Player1 knows about identical Player2 (recursion at the first step). If we go the first road, not much is lost, but we get more generality.

Ok, there are currently two suggested ways for agents to achieve logical correlation: through mutual prediction, or by using the same source code, and it sounds like you're saying that the first method is more powerful. But so far I don't really see how it can work at all. Can you explain how the mutual prediction approach would solve the problem given in my post, or any other problem that might show its advantage?

The two of you seem to be missing the point of this post. This sample problem isn't hard or confusing in and of itself (like Newcomb's Problem), but merely meant to illustrate a limitation of the usefulness of logical correlation in decision theory. The issue here isn't whether we can find some way to make the right decision (obviously we can, and I gave a method in the post itself) but whether it can be made through consideration of logical correlation alone.

More generally, some people don't seem to get what might be called "decision theoretic thinking". When some decision problem is posted, they just start talking about how they would make the decision, instead of thinking about how to design an algorithm that would solve that problem and every other decision problem that it might face. Maybe I need to do a better job of explaining this?

Mitchell_Porter didn't *just* solve the problem, he *explained* how he did it.

Did he do it "by consideration of logical correlation alone"? I do not know what that is intended to mean. Correlation normally has to be between two or more variables. In the post you talk about an agent taking account of "logical correlations between different instances of itself". I don't know what that means either.

More to the point, I don't know why it is desirable. Surely one just wants to make the right decisions.

Expected utility maximisation solves this problem fine - if the agent has a tendency to use a similar tie-breaking strategy to Mitchell_Porter . If an agent has no such tendency, and expects this kind of problem, then it will aspire to develop a similar tendency.

If an agent has no such tendency, and expects this kind of problem, then it will aspire to develop a similar tendency.

This sounds like your decision theory is "Decide to use the best decision theory."

I guess there's an analogy to people whose solution to the hard problems that humanity faces is "Build a superintelligent AI that will solve those hard problems."

This was my thought process: To get the $10, my copy and I have to choose differently. I am 1, he is 2. I have to choose A or B... At some point I thought of the mapping A=1, B=2, implicitly as part of the bigger mapping (A...Z)=(1...26) I suppose. I noticed that this was a particular mapping which had spontaneously presented itself to me. So it must be a natural one for me to think of; so there is a good chance my copy will think of it as well. So I select A, hoping my copy went through an analogous process, arrived at the same mapping and selected B.

Here you are relying on omega using two ordering systems that we already find highly correlated.

What if Omega asked you to choose between a blegg and a rube instead of A and B. Along with that, Omega tells you that it did not necessarily use the same ordering of blegg and rube when posing the question to the copy.

EDIT: More thoughts: If you can't rely on an obvious correlation between the player labels and choices, why not have a strategy to make a consistent mapping from the player labels to the choices.

The key to winning this game is having both parties disagree. If both parties know the goal and have a consistent mapping process, it would be trivial for them to arrive at different choices.

A simple mapping would be alphabetize the player labels and the choice labels. Player(1) => choice(1), Player(2) => choice(2), Player(n) => choice(n).

What about when there are agents with difference source codes and different preferences? The result here suggests that one of our big unsolved problems, that of generally deriving a "good and fair" global outcome from agents optimizing their own preferences while taking logical correlations into consideration, may be unsolvable, since consideration of logical correlations does not seem powerful enough to always obtain a "good and fair" global outcome even in the single-player case.

I don't understand this statement. What do you mean by "logical correlations", and how does this post demonstrate that they are insufficient for getting the right solution?

Suppose you're choosing a strategy S for a cooperation game with some other entity X, which you are told nothing about. Then
U(S) = .5 * (S(1)!=X(2)) + .5 * (S(2)!=X(1))
In this case, you have to choose a probability distribution over other entities X, and choose S to optimize the utility function based on that. There's no way around that. If we're told that X was given the same utility function, and is trying to optimize over it, then that greatly narrows down the possibilities for what X is. We assume that X is chosen, by some unspecified but intelligent process, to also optimize U. Fortunately, English culture provides a standard mapping between numbers and letters (A=1, B=2, C=3, ...); so if we assume X has some probability of coming from a similar culture and choosing that mapping for that reason, and will choose an arbitrary random mapping otherwise, then we're better off with the standard mapping.

If the other agent has a different utility function, then that changes your probability distribution over what that agent is. If we're told that the other agent is supposed to implement the utility function "1 if it chooses A, 0 if it chooses B", then its implementation is probably going to be to just return A, so we should always return B.

Now assume that when we enter into the coordination game, we're told something about A, and A is told something about us. Then our utility function is
U(S) = .5*(S(1,X)!=X(2,S)) + .5*(S(2,X)!=X(1,S))
We still need a probability distribution over Xs, but this time the distribution includes Xs that model S. If we're also told that X is supposed to be optimizing the same utility function, then we can assign some probability to it modeling S with each of various techniques, and to it being model-able with each of various techniques. Not all modeling techniques will work on all functions - some of them lead to infinite regress, some are just bad designs that can't model anything accurately, etc - so to maximize the probability of successful coordination we should both make S easy for X to model, and make S try to model X.

Different kinds of games lead to different kinds of likely opponents, hence the field of game theory. A nash equilibrium is any pair of strategies that optimize utility under the assumption that the other is their opponent.

When describing UDT1 solutions to various sample problems, I've often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.

(I think this "bug" also exists in TDT, but I don't understand it well enough to make a definite claim. Perhaps Eliezer or someone else can tell me if TDT correctly solves the sample problem given here.)

Here is the problem. Suppose Omega appears and tells you that you have just been copied, and each copy has been assigned a different number, either 1 or 2. Your number happens to be 1. You can choose between option A or option B. If the two copies choose different options without talking to each other, then each gets $10, otherwise they get $0.

Consider what happens in the original formulation of UDT1. Upon receiving the input "1", it can choose "A" or "B" as output. What is the logical implication of S(1)="A" on the computation S(2)? It's not clear whether S(1)="A" implies S(2)="A" or S(2)="B", but actually neither can be the right answer.

Suppose S(1)="A" implies S(2)="A". Then by symmetry S(1)="B" implies S(2)="B", so both copies choose the same option, and get $0, which is clearly not right.

Now instead suppose S(1)="A" implies S(2)="B". Then by symmetry S(1)="B" implies S(2)="A", so UDT1 is indifferent between "A" and "B" as output, since both have the logical consequence that it gets $10. So it might as well choose "A". But the other copy, upon receiving input "2", would go though this same reasoning, and also output "A".

The fix is straightforward in the case where every agent already has the same source code and preferences. UDT1.1, upon receiving input X, would put that input aside and first iterate through all possible input/output mappings that it could implement and determine the logical consequence of choosing each one upon the executions of the world programs that it cares about. After determining the optimal S* that best satisfies its preferences, it then outputs S*(X).

Applying this to the above example, there are 4 input/output mappings to consider:

_{1}(1)="A", S_{1}(2)="A"_{2}(1)="B", S_{2}(2)="B"_{3}(1)="A", S_{3}(2)="B"_{4}(1)="B", S_{4}(2)="A"Being indifferent between S

_{3}and S_{4}, UDT1.1 picks S*=S_{3}and returns S_{3}(1)="A". The other copy goes through the same reasoning, also picks S*=S_{3}and returns S_{3}(2)="B". So everything works out.What about when there are agents with difference source codes and different preferences? The result here suggests that one of our big unsolved problems, that of generally deriving a "good and fair" global outcome from agents optimizing their own preferences while taking logical correlations into consideration, may be unsolvable, since consideration of logical correlations does not seem powerful enough to always obtain a "good and fair" global outcome even in the single-player case. Perhaps we need to take an approach more like cousin_it's, and try to solve the cooperation problem from the top down. That is, by explicitly specifying a fair way to merge preferences, and simultaneously figuring out how to get agents to join into such a cooperation.