Identifiability Problem for Superrational Decision Theories

Bunthut

17 Identifiability Problem for Superrational Decision Theories

by Bunthut

9th Apr 2021

AI Alignment Forum

2 min read

17 Ω 10

Superrationality, and generalizations of it, must treat options differently depending on how they're named.

Consider the penny correlation game: Both players decide independently on either head or tails. Then if they decided on the same thing, they each get one util, otherwise they get nothing. You play this game with an exact copy of yourself. You reason: since the other guy is an exact copy of me, whatever I do he will do the same thing. So we will get the util. Then you pick heads because its first alphabetically or some other silly consideration, and then you win. How good that you got to play with a copy, otherwise you would have only gotten half an util.

Now consider the penny anti-correlation game: Both players decide independently on either head or tails. If they decided the same thing, they get nothing, otherwise they get one util each. You play this game with an exact copy of yourself. You reason: since the other guy is an exact copy of me, whatever I do he will do the same thing. So the best I can do is to pick randomly with 50% chance, that way I get half an util. Thats if the gamemaster is nice. If he isnt nice, then identical copies in identical environments get the same result from their RNG. In that case you lose the game whatever you do. How bad that you had to play with a copy, otherwise you could have gotten half an util.

The problem is that these are the same game, only with the labels for one players actions switched (or for the other player. which is exactly the problem). Despite this, superrational reasoning gives us different results. This could happen because we have taken a theoretical symmetry from a physical one, and physical symmetry can pay attention to the detailed makeup of symbols. If we are aware of the difference, it should not be too surprising that there is a setup exploiting it.

Now I think the reasoning presented is correct in both cases, and the lesson here is for our expectations of rationality. From what I've seen people formed their expectations about what algorithm-controlling decision theories will do when they are worked out largely around rationality conditions, and what ought to be the outcome. In this way, common knowledge of rationality was supposed to guarantee various things, like pareto-optimal equilibria or an existent and unique solution concept. This has taken a blow here. Superrationality, from which we originally tried to generalize these properties, doesn't provide them even for the symmetrical games where it is applicable. Furthermore, this is a strong example of a classical rationality condition being violated, and in a way that is unfixable in principle. No matter what rationality ends up being, you will not win the anti-correlation game with your copy, and you will win the correlation game, even though they only differ by a renaming.

Decision TheoryEmbedded AgencyGame TheoryRationality

Frontpage

17 Ω 10

Identifiability Problem for Superrational Decision Theories

New Comment

16 comments, sorted by

top scoring

Click to highlight new comments since: Today at 3:59 AM

[-]Gurkenglas3y90

The problem is that these are the same game, only with the labels for one players actions switched

For two objects, in this case games, to be considered equivalent you usually want not merely a bijection between them but an isomorphism. A game G will be a set of labels L(G) with extra structure: Morphisms G->H will be particular functions L(G)->L(H). The structure your morphisms preserve will be what one can reason. The unusual fact you describe is that your function L(G)->M corresponds to no morphism G->H.

[-]Bunthut3y10

What is and isn't an isomorphism depends on what you want to be preserved under isomorphism. If you want everything thats game-theoretically relevant to be preserved, then of course those games won't turn out equivalent. But that doesn't explain anything. If my argument had been that the correct action in the prisoners dilemma depends on sunspot activity, you could have written your comment just as well.

[-]Gurkenglas3y20

It's easy to get confused between similar equivalence relations, so it's useful to formally distinguish them. See the other thread's arguing about sameness. Category theory language is relevant here because it gives a short description of your anomaly, so it may give you the tools to address it. And it is in fact unusual: For the cases of the underlying sets of a graph, group, ring, field, etc., one can find a morphism for every function.

We can construct a similar anomaly for the case of rings by saying that every ring's underlying set contains 0 and 1, and that these are its respective neutral elements. Then a function that swaps 0 and 1 would have no corresponding ring morphism. The corresponding solution for your case would be to encode the structure not in the names of the elements of the underlying set, but in something that falls away when you go to the set. This structure would encode such knowledge as which decision is called heads and which tails. Then for any game and any function from its underlying set you could push the structure forward.

[-]abramdemski3yΩ220

Now I think the reasoning presented is correct in both cases, and the lesson here is for our expectations of rationality.

I agree that the reasoning is correct in both cases (or rather: could be correct, assuming some details), but the lesson I derive is that we have to be really careful about our assumptions here.

Normally, in game theory, we're comfortable asserting that re-labeling the options doesn't matter (and re-numbering the players also doesn't matter). But normally we aren't worried about anthropic uncertainty in a game.

If we suppose that players can see their numbers, as well, this can be used as a signal to break symmetry for anti-matching. Player 1 can choose option 1, and player 2 can choose option 2. (Or whatever -- they just have to agree on an anti-matching policy acausally.)

Thinking physically, the question is: are the two players physically precisely the same (including environment), at least insofar as the players can tell? Then anti-matching is hard. Usually we don't need to think about such things for game theory (since a game is a highly abstracted representation of the physical situation).

But this is one reason why correlated equilibria are, usually, a better abstraction than Nash equilibria. For example, a game of chicken is similar to anti-matching. In correlated equilibria, there is a "fair" solution to chicken: each player goes straight with 50% probability (and the other player swerves). This corresponds to the idea of a traffic light. If traffic lights were not invented, some other correlating signal from the environment might be used (particularly as we assume increasingly intelligent agents). This is a possible game-theoretic explanation for divination practices such as reading entrails.

Nash equilibria, otoh, are a better abstraction for the case where there truly is no "environment" to take complicated signals from (besides what you explicitly represent in the game). It better fits a way of thinking where models are supposed to be complete.

[-]Bunthut3yΩ010

are the two players physically precisely the same (including environment), at least insofar as the players can tell?

In the examples I gave yes. Because thats the case where we have a guarantee of equal policy, from which people try to generalize. If we say players can see their number, then the twins in the prisoners dilemma needn't play the same way either.

But this is one reason why correlated equilibria are, usually, a better abstraction than Nash equilibria.

The "signals" players receive for correlated equilibria are already semantic. So I'm suspicious that they are better by calling on our intuition more to be used, with the implied risks. For example I remember reading about a result to the effect that correlated equilibria are easier to learn. This is not something we would expect from your explanation of the differences: If we explicitly added something (like the signals) into the game, it would generally get more complicated.

[-]abramdemski3yΩ220

The "signals" players receive for correlated equilibria are already semantic. So I'm suspicious that they are better by calling on our intuition more to be used, with the implied risks. For example I remember reading about a result to the effect that correlated equilibria are easier to learn. This is not something we would expect from your explanation of the differences: If we explicitly added something (like the signals) into the game, it would generally get more complicated.

It's not something we would naively expect, but it does further speak in favor of CE, yes?

In particular, if you look at those learnability results, it turns out that the "external signal" which the agents are using to correlate their actions is the play history itself. IE, they are only using information which must be available to learning agents (granted, sufficiently forgetful learning agents might forget the history; however, I do not think the learnability results actually rely on any detailed memory of the history -- the result still holds with very simple agents who only remember a few parameters, with no explicit episodic memory (unlike, eg, tit-for-tat).

[-]adamShimi3yΩ120

I don't see how the two problems are the same. They are basically the agreement and symmetry breaking problems of distributed computing, and those two are not equivalent in all models. What you're saying is simply that in the no-communication model (where the same algorithm is used on two processes that can't communicate), these two problems are not equivalent. But they are asking for fundamentally different properties, and are not equivalent in many models that actually allow communication.

[-]Bunthut3yΩ010

"The same" in what sense? Are you saying that what I described in the context of game theory is not surprising, or outlining a way to explain it in retrospect?

Communication won't make a difference if you're playing with a copy.

[-]adamShimi3yΩ020

Well, if I understand the post correctly, you're saying that these two problems are fundamentally the same problem, and so rationality should be able to solve them both if it can solve one. I disagree with that, because from the perspective of distributed computing (which I'm used to), these two problems are exactly the two kinds of problems that are fundamentally distinct in a distributed setting: agreement and symmetry-breaking.

Communication won't make a difference if you're playing with a copy.

Actually it could. Basically all of distributed computing assumes that every process is running the same algorithm, and you can solve symmetry-breaking in this case with communication and additional constraint on the scheduling of processes (the difficulty here is that the underlying graph is symmetric, whereas if you had some form of asymmetry (like three processes in a line, such that the one in the middle has two neighbors but the others only have one), they you can use directly that asymmetry to solve symmetry-breaking.

(By the way, you just gave me the idea that maybe I can use my knowledge of distributed computing to look at the sort of decision problems where you play with copies? Don't know if it would be useful, but that's interesting at least)

[-]Bunthut3yΩ010

Well, if I understand the post correctly, you're saying that these two problems are fundamentally the same problem

No. I think:

...the reasoning presented is correct in both cases, and the lesson here is for our expectations of rationality...

As outlined in the last paragraph of the post. I want to convince people that TDT-like decision theories won't give a "neat" game theory, by giving an example where they're even less neat than classical game theory.

Actually it could.

I think you're thinking about a realistic case (same algorithm, similar environment) rather than the perfect symmetry used in the argument. A communication channel is of no use there because you could just ask yourself what you would send, if you had one, and then you know you would have just gotten that message from the copy as well.

I can use my knowledge of distributed computing to look at the sort of decision problems where you play with copies

I'd be interested. I think even just more solved examples of the reasoning we want are useful currently.

[-]adamShimi3yΩ120

As outlined in the last paragraph of the post. I want to convince people that TDT-like decision theories won't give a "neat" game theory, by giving an example where they're even less neat than classical game theory.

Hum, then I'm not sure I understand in what way classical game theory is neater here?

I think you're thinking about a realistic case (same algorithm, similar environment) rather than the perfect symmetry used in the argument. A communication channel is of no use there because you could just ask yourself what you would send, if you had one, and then you know you would have just gotten that message from the copy as well.

As long as the probabilistic coin flips are independent on both sides (you also mention the case where they're symmetric, but let's put that aside for the example), then you can apply the basic probabilistic algorithm for leader election: both copies flip a coin n times to get a n-bit number, which they exchange. If the numbers are different, then the copy with the smallest one says 0 and the other says 1; otherwise they flip a coin and return the answer. With this algorithm, you have probability of deciding different values, and so you can get as close as you want to 1 (by paying the price in more random bits).

I'd be interested. I think even just more solved examples of the reasoning we want are useful currently.

Do you have examples of problems with copies that I could look at and that you think would be useful to study?

[-]Bunthut3y10

Hum, then I'm not sure I understand in what way classical game theory is neater here?

Changing the labels doesn't make a difference classically.

As long as the probabilistic coin flips are independent on both sides

Yes.

Do you have examples of problems with copies that I could look at and that you think would be useful to study?

No, I think you should take the problems of distributed computing, and translate them into decision problems, that you then have a solution to.

[-]shminux3y20

Despite this, superrational reasoning gives us different results.

what is that "superrational" reasoning that gives different results?

[-]Bunthut3y10

...the one outlined in the post?

[-]TekhneMakre3y10

From a superrational perspective (in the game with no randomness), in both cases there's two actions; in the correlation game both actions give a util, in the anti-correlation game both actions give no utils. The apparent difference is based on the incoherent counterfactual "what if I say heads and my copy says tails", which doesn't translate into the superrational perpective.

[-]Bunthut3y10

The apparent difference is based on the incoherent counterfactual "what if I say heads and my copy says tails"

I don't need counterfactuals like that to describe the game, only implications. If you say heads and your copy tails, you will get one util, just like how if 1+1=3, the circle can be squared.

The interesting thing here is that superrationality breaks up an equivalence class relative to classical game theory, and peoples intuitions don't seem to have incorporated this.

Moderation Log