A test for symbol grounding methods: true zero-sum games

by Stuart_Armstrong, 26th Nov 2019 (2 min read, 2 comments)



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Imagine there are two AIs playing a debate game. The game is zero-sum: at the end of the debate, the human judge declares the winner, and that AI gets a $+1$ reward, while the other one gets $-1$.

Except the game, as described, is not truly zero-sum. That is because the AIs "get" a reward. How is that reward assigned? Presumably there is some automated system that, when the human presses a button, routes $+1$ to one AI and $-1$ to the other. These rewards are stored as bits, somewhere "in" or around the two AIs.

Thus there are non-zero-sum options: you could break into the whole network, gain control of the automated system, and route $+1$ to each AI, or, why not, some vastly larger number to each, or whatnot[1].
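
As a minimal Python sketch of this point, assume a hypothetical `RewardRouter` standing in for the automated system. The class, its fields and the agent names `"A"`/`"B"` are illustrative and do not describe any real setup. The rewards sum to zero only while the router behaves as intended. Once something takes control of it, nothing in the code forces the total back to zero, and nothing in the world does either.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RewardRouter:
    """Hypothetical toy model of the system that turns the judge's verdict into rewards.

    Intended behaviour: +1 to the winner, -1 to the loser. `hijacked` models an
    agent that has broken into the network and taken control of the router.
    """
    hijacked: bool = False
    hijack_reward: float = 1.0  # what a hijacker routes to *each* AI

    def route(self, winner: str) -> dict[str, float]:
        if self.hijacked:
            # Both AIs get the same reward, so the total is no longer zero.
            return {"A": self.hijack_reward, "B": self.hijack_reward}
        loser = "B" if winner == "A" else "A"
        return {winner: 1.0, loser: -1.0}


def total_reward(rewards: dict[str, float]) -> float:
    """Sum of the rewards actually delivered to the two AIs."""
    return sum(rewards.values())


print(total_reward(RewardRouter().route("A")))               # 0.0
print(total_reward(RewardRouter(hijacked=True).route("A")))  # 2.0
```

`hijack_reward` can be set to any number. The zero-sum property is a feature of the router's intended behaviour, not of the rewards themselves.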

Thus, though we can informally say that "the AIs are in a zero-sum game as to which one wins the debate", that sentence is not properly grounded in the world; it is only true as long as certain physical features of the world are maintained, features which are not mentioned in that sentence.

Symbol grounding implies possibility of zero-sum

Conversely, imagine that an AI has a utility/reward $R$ which is properly grounded in the world. Then it seems that we should be able to construct an AI with utility/reward $-R$, which is also properly grounded in the world. So it seems that any good symbol grounding system should allow us to define truly zero-sum games between AIs.

There are, of course, a few caveats. Aumann's agreement theorem requires unboundedly rational agents with common priors. Similarly, though the properly grounded $R$ and $-R$ are zero-sum, the agents themselves might not be fully zero-sum with each other, due to bounded rationality or different priors.

Indeed, it is possible to set up a situation where even unboundedly rational agents with a common prior will knowingly behave in not-exactly-zero-sum ways with each other. For example, you can isolate the two agents from each other and feed them deliberately biased information.
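
Here is a toy Python example of the biased-information case. The states, actions, beliefs and function names are all hypothetical. Agent A has utility `U` and agent B has exactly `-U`, so the utilities sum to zero in every state. However, B has been fed biased information about which state holds.

```python
from __future__ import annotations

# Hypothetical two-state, two-action setup. U[(action, state)] is agent A's
# utility; agent B's utility is exactly -U, so the two utilities sum to zero
# for every (action, state) pair.
STATES = ("s1", "s2")
ACTIONS = ("a1", "a2")
U = {("a1", "s1"): 1.0, ("a1", "s2"): -1.0,
     ("a2", "s1"): -1.0, ("a2", "s2"): 1.0}


def expected_utility(action: str, belief: dict[str, float], sign: float) -> float:
    """Subjective expected utility; sign=+1 for the U-agent, -1 for the -U agent."""
    return sum(belief[s] * sign * U[(action, s)] for s in STATES)


def best_action(belief: dict[str, float], sign: float) -> str:
    """The action maximising this agent's expected utility under its own belief."""
    return max(ACTIONS, key=lambda a: expected_utility(a, belief, sign))


belief_A = {"s1": 0.9, "s2": 0.1}
belief_B = {"s1": 0.2, "s2": 0.8}  # B was isolated and fed biased information

# Exactly opposed utilities, yet both agents choose the same action.
print(best_action(belief_A, +1), best_action(belief_B, -1))  # a1 a1
```

Both agents prefer `a1`. That could not happen if they were zero-sum towards each other: with shared beliefs, B's expected utility for each action would be exactly minus A's. The disagreement comes entirely from their beliefs, not from their utilities.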

But those caveats aside, it seems that proper symbol grounding implies that you can construct agents that are truly zero-sum towards each other.

Zero-sum implies symbols grounded?

Is this an equivalence? If two agents really do have zero-sum utility or reward functions towards each other, does it mean that those functions are well grounded[2]?

It seems that it should be the case. Zero-sum between $R_1$ and $R_2$ means that, for all possible worlds $w$, $R_1(w) + R_2(w) = 0$. There are no actions that we, or any agent, could take that would break that fundamental equality. So it seems that $R_1$ (and hence $R_2$) must be defined by features of the world: grounded symbols.

Now, these grounded symbols might not be exactly what we thought they were. It's possible we thought $R_1$ was defined on human happiness, but it actually only means the current in a wire. Still, $R_2$ must then be defined in terms of the absence of current in the wire. And whatever we do with the wire (cut it, replace it, modify it in cunning ways), $R_1$ and $R_2$ must take opposite values on it.
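
The zero-sum condition $R_1(w) + R_2(w) = 0$ can be checked mechanically on a sample of worlds, although no finite sample establishes the "for all possible worlds" claim. In this Python sketch, the `World` type, its fields and the interventions are hypothetical. A grounded pair reads the wire current with opposite signs. An ungrounded pair reads the bits where each AI's reward happens to be stored.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class World:
    """Hypothetical toy world: one physical feature plus each AI's stored reward bits."""
    wire_current: float
    stored_reward_1: float
    stored_reward_2: float


# Grounded pair: both rewards are functions of the same physical feature.
def r1_grounded(w: World) -> float:
    return w.wire_current


def r2_grounded(w: World) -> float:
    return -w.wire_current  # defined via the absence (negation) of the current


# Ungrounded pair: each reward just reads its own storage location.
def r1_stored(w: World) -> float:
    return w.stored_reward_1


def r2_stored(w: World) -> float:
    return w.stored_reward_2


def is_zero_sum(r1, r2, worlds) -> bool:
    """Check R1(w) + R2(w) == 0 on every world in a (necessarily finite) sample."""
    return all(r1(w) + r2(w) == 0 for w in worlds)


base = World(wire_current=2.0, stored_reward_1=1.0, stored_reward_2=-1.0)
interventions = [
    base,
    replace(base, wire_current=0.0),     # cut the wire
    replace(base, wire_current=-5.0),    # reverse it
    replace(base, stored_reward_2=1.0),  # overwrite AI 2's stored reward
]

print(is_zero_sum(r1_grounded, r2_grounded, interventions))  # True
print(is_zero_sum(r1_stored, r2_stored, interventions))      # False
```

Every intervention on the wire leaves the grounded pair opposite. Overwriting a single storage location is enough to break the stored-bits pair.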

Thus it seems that either there is some grounded concept on which $R_1$ and $R_2$ are opposite, or $R_1$ and $R_2$ contain exhaustive lists of all special cases. If we further assume that $R_1$ and $R_2$ are not absurdly complicated (in a "more complicated than the universe" way), we can rule out the exhaustive list.
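
The exhaustive-list alternative can be sketched the same way. All names here are hypothetical. `r2_listed` tries to be the opposite of a wire-current reward by enumerating the situations its designer anticipated, and it falls back to 0.0 for anything else.

```python
# Hypothetical: r1 is grounded on the current in a wire. r2_listed tries to be
# its opposite via a table of anticipated situations, instead of being -r1.
def r1(world: dict) -> float:
    return world["wire_current"]


def r2_grounded(world: dict) -> float:
    return -world["wire_current"]


SPECIAL_CASES = {"normal": -2.0, "wire_cut": 0.0, "wire_reversed": 5.0}


def r2_listed(world: dict) -> float:
    # Unanticipated situations fall back to 0.0: the list is not exhaustive.
    return SPECIAL_CASES.get(world["situation"], 0.0)


worlds = [
    {"situation": "normal", "wire_current": 2.0},
    {"situation": "wire_cut", "wire_current": 0.0},
    {"situation": "wire_reversed", "wire_current": -5.0},
    {"situation": "cunning_modification", "wire_current": 7.0},  # not listed
]

failures = [w["situation"] for w in worlds if r1(w) + r2_listed(w) != 0]
print(failures)  # ['cunning_modification']
assert all(r1(w) + r2_grounded(w) == 0 for w in worlds)
```

The grounded `r2_grounded` stays opposite to `r1` on every world. The listed version fails on the first situation nobody wrote down. Making it truly zero-sum would need an entry for every possible situation, which is the "more complicated than the universe" option.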

So, while I can't say with full confidence that a true zero-sum game must mean that the utilities are grounded, I would take such a thing as a strong indication that they are.

  1. If you thought that number was large, nothing will prepare you for a value of the fast-growing hierarchy indexed by the large Veblen ordinal. There is no real way to describe how inconceivably huge this number is. ↩︎

  2. Assuming the functions are defined in the world to some extent, not over platonic mathematical facts. ↩︎



Comments

Designing a true zero-sum game situation is quite straightforward. Or at least a situation which both AIs think is zero-sum, and in which they don't try to cooperate. Consider both AIs to be hypercomputers with a Cartesian boundary. The rest of the world is some initially unknown Turing machine. Both agents are the obvious two-player generalization of AIXI. The reward signal is shared after the magic incorruptible Cartesian boundary.

This is something that could be programmed on an indestructible hypercomputer.

I also suspect that some of the easiest shared zero-sum goals to make might be really weird. For example: maximise the number of ones to the right of the tape head in a Turing machine representation of the universe.

You could even have two delusional AIs that were both certain that phlogiston existed, one a phlogiston maximizer, the other a phlogiston minimizer. If they come up with the same crazy theories about where the phlogiston is hiding, they will act zero-sum.

I don't think this is straightforward in practice, and putting a Cartesian boundary in place avoids exactly the key problem. Any feature of the world used as the item to minimize/maximize has to be measured, and incorruptible measurement systems seem like a non-trivial problem. For instance, how do I get my GAI to maximize blue in an area instead of maximizing the blue input into its sensor when pointed at that area? We essentially need to solve value loading and understand a bunch of embedded agent issues to really talk about this.