Computers exchanging messages on a network must decide how fast or slow to transmit messages. If everyone transmits too slowly then the network is underutilized, which is bad for all. If everyone transmits too quickly then most messages on the network are actually flow control messages of the form "your message could not be delivered, please try again later", which is also bad for everyone.

Unfortunately, this leads to a classic prisoner's dilemma. It is in each node's own self-interest to transmit as quickly as possible, since each node has no information about when exactly an intermediate node will accept/drop a message, so transmitting a message earlier never decreases the probability that it will be successful. Of course, this means that the Nash equilibrium is a near-complete network breakdown in which most messages are flow control messages, which is bad for everyone.

Interestingly, some folks at MIT noticed this, and also noticed that the idea of superrationality (of Douglas Hofstadter origins, and the grandfather of TDT and friends) is one way to get past prisoner's dilemmas --- at least if everyone is running the same algorithm, which, on many networks, people mostly are.

The idea put forward in the paper is to design flow control algorithms with this in mind. There is an automated design process in which flow control algorithms with many different parameter settings are sampled and evaluated. The output is a program that gets installed on each node in the network.

Now, to be fair, this isn't exactly TDT: the end-product algorithms do not explicitly consider the behavior of other nodes in the network (although they were designed taking this into account), and the automated design process itself is really just maximizing an ordinary utility function since it does not expect there to be any other automated designers out there. But nevertheless, the link to superrationality, and the fact that the authors themselves picked up on it, was, I thought, quite interesting.


New Comment
11 comments, sorted by Click to highlight new comments since: Today at 1:02 AM

Another unsatisfying Nash equilibrium in traffic control I'd like to see analyzed from a modern decision theory perspective is Braess's Paradox.

Initially it looks like so long as a sufficient number of agents are using a decision theory that can provably cooperate on one shot PD (as in the Modal Agent paper discussed recently) they can coordinate to decrease the individual cost below the Nash Equilibrium. The number of agents required depends on the network graph. Agents that can provably pick a Nash-suboptimal path if enough other agents provably pick complementary paths such that the individual cost is reduced will have lower costs than the original Nash Equilibrium.

In the A-B network example on Wikipedia the number of agents required to cooperate would be >1000 to beat the 80 minute equilibrium cost by lowering the T/100 path costs below 35 minutes each by half the agents taking the start-A-end path and the other half taking the start-B-end path, leaving <3500 start-A-B-end paths.

prisoner's dilemma

Pretty confident it's actually an example of the Tragedy of the Commons, as there are multiple players, the thing of interest is continuous, and no one player desires to send as many packets as possible.

Is there any way in which The Prisoner's Dilemma and The Tragedy of the Commons do not describe identical games? (I mean, any way other than that the former is traditionally conceptualized as a two-player game while the latter can have any number of players.)

Yes; in the Tragedy of the Commons the actions of the "other player," whether or not more than n people use the commons, partially depends on your choice. See here.

Also, as I just said, the thing of interest (how many packets you send/how many goats you graze) is continuous (ish) rather than a discrete choice. This has an important consequence: while an analogy to the PD would lead one to believe that the Nash Equilibrium would be everybody sending as many packets as possible, that's not actually the right answer in the Tragedy of the Commons: no one player desires to send as many packets as possible.

Relatedly, most TCP scheduler are variants of the Reno algorithm, which basically means that they increase transmission rate until (the network is so congested that) packets begin dropping. In contrast, Vegas-type schedulers increase transmission rate until packets begin to take longer to arrive, which starts happening in congested networks shortly before packets are actually lost. A network of Vegas machines has considerably lower latency, and only minimally worse throughput, than a network of Reno machines.

Unfortunately, since Reno backs off later than Vegas, a mixed Vegas/Reno network ends with the Reno machines consuming the vast majority of bandwidth.

Interestingly, while almost all TCP schedulers are Reno variants (i.e. efficient in the presence of likely neighbours), there is basically no-one who entirely foregoes a scheduler and just sends as fast as possible, which was the original pre-Reno behaviour (and which is pretty optimal for the individual, at least until the entire internet collapses due to ridiculous levels of congestion. This has happened.)

Unfortunately, since Reno backs off later than Vegas, a mixed Vegas/Reno network ends with the Reno machines consuming the vast majority of bandwidth.

Unless they're idle most of the time, that is. Anybody who's run a modern BitTorrent client alongside a web browser has been in this situation: the congestion control protocol used by most BitTorrent clients watches for packet delays and backs off before TCP, so it's much lower-priority than just about everything else. Even so, it can end up using the vast majority of bandwidth, because nobody else was using it.

The PD is iterated, so the agents ought to be able to achieve cooperation without needing superrationality.

I don't think that would work when there are many players per game.

Do the routers even "play"? Do the routers, just executing their programming, count as "agents" with "goals"? Assuming that the users don't normally change their router's firmware, this seems "merely" like an optimization problem, not like a problem of game theory.

You're right - most users don't rewrite their TCP stack. But suppose you're designing the next version of TCP and you think "hey, instead of using fixed rules, let's write a TCP stack that optimizes for throughput". You will face a conceptual issue as you realize that the global outcome is now total network breakdown. So what do you optimize for instead? Superrationality says: make decisions as though deciding the output for all nodes at the current information set. This is conceptually helpful because it tells you what you should be optimizing for.

Now if you start out from the beginning (as in the paper) by thinking of optimizing over algorithms, with the assumption that the output will be run on every node then you're already doing superrationality. That's all superrationality is!

New to LessWrong?