Four levels of understanding decision theory

This post received a pretty mixed-to-negative reception.

Looking back on my own writing, I think there are at least a couple of issues:

It's pretty high context; much of the post relies on the reader already understanding or at least being familiar with https://arxiv.org/abs/1401.5577 and logical decision theories. To readers who are familiar with those things, the ideas in this post might not be very interesting or novel.
It's somewhat unmotivated: it's not clear what position or misconception a real person might actually have that this post clears up or argues against.

This post is an attempt to explain why a bunch of students who have learned all about game theory and decision theory won't necessarily end up with a bunch of (C,C) outcomes in a classroom simulation. This is true even if the students really want to cooperate, and they have correctly understood the class material on a very deep level. There's still a missing "implementation" piece involving legibility and counterparty modeling that requires separate cognitive skills (or the ability to set up a bot arena, make binding commitments / arrangements outside the simulation, etc.), which aren't necessarily closely related to understanding the decision theory itself.

But maybe this point is obvious even to the (fictional) students themselves. Anyway, regardless of whether you read or liked this post, I recommend reading planecrash, especially if you like fiction with lots of decision theory mixed in.

Pretty cool!
Just to add, although I think you already know: we don't need a reflexive understanding of our own DT to put it into practice, because we run on messy brains rather than provable algorithms.
And I always feel it's kinda unfair to dismiss "valuing friendliness or a sense of honor" as orthogonal motivations, because they might be evolutionarily selected heuristics that (sort of) implement such acausal DT concerns!

Vladimir_Nesov (3y)

without actually being capable of performing the counterparty modeling, legibility, and other cognitive work necessary to implement that decision theory to any degree of faithfulness

This is not needed, you can just submit PrudentBot as your champion for a given interaction, committing to respect the adjudication of an arena that has the champions submitted by yourself and your counterparty. The only legibility that's required is the precommitment to respect adjudication of the arena, which in some settings can be taken out of players' hands by construction.
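A minimal sketch of such an arena, for illustration only. It does not implement the provability-logic construction from the paper linked above; as a labeled assumption, it swaps proof search for bounded simulation with an optimistic base case. The names `arena`, `depth`, and the bot signature are hypothetical, and "source code exchange" is approximated by handing each bot its opponent as a callable.

```python
from typing import Callable, Tuple

C, D = "C", "D"

# A bot receives its opponent (standing in for the opponent's source code)
# and a simulation budget, and returns a move.
Bot = Callable[["Bot", int], str]


def cooperate_bot(opponent: Bot, depth: int) -> str:
    return C


def defect_bot(opponent: Bot, depth: int) -> str:
    return D


def fair_bot(opponent: Bot, depth: int) -> str:
    # Cooperate iff the opponent, simulated against me, cooperates.
    if depth == 0:
        return C  # ASSUMPTION: optimistic base case replaces the proof step
    return C if opponent(fair_bot, depth - 1) == C else D


def prudent_bot(opponent: Bot, depth: int) -> str:
    # Cooperate iff the opponent cooperates with me AND defects against
    # defect_bot (so unconditional cooperators are not rewarded).
    if depth == 0:
        return C  # ASSUMPTION: same optimistic base case as fair_bot
    cooperates_with_me = opponent(prudent_bot, depth - 1) == C
    # defect_bot never simulates anyone, so this check can use the full
    # budget without causing an infinite regress among the bots defined here.
    defects_against_defect_bot = opponent(defect_bot, depth) == D
    return C if cooperates_with_me and defects_against_defect_bot else D


def arena(bot_1: Bot, bot_2: Bot, depth: int = 3) -> Tuple[str, str]:
    # The arena is the only place moves are produced; players who have
    # precommitted to its adjudication simply accept the returned pair.
    return bot_1(bot_2, depth), bot_2(bot_1, depth)


if __name__ == "__main__":
    for rival in (prudent_bot, fair_bot, cooperate_bot, defect_bot):
        print(rival.__name__, arena(prudent_bot, rival))
```

In this sketch the counterparty modelling lives in the bots and the legibility lives in `arena`; the human players only choose which bot to submit. A bot that recursed without spending its budget would hang this toy arena, which a real setup would have to guard against.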

Max H (3y)

PrudentBot is modelling its counterparty, and the setup in which it runs is what makes the modelling and legibility possible. To make PrudentBot work, the comprehension of decision theory, counterparty modelling, and legibility are all required. It's just that these elements are spread out, in various ways, between (a) the minds of the researchers who created the bots (b) the source code of the bots themselves (c) the setup / testbed that makes it possible for the bots to faithfully exchange source code with each other.

Also, arenas where you can submit a simple program are kind of toy examples - if you're facing a real, high-stakes prisoner's dilemma and you can set things up such that you can just have some programs make the decisions for you, you're probably already capable of coordinating and cooperating with your counterparty sufficiently well that you could just avoid the prisoner's dilemma entirely, if it were happening in real life and not a simulated game.

Vladimir_Nesov (3y)

PrudentBot's counterparty is another program intended to be legible, not a human. The point is that in practice it's not necessary to model any humans, humans can delegate legibility to programs they submit as their representatives. It's a popular meme that humans are incapable of performing Löbian cooperation, because they can't model each other's messy minds, that only AIs could make their own thinking legible to each other, granting them unique powers of coordination. This is not the case.

if it were happening in real life and not a simulated game

Programs and protocols become real life when they are given authority to enact their computations. To the extent that Pareto-inefficient outcomes actually happen in real life, it's worth replacing negotiations with things like this, and falling back to the BATNA when the arena says (D,D).

Max H (3y)

The point is that in practice it's not necessary to model any humans,

Right, but my point is that it's still necessary for something to model something. The bot arena setup in the paper has been carefully arranged so that the modelling is in the bots, the legibility is in the setup, and the decision theory comprehension is in the authors' brains.

I claim that all three of these components are necessary for robust cooperation, along with some clever system design work to make each component separable and realizable (e.g. it would be much harder to have the modelling happen in the researcher brains and the decision theory comprehension happen in the bots).

Two humans, locked in a room together, facing a true PD, without access to computers or an arena or an adjudicator, cannot necessarily robustly cooperate with each other for decision theoretic reasons, even if they both understand decision theory.

Vladimir_Nesov (3y)

When you don't model your human counterparty's mind anyway, it doesn't matter if they comprehend decision theory. The whole point of delegating to bots is that only understanding of bots by bots remains necessary after that. If your human counterparty doesn't understand decision theory, they might submit a foolish bot, while your understanding of decision theory earns you a pile of utility.

So while the motivation for designing and setting up an arena in a particular way might be in decision theory, the use of the arena doesn't require this understanding from its human users, and yet it can shape incentives in a way that defeats bad equilibria of classical game theory.

Dagon (3y)

It's nice to separate the levels between modeling a decision theory, analyzing multiple theories, and actually implementing decisions. I don't think I'd number them that way, or imply they're single-dimensional.

It seems level 1 is a prereq for level 4, and 2 and 3 are somewhat inter-dependent, but it's not clear that an agent needs to understand the formal theory in order to actually avoid D,D. The programmer of the agent does, but even then "understand" may be too much of a stretch, if evolutionary or lucky algorithms manage it.

Max H (3y)

but it's not clear that an agent needs to understand the formal theory in order to actually avoid D,D.

They definitely don't, in many cases - humans in PDs cooperate all the time, without actually understanding decision theory.

The hierarchy is meant to express that robustly avoiding (D,D) for decision-theoretic reasons requires that either the agent itself or its programmers understand and implement the theory.

Each level is intended to be a prerequisite for the levels that follow it, modulo the point that, in the case of programmed bots in a toy environment, the comprehension can be either in the bot itself or in the programmers who built the bot.

I don't see how level 2 depends on anything in level 3 - being at level 2 just means you understand the concept of a Nash equilibrium and why it is an attractor state. You have a desire to avoid it (in fact, you have that desire even at level 1), but you don't know how to do so, robustly and formally.

metacoolus (3y)

I appreciate the detailed taxonomy in this post, and it's an insightful way to analyze the gaps between understanding and implementing decision theory. However, I believe it would be even more compelling to explore how AI-based cognitive augmentation could help humans bridge these gaps and better navigate decision processes. Additionally, it would be interesting to examine the potential of GPT-style models to gain insight into AGI and its alignment with human values. Overall, great read!

Max H (3y)

Thanks! Glad at least one person read it; this post set a new personal record for low engagement, haha.

I think exploring ways that AIs and / or humans (through augmentation, neuroscience, etc.) could implement decision theories more faithfully is an interesting idea. I chose not to focus directly on AI in this post, since I think LW, and my own writing specifically, has been kinda saturated with AI content lately. And I wanted to keep this shorter and lighter in the (apparently doomed) hope that more people would read it.

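Payoffs are listed as (player 1, player 2); columns are player 1's move and rows are player 2's move.

          1: C      1: D
2: C      (3, 3)    (5, 0)
2: D      (0, 5)    (2, 2)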
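The payoff table above can be checked mechanically. The sketch below is illustrative and not part of the original post: it encodes the four cells, reading each pair as (player 1's payoff, player 2's payoff), and confirms that (D,D) is the only Nash equilibrium even though (C,C) is better for both players. The helper names `is_nash` and `pareto_dominated` are hypothetical.

```python
from itertools import product

MOVES = ("C", "D")

# (player 1 move, player 2 move) -> (player 1 payoff, player 2 payoff),
# transcribed from the table above.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("D", "D"): (2, 2),
}


def is_nash(m1: str, m2: str) -> bool:
    # A profile is a Nash equilibrium if neither player can gain by
    # unilaterally switching to a different move.
    u1, u2 = PAYOFFS[(m1, m2)]
    p1_cannot_gain = all(PAYOFFS[(alt, m2)][0] <= u1 for alt in MOVES)
    p2_cannot_gain = all(PAYOFFS[(m1, alt)][1] <= u2 for alt in MOVES)
    return p1_cannot_gain and p2_cannot_gain


def pareto_dominated(profile: tuple) -> bool:
    # True if some other outcome is at least as good for both players
    # and not identical in payoffs.
    u = PAYOFFS[profile]
    return any(
        v[0] >= u[0] and v[1] >= u[1] and v != u for v in PAYOFFS.values()
    )


nash_equilibria = [p for p in product(MOVES, MOVES) if is_nash(*p)]

if __name__ == "__main__":
    print("Nash equilibria:", nash_equilibria)
    print("(D,D) Pareto-dominated:", pareto_dominated(("D", "D")))
```

With these numbers, defecting pays more whatever the other player does (5 > 3 and 2 > 0), which is why the equilibrium sits at (D,D) despite (3, 3) being available.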

Level 1: Understanding that good things are good, and bad things are bad

Level 2: An understanding of, and desire to avoid, the Nash equilibrium

Level 3: Knowing and understanding formal theories for when and how to avoid Nash equilibria

Level 4: Actually avoiding (D,D) for decision theory reasons