Decision theory and zero-sum game theory, NP and PSPACE

[-]Wei Dai8y240

Joan Feigenbaum's Games, Complexity Classes, and Approximation Algorithms may be a good citation for zero-sum game theory = PSPACE.

I'm not sure Decision Theory = NP makes sense. If a problem is in NP then you can produce a certificate (verifiable in polynomial time) showing that a solution is correct. If you're iterating over possible actions to find which one has the highest expected utility, I don't see what certificate you can output (in general) that would let someone check that the chosen action is the highest EU one, without going through the whole computation all over again. Perhaps I'm expecting the analogy to be tighter than intended?

The recent AI safety via debate paper gave a Supervised Learning = P, Reinforcement Learning = NP, DEBATE = PSPACE analogy (section 2.2), which does make sense to me.

[-]jessicata8y40

Thanks for the links, those are relevant.

If you have an NP oracle, you can first ask it "is there an action with EU >= k" for some medium k value, then binary search until you find the highest k value (within epsilon) for which there is an action with EU >= k, assuming that EU can be computed in polynomial time. I agree that you can't prove that a particular sequence of actions has the highest possible EU.

[-]daozaich8y40

"what move should open with in reversi" would be considered as an admissible decision-theory problem by many people. Or in other words: Your argument that EU maximization is in NP only holds for utility functions that permit computation in P of expected utility given your actions. That's not quite true in the real world.

[-]Marcello8y190

I agree with some of the other commenters that the term "decision theory" ought to be reserved for the overarching problem of which decision algorithm to use, and that the distinction you're referring to ought to be called something like "adversarial" vs "non-adversarial" or "rival" vs "non-rival". Nonetheless, I think this is an interesting handle for thinking about human psychology.

If we view these as two separate modes in humans, and presume that there's some kind of subsystem that decides which mode to use, then false positives of that subsystem look like potential explanations for things like The Imp of the Perverse, or paranoia.

[-]cousin_it8y110

Another analogy is that Bayesianism = decision theory and frequentism = game theory. Bayesians try to find the best response to a particular mixed strategy of nature (prior), while frequentists try to achieve "frequentist guarantees" which is like minimaxing against nature. Wald's complete class theorems are pretty much about that.

[-]abramdemski8y90

Is there yet another cognitive mode for general-sum game theory? I suspect so. I think I've seen several times that the purely cooperative case (which might be like the "decision theory" case here) and the purely competitive case ("zero sum" here) have tractable analyses, but questions of general games are open. I ran into this in multi-agent RL (though I'm not sure if it is still the case -- I was looking at older papers).

This makes sense: it is easier to assume someone is either with you or against you than to navigate complicated preferences.

In terms of computational complexity, finding Nash equilibria is PPAD complete, but I can't say anything about what that means for the story here.

[-]Zhu Xiaohu4y10

Mention a recent interesting work here: On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games gave a related analysis on the comuting of Markov PE for RL agents.

[-]Ben Pace8y70

This post helped me understand NP noticeably more than the complexity course I was just examined on - thanks. I didn't know there was such a thing as descriptive complexity theory (i.e. lists of correspondences between various subsets/closures of FOL/SOL for a bunch of difference classes). I'll probably read about it a bunch this weekend.

[-]abramdemski8y70

It seems unfortunate that "zero sum" is the term of art. Utility functions are only meaningful up to (positive) scalar multiplication and a constant. So, zero-sum games can be constant-sum if we add a constant to someone's utility function, or ever variable sum if we multiply one of the player's utility functions by a constant. "adversarial games" or "(purely) competitive games" or something like that would be more descriptive.

[-]habryka8y50

Promoted to curated. Here are my thoughts:

I think this post hits a narrow target of taking a fairly high-level and complicated intuition, and successfully formalizing parts of it by putting it into the context of existing work in an established technical field. I think this kind of work is one of the most promising avenues for making progress in rationality, and has been one of the key drivers of progress historically (see as an example of this Eliezer's Technical Explanation of a Technical Explanation).

Two things that seem like potentially great additions to this post are maybe one or two exercises that could test your understanding of the analogy and idea (one of the major benefits of technical posts is that you can often add exercises that have a clear-cut answer), and some more exploration of how this frame on decision theory and game theory interacts with existing writing on LW (i.e. more exploration of the form "this is what this implies about this thought experiment that was discussed previously").

[-]paulfchristiano7y40

Note that imperfect information games are complete for EXP rather than PSPACE. (I'm surprised that I'd never run into this fact explicitly before. Scott Aaronson pointed it out to me today.)

And of course cooperative games with two non-communicating players are complete for NEXP.

ETA: also note that playing a "game against nature" is also PSPACE complete, if nature is stochastic (since IP=PSPACE).

[-]Dagon8y40

Edit: I misunderstood. Left for historical reasons, but not important. Please ignore.

Wait - may I ask for some remedial instruction on why zero-sum game-theory isn't a strict subset of decision theory, just with pessimistic assumptions about some of the variables? I can turn any decision problem into a game-theory problem by adding an opponent with opposite values from me, right? I strongly suspect that _both_ are NP generally, and I can give many specific examples of both that can be solved in polynomial time.

I'm confused by your "cognitive modes" idea as well - it may be true, but I don't see how it follows, if decision theory is more difficult (NP) than game theory (P), that

In decision theory, you plan with no regard to any opponents interfering with your plans, allowing you to plan on arbitrarily long time scales. In zero-sum game theory, you plan on the assumption that your opponent will interfere with your plans (your ∃s are interleaved with your opponent’s ∀s), so you can only plan as far as your opponent lacks the ability to interfere with these plans

This seems to lean in the opposite direction - in decision theory, you can easily answer as far out as you like, in zero-sum game-theory you're far more limited. It is also more about certainty of game state than about motivation or existence of opponent. A plan with no opponents but also a bunch of randomness (or unknown starting state; same thing for these purposes) is MORE impossible to follow than a plan with known state and an active opponent.

I do somewhat agree with your final paragraph, but I don't agree with your premise nor reasoning to get there.

[-]ESRogs8y40

if decision theory is more difficult (NP) than game theory (P)

Where did this come from? Did you confuse PSPACE with P?

[-]Dacyn8y10

If you add an opponent with opposite values from you into a decision theory problem, you are adding a component that is approximately as hard to compute the behavior of as it is to solve the problem as a whole (since it depends on a computation by the opponent which is of roughly the same form).

P and PSPACE are not the same.

[-]Dagon8y20

My point is exactly that P and PSPACE are not the same. Addition of an opponent cannot turn NP into P, so "decision problem" vs "zero-sum game"cannot be the distinction which defines the problem space.

[-]jessicata8y10

Why do you think zero-sum game theory is P? The quantified Boolean formula problem is PSPACE-complete.

[-]Dagon8y30

Oh my - I completely misread the post, thinking it was comparing P vs PSPACE, when in fact it's comparing NP to PSPACE. My point that competitive decisions are just a special-case of decision theory stands, but it's far less important.

Nevermind!

[-]Dacyn8y30

Typo: $x_{n}$ , $y_{n}$ should be $x_{k}$ , $y_{k}$

[-]jessicata8y10

Fixed, thanks

[-]Alex Flint8y20

Minor nit: I always thought the term "decision theory" referred to the meta-level task of formulating an algorithm which, given fully specified beliefs and values, tells you how to compare possible actions. On the contrary, when I see someone making a concrete decision using an EU calculation or some such, I don't think of them as "doing decision theory". So perhaps "decision making" or "positive sum game theory" rather than "decision theory"? It probably doesn't matter much.

[-]zulupineapple8y20

If you can perfectly simulate your opponent, then a zero sum game is also in NP. Of course, in practice you can't, but you do have some reasonable beliefs about what the opponent will do, which, I think, moves most real life problems into the same gray-zone as high-reliability engineering.

Also, I suspect that real-life uncertainty about "f" in your NP problems likewise pushes them up in complexity, into that same gray-zone.

There may be two clear clusters in theory, but it's not certain that they remain distinct in practice.

LESSWRONG
LW

LESSWRONG
LW

56

Decision theory and zero-sum game theory, NP and PSPACE

56

56

Decision theory and NP

Zero-sum game theory and PSPACE

Different cognitive modes

Conclusion