Cleo Nardo

DMs open.


Game Theory without Argmax



 isn't equivalent to  being Nash.

Suppose Alice and Bob are playing the prisoner's dilemma. Then the best-response function returns a nonempty set for every option-profile. But only one option-profile is Nash.
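To make the counterexample concrete, here is a minimal Python sketch. The payoff numbers and the names `PAYOFF` and `best_responses` are hypothetical, chosen only so that defecting strictly dominates; the sketch reads "best response of a profile" as the set of profiles in which each player best-responds to the other player's option in the given profile.

```python
from itertools import product

OPTIONS = ("C", "D")  # cooperate, defect

# Hypothetical payoffs: PAYOFF[(alice, bob)] = (alice_payoff, bob_payoff)
PAYOFF = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

def best_responses(profile):
    """Profiles in which each player best-responds to the other's option in `profile`."""
    alice, bob = profile
    # Alice's best options, holding Bob's option fixed.
    best_a = max(PAYOFF[(a, bob)][0] for a in OPTIONS)
    alice_best = [a for a in OPTIONS if PAYOFF[(a, bob)][0] == best_a]
    # Bob's best options, holding Alice's option fixed.
    best_b = max(PAYOFF[(alice, b)][1] for b in OPTIONS)
    bob_best = [b for b in OPTIONS if PAYOFF[(alice, b)][1] == best_b]
    return set(product(alice_best, bob_best))

profiles = list(product(OPTIONS, OPTIONS))

# Every profile has a nonempty best-response set...
all_nonempty = all(best_responses(p) for p in profiles)

# ...but a profile is Nash only if it is a best response to itself.
nash = [p for p in profiles if p in best_responses(p)]

print(all_nonempty)  # True
print(nash)          # [('D', 'D')]
```

Under these assumed payoffs, every profile has the same best-response set, namely the single profile (D, D), so nonemptiness holds everywhere while the fixed-point condition holds only at (D, D).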

 is equivalent to  being Nash.

Yes, , i.e. the cartesian product of a family of sets. Sorry if this wasn't clear, it's standard maths notation. I don't know what the other commenter is saying.

The impression I got was that SLT is trying to show why (transformers + SGD) behaves anything like an empirical risk minimiser in the first place. Might be wrong though.

My point is precisely that it is not likely to be learned, given the setup I provided, even though it should be learned.


How am I supposed to read this?

What most of us need from a theory of deep learning is a predictive, explanatory account of how neural networks actually behave. If neural networks learn functions which are RLCT-simple rather than functions which are Kolmogorov-simple, then that means SLT is the better theory of deep learning.

I don't know how to read "x^4 has lower RLCT than x^2 despite x^2 being k-simpler" as a critique of SLT unless there is an implicit assumption that neural networks do in fact find x^2 rather than x^4.
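For reference, a short worked computation of the two RLCTs, assuming the claim refers to the one-parameter potentials $K(w) = w^2$ and $K(w) = w^4$ with a uniform prior on $[0, 1]$ (these choices are my assumption, not something stated in the thread):

```latex
% Zeta function for K(w) = w^{2k} with a uniform prior on [0, 1]
\zeta(z) = \int_0^1 K(w)^z \, dw = \int_0^1 w^{2kz} \, dw = \frac{1}{2kz + 1}

% The largest pole is at z = -1/(2k), so the RLCT is
\lambda = \frac{1}{2k}
\quad\Longrightarrow\quad
\lambda_{w^2} = \tfrac{1}{2}, \qquad \lambda_{w^4} = \tfrac{1}{4}
```

So under these assumptions $w^4$ does have the lower RLCT, which is the premise both sides of the disagreement take for granted.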

Wait I mean a quantifier in .

If we characterise an agent with a quantifier , then we're saying which payoffs the agent might achieve given each task. Namely,  if and only if it's possible that the agent achieves payoff  when faced with a task .

But this definition doesn't play well with Nash equilibria.
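As a rough illustration of the quantifier characterisation, here is a minimal Python sketch. It assumes a task is a payoff function on a finite option space and that a quantifier maps each task to the set of payoffs the agent might achieve; the names `max_quantifier` and `satisficing_quantifier` are hypothetical.

```python
def max_quantifier(options):
    """Quantifier of an agent that always attains the best available payoff."""
    def quantifier(task):
        return {max(task(x) for x in options)}
    return quantifier

def satisficing_quantifier(options, threshold):
    """Quantifier of an agent that may attain any payoff at or above `threshold`."""
    def quantifier(task):
        return {task(x) for x in options if task(x) >= threshold}
    return quantifier

options = range(5)

def task(x):
    # Hypothetical task: payoff peaks at x = 3.
    return -(x - 3) ** 2

maximiser = max_quantifier(options)
satisficer = satisficing_quantifier(options, threshold=-1)

print(maximiser(task))   # {0}
print(satisficer(task))  # the payoffs -1 and 0
```

Note that a quantifier only reports payoffs, not which options produced them, which is one way to see why it carries less information than a map from tasks to sets of options.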

Thanks v much! Can't believe this sneaked through.

The observation is trivial mathematically, but it motivates the characterisation of an optimiser as something with the type-signature.

You might instead be motivated to characterise optimisers by...

  • A utility function 
  • A quantifier 
  • A preorder  over the outcomes
  • Etc.

However, were you to characterise optimisers in any of the ways above, then the Nash equilibrium between optimisers would not itself be an optimiser, and therefore we would lose compositionality. Compositionality is conceptually helpful because it means that your  definitions/theorems reduce to the  case.

Yep, I think the Law of Equal and Opposite Advice applies here.

One piece of advice which is pretty robust: you should be able to explain your project to any other MATS mentee/mentor in about 3 minutes, along with the background context, motivation, theory of impact, success criteria, etc. If the inferential distance from the average MATS mentee/mentor exceeds 3 minutes, then your project is probably either too vague or too esoteric.

(I say this as someone who should have followed this advice more strictly.)


In a subsequent post, everything will be internalised to an arbitrary category  with enough structure to define everything. The words set and function will be replaced by object and morphism. When we do this,  will be replaced by an arbitrary commutative monad .

In particular, we can internalise everything to the category Top. That is, we assume the option space  and the payoff  are equipped with topologies, and the tasks will be continuous functions , and optimisers will be continuous functions  where  is the function space equipped with pointwise topology, and  is a monad on Top.

In the literature, everything is done with galaxy-brained category theory, but I decided to postpone that in the sequence for pedagogical reasons.
