Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Some Existing Selection Theorems

Comment by Adele Lopez:

Zurek's einselection seems like perhaps another instance of this, or at least related. The basic idea is (very roughly) that the preferred basis in QM is preferred because persistence of information selects for it.

This post illustrates how various existing Selection Theorems and related results fit the general framework - what they tell us about agent type signatures, and how they (usually implicitly) tell us what agent types will be selected. I invite others to leave comments about any other Selection Theorems you know of - for instance, Radical Probabilism and Logical Inductors are results/frameworks which can be viewed as Selection Theorems but which I haven’t included below.

This post assumes you have read the intro post on the Selection Theorem program. The intended audience is people who might work on the Selection Theorem program, so these blurbs are intended to be link-heavy hooks and idea generators rather than self-contained explanations.

## The Gooder Regulator Theorem

The Gooder Regulator Theorem talks about the optimal design of a “regulator” (i.e. agent) in an environment like this:

[Figure: diagram of the regulator's environment, with variables S, X, M, Y, Z]

When viewed as a Selection Theorem, the outer optimization process selects for high values of Z and low-information models M (i.e. models which don’t take up much space). Assuming that Z is a “sufficiently flexible” function of Y, the theorem says that the optimal “model” M is isomorphic to the Bayesian posterior distribution (s↦P[S=s|X]). In other words, the system’s internal structure includes an explicit Bayesian world model.
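To make the object s↦P[S=s|X] concrete, here is a minimal sketch on a toy discrete joint distribution. The distribution, the outcome labels, and the function names are all made up for illustration; this is not the theorem's construction, just a picture of what "M is isomorphic to the posterior" could look like. Observations that induce the same posterior over S can share one model state, so grouping observations by posterior gives a small lossless model.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution P[S=s, X=x]: 2 system states, 3 observations.
joint = {
    ("rain", "x1"): Fraction(1, 10), ("dry", "x1"): Fraction(3, 10),
    ("rain", "x2"): Fraction(1, 20), ("dry", "x2"): Fraction(3, 20),
    ("rain", "x3"): Fraction(3, 10), ("dry", "x3"): Fraction(1, 10),
}


def posterior(joint, x):
    """Return the map s -> P[S=s | X=x] as a sorted tuple of (s, probability)."""
    p_x = sum(p for (_, xi), p in joint.items() if xi == x)
    return tuple(sorted((s, p / p_x) for (s, xi), p in joint.items() if xi == x))


def minimal_model(joint):
    """Group observations by their posterior; each group is one model state."""
    groups = defaultdict(list)
    for x in sorted({xi for (_, xi) in joint}):
        groups[posterior(joint, x)].append(x)
    return dict(groups)


model = minimal_model(joint)
for post, xs in model.items():
    print(xs, "->", dict(post))
```

In this toy example x1 and x2 carry the same information about S, so a model that only needs to support decisions about S can merge them into a single state.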

## Coherence Theorems

This cluster of theorems is the most common foundation for agent models today. It includes things like Dutch Book Theorems, Complete Class Theorem, Savage’s Theorem, Fundamental Theorem of Asset Pricing, variations of these, and probably others as well. These theorems provide many paths to the same agent type signature: Bayesian expected utility maximization.

Besides the obvious type-signature assumption (the “bets”), these theorems also typically have some more subtle assumptions built in - like the need for a money-like resource or the absence of internal agent state or something to do with self-prediction. They apply most easily to financial markets; other applications usually require some careful thought about what to identify as “bets” so that the “bets” work the way they need to in order for the theorems to apply.

Typically, these theorems say that a strategy which does *not* satisfy the type signature is strictly dominated by some other strategy. Assuming a rich enough strategy space and a selection process which can find the dominating strategies, we therefore expect selection to produce a strategy which does satisfy the type signature (at least approximately). *If* the assumptions of the theorem can actually be fit to the selection process, that is.

## Kelly Criterion

The Kelly criterion uses a similar setup to the Coherence Theorems, with the added assumption that agents make sequential, independent bets and can bet up to their total wealth each time (a model originally intended for traders in financial markets or betting markets). Under these conditions, agents which maximize their expected log wealth at each timestep achieve the highest long-run growth rate with probability 1.

The type signature implied by the Kelly criterion is similar to the previous section, except the utility is specifically log wealth.
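As a small worked example, consider the simplest textbook case: a repeated binary bet won with probability p that pays b per unit staked. The sketch below assumes that setup (it is not taken from the linked material, and the function names are hypothetical). It computes the standard closed-form Kelly fraction and checks it against a brute-force search over the expected log growth per bet.

```python
import math


def kelly_fraction(p, b):
    """Kelly stake (fraction of wealth) for win probability p and net odds b."""
    return p - (1.0 - p) / b


def expected_log_growth(f, p, b):
    """Expected change in log wealth per bet when staking a fraction f."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


# Assumed example parameters: a 60% chance of winning at even odds.
p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)

# Brute-force check: search stakes 0.00, 0.01, ..., 0.99 for the best growth.
grid = [i / 100 for i in range(100)]
best_on_grid = max(grid, key=lambda f: expected_log_growth(f, p, b))

print(f"closed form: {f_star:.2f}, grid search: {best_on_grid:.2f}")
print(f"growth at f*:  {expected_log_growth(f_star, p, b):+.4f}")
print(f"growth at 0.5: {expected_log_growth(0.5, p, b):+.4f}")
```

With these parameters the closed form and the grid search agree on staking 20% of wealth. Staking 50% each round has negative expected log growth even though every individual bet has positive expected value, which is the sense in which over-bettors get selected out.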

As a selection theorem, the Kelly criterion is especially interesting because it’s specifically about selection. It does not give any fundamental philosophical reason why one “should” want to maximize expected log wealth; it just says that agents which *do* maximize log wealth will be selected for. So, in environments where the Kelly assumptions apply, those are the agents we should expect to see.

## Subagents

Fun fact: financial markets themselves make exactly the kind of “bets” required by the Coherence Theorems, and are the ur-example of a system not dominated by some other strategy. So, from the Coherence Theorems, we expect financial markets to be equivalent to Bayesian expected utility maximizers, right? Well, it turns out they’re not - a phenomenon economists call “nonexistence of a representative agent”. (Though, interestingly, a market of Kelly criterion agents *is* equivalent to a Bayesian expected utility maximizer.)

When we dive into the details, the main issue is that markets have internal state which can’t be bet on. If we update Coherence to account for that, then it looks like markets/committees of expected utility maximizers are the appropriate type signature for non-dominated strategies (rather than single utility maximizers). In other words, this type signature has subagents.

Again, the type signature is mostly similar to the Coherence Theorems, but tweaked a bit.

(Note: this type signature is only conjectured in the linked post; the post proves only the non-probabilistic version.)
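The parenthetical about markets of Kelly bettors can be illustrated with a toy model. The sketch below assumes a parimutuel pool in which every agent is fully invested and stakes wealth on each outcome in proportion to its own beliefs (the standard log-optimal strategy in the horse-race setting). The agents, beliefs, and function names are hypothetical. Under those assumptions, each agent's wealth share after settlement matches the Bayesian posterior over "which agent's model is right", with wealth shares as the prior.

```python
from fractions import Fraction as F


def settle_parimutuel(wealth, beliefs, winner):
    """Each agent stakes wealth * belief on every outcome; the whole pool is
    then split pro rata among the stakes placed on the winning outcome."""
    stakes = [[w * q_o for q_o in q] for w, q in zip(wealth, beliefs)]
    pool = sum(sum(row) for row in stakes)
    staked_on_winner = sum(row[winner] for row in stakes)
    return [pool * row[winner] / staked_on_winner for row in stakes]


def bayes_posterior(prior, likelihoods, observed):
    """Posterior over hypotheses given the likelihood each assigns to the data."""
    joint = [p * lk[observed] for p, lk in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical market: three agents, two outcomes, wealth shares summing to 1.
wealth = [F(1, 2), F(1, 3), F(1, 6)]
beliefs = [[F(7, 10), F(3, 10)], [F(1, 2), F(1, 2)], [F(1, 5), F(4, 5)]]

new_wealth = settle_parimutuel(wealth, beliefs, winner=1)
posterior = bayes_posterior(wealth, beliefs, observed=1)
print(new_wealth)
print(posterior)
```

The two printed lists coincide: the market reallocates wealth exactly the way a Bayesian mixture reweights its component hypotheses. This only covers the toy setting above; it is not a proof of the general claim.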

## Instrumental Convergence and Power-Seeking

Turner’s theorems on instrumental convergence say that optimal strategies for achieving *most* goals involve similar actions - i.e. “power-seeking” actions - given some plausible assumptions on the structure of the environment. These theorems are not Selection Theorems in themselves, but they offer a possible path to construct a money-like “utility measuring stick” for selected agents in systems with no explicit “money” - which would allow us to more broadly apply variants of the Coherence Theorems.

## Description Length Minimization = Utility Maximization

The equivalence between Description Length Minimization and Utility Maximization tells us two things:

This result is interesting mainly because it offers a way to apply information-theoretic tools directly to goals, but we can also view it as a (very weak) Selection Theorem in its own right.

This result can also be viewed as a way to characterize the selection process (i.e. outer optimizer), rather than the selected agent.
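For readers who want the shape of the equivalence written out, here is a rough sketch under stated assumptions, not the linked post's own derivation. Assume the optimizer's choice θ induces a distribution P[X | θ] over world states, and that description lengths come from an idealized code for some model M, so that x costs −log₂ M(x) bits. The symbols θ, M, u and Z are notation introduced here.

```latex
% Direction 1: minimizing expected description length under model M
% is expected utility maximization with u(x) := \log_2 M(x).
\min_\theta \; \mathbb{E}\big[-\log_2 M(X) \,\big|\, \theta\big]
  \;=\; -\max_\theta \; \mathbb{E}\big[u(X) \,\big|\, \theta\big],
  \qquad u(x) := \log_2 M(x).

% Direction 2: given a utility u with Z := \sum_x 2^{u(x)} < \infty,
% define the model M(x) := 2^{u(x)} / Z. Then
\mathbb{E}\big[-\log_2 M(X) \,\big|\, \theta\big]
  \;=\; -\,\mathbb{E}\big[u(X) \,\big|\, \theta\big] + \log_2 Z,

% and since \log_2 Z does not depend on \theta,
\arg\min_\theta \; \mathbb{E}\big[-\log_2 M(X) \,\big|\, \theta\big]
  \;=\; \arg\max_\theta \; \mathbb{E}\big[u(X) \,\big|\, \theta\big].
```

The second direction needs the normalizer Z to be finite, which is an extra assumption of this sketch.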