
Let's consider a very simple situation: you're an agent playing a game like this: a random point $x$ is generated, which is shown to you. Then you have to choose an action $a$. This leads to a world-state $w = f(x, a, h)$, where $h$ is some "hidden variable", which is randomly generated but which you don't see.

Now, the way to make decisions in a situation like this is to form some "statistical model". In this very simple situation, maybe that just means a set of conditional probabilities $P(w \mid x, a)$, giving the probability of each world-state as a function of your action and the $x$ you see.

On the other hand, it could also be $P(w \in A \mid x, a)$ for various subsets $A$ of world-states - giving the probability of various *outcomes*. If these outcomes don't partition the space of world-states completely, this is a "coarse" world-model - you only make predictions about some partial information about $w$.

We could even have a model like $P(w \in A \mid g(x), a)$, where $g(x)$ is some function of $x$ - a model which makes predictions given only partial information about the "known" variables.

Now what makes a model like this good? There are two families of answers to this:

• A model is good if it matches the world - if the probability really is the true conditional probability given those inputs. Note that this generalizes straightforwardly to the case of $P(w \mid x, a)$ and $P(w \in A \mid x, a)$, but not to the case $P(w \in A \mid g(x), a)$. The probability $P(w \in A \mid g(x), a)$ is not so straightforward to define, since it depends on how your action correlates with the parts of $x$ you've thrown away - that is, on your own policy.
• A model is good if it gives good outcomes. Given some utility function $U(w)$, for example, a model like the above recommends some action for each $x$, namely the one that maximizes the expected utility $\sum_w P(w \mid x, a)\, U(w)$. Here there are also some questions about what to do if you only have a coarse model.
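The second, "instrumental" criterion can be sketched concretely. Here is a minimal toy implementation (all names and the example numbers are invented for illustration, not from the original post):

```python
# Toy decision rule: given a model P(w | x, a) as a probability table and a
# utility function U(w), pick the action maximizing expected utility for the
# observed x.

def best_action(model, utility, x, actions):
    """model[(x, a)] is a dict {w: probability}; utility maps w -> float."""
    def expected_utility(a):
        return sum(p * utility(w) for w, p in model[(x, a)].items())
    return max(actions, key=expected_utility)

# Tiny example: one observation, two actions, two world-states.
model = {
    ("x0", "a0"): {"w0": 0.9, "w1": 0.1},
    ("x0", "a1"): {"w0": 0.2, "w1": 0.8},
}
utility = {"w0": 0.0, "w1": 1.0}.get

print(best_action(model, utility, "x0", ["a0", "a1"]))  # -> a1
```

Note that the same `model` table can be reused with any other `utility` function, which is exactly the portability the next paragraph points at.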

The big advantage of the former viewpoint - "epistemic rationality" - is that it doesn't require you to know the utility function. You can go out into the world and discover an accurate model, and then figure out what to optimize. This also means that you can give your model to other agents with other utility functions facing the same problem, and it will help them out too.

Attainable utility is an approach to measuring the "impact" of actions, which goes something like this:

• To measure the impact of a given action relative to a given utility function $U$, we just see how much it changes the "attainable utility", i.e. the utility we expect to obtain if we try to optimize $U$ from now on.
• To measure the impact of a given action non-relatively, we aggregate the above value for all possible utility functions ("aggregate" here is supposed to be a bit vague, see the link for details).

Has anybody used this approach to evaluating models - saying "a model is good if it leads to effective actions for [all/a wide range of/most] utility functions"?


I was playing with a variant of the Good Regulator Theorem recently which smells pretty similar to this.

The usual theorem says, roughly, "there exists an optimal policy which first constructs a model of the environment from its inputs, then makes a choice as a function of the model rather than a function of the inputs directly". (In the original theorem, this was a deterministic model, and the inputs were assumed to be rich enough to perfectly reconstruct the environment state. This is easy to relax to a probabilistic model, however, with less-than-perfectly-informative inputs.)

The obvious shortcoming of this theorem is that it only says "there exists an optimal policy..."; in general, there may be far simpler optimal policies which do not explicitly build a model before making a choice. So: under what circumstances must an optimal policy build a model?

The approach I was playing with is conceptually similar to some of the ideas from Risks From Learned Optimization. Basically: an information bottleneck can force the use of a model. In the first timestep, the "agent" receives some input data X, and must choose a summary S(X) of that data to pass to itself in the second timestep. In the second timestep, it receives both the summary S(X) and some additional data Z. We can think of Z as "choosing which game the agent is playing", i.e. Z chooses a utility function. The summary S(X) (i.e. the model) must therefore summarize all information relevant to any of the possible games which Z could choose, in order to achieve optimal play. For sufficiently rich Z, that means that the summary must include a full model of the environment.

(Drawing the parallel to mesa-optimizers: the first-timestep decision is analogous to the outer optimizer, the second-timestep decision is analogous to the inner optimizer. The inner optimizer has to work with models and optimization and whatnot mainly because it needs to process a bunch of information Z which is not available to the outer optimizer ahead of time; that's why the first-timestep decision can't just be "make decision Y in the next timestep".)
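The bottleneck argument can be illustrated with a toy game (the whole setup here is invented for illustration): X is a pair of binary environment features, and Z picks which feature the second-timestep game pays for. A summary that drops a feature loses against some Z, so optimal play forces the summary to retain the "full model":

```python
# Toy bottleneck game: X = (f0, f1), two binary features. In timestep 2,
# Z in {0, 1} picks which feature the agent must guess.

import itertools

def value(summary):
    """Average payoff of the best second-timestep policy that sees only
    summary(X) and Z, over uniform X and Z."""
    total, count = 0, 0
    for z in (0, 1):
        # Group Xs by summary value; the best policy guesses the majority
        # value of feature z within each group.
        groups = {}
        for X in itertools.product((0, 1), repeat=2):
            groups.setdefault(summary(X), []).append(X[z])
        for bits in groups.values():
            total += max(bits.count(0), bits.count(1))
            count += len(bits)
    return total / count

full = lambda X: X       # keeps everything: a "full model" of X
lossy = lambda X: X[0]   # keeps only feature 0

print(value(full))   # -> 1.0  (optimal for every Z)
print(value(lossy))  # -> 0.75 (fails half the time when Z = 1)
```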

Linking this back to the things you're talking about: roughly speaking, if a model contains enough information for optimal play against a sufficiently rich set of utility functions, then the model "matches the world" (at least those parts of the world relevant to the utility functions).

> For sufficiently rich Z, that means that the summary must include a full model of the environment.

Is this a theorem you've proven somewhere?

I have it in a notebook, might make a post soonish.

I ask because I already have a result that says this in MDPs: you can compute all optimal value functions iff you know the environment dynamics up to isomorphism.
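One direction of that MDP result can be sketched mechanically: given the dynamics, value iteration computes the optimal value function for any reward (the toy MDP and function names below are invented; the interesting direction of the iff - recovering dynamics from the family of value functions - is not shown):

```python
# Optimal state-value functions for a small known MDP, one per reward
# function, via standard value iteration.

def optimal_values(P, R, gamma=0.9, iters=200):
    """P[s][a] = dict {s2: prob}; R[s] = reward. Returns V*(s) for each s."""
    states = list(P)
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: R[s] + gamma * max(
                 sum(p * V[s2] for s2, p in P[s][a].items())
                 for a in P[s])
             for s in states}
    return V

# Two states; "left" always goes to state 0, "right" always to state 1.
P = {0: {"left": {0: 1.0}, "right": {1: 1.0}},
     1: {"left": {0: 1.0}, "right": {1: 1.0}}}
for R in ({0: 1.0, 1: 0.0}, {0: 0.0, 1: 1.0}):
    print(optimal_values(P, R))
```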

(John made a post, I'll just post this here so others can find it: https://www.lesswrong.com/posts/Dx9LoqsEh3gHNJMDk/fixing-the-good-regulator-theorem)