Optimizers: To Define or not to Define

Introduction

Defining "optimizer" has been something of a rationalist pastime for many years now. Many definitions have been suggested, and many have fallen down. Most have been phrased in terms of the behaviour of the system in question, which is probably not what we really want: we would like to tell whether something is an optimizer without having to observe it in action, developing an optimizer detector, so to speak.

Obviously, as LWers we are not searching for some "true essence" of the meaning behind the word. We are really searching through different descriptors which are good at slicing through thingspace and identifying what we already call optimizers. If we pick sensible descriptors, they ought to correlate with one another, and if we do enough of this, we might find that some cause others. Finding the root causal nodes that make some things optimizers would be very useful. That would be about as close to a definition as we could ask for.

A New Description

I think that a key part of optimization is observation of the optimized domain, which from an outside perspective means there is mutual information between the optimizer and the optimized domain. The other key component is that the optimizer acts on the external world in a way which depends on this mutual information. To really be an optimizer it should also fit existing criteria like compressing possible futures. We can formalize this mathematically as follows:

We have an optimizing agent $A$ and an external optimization domain $X$. To make most of the maths work later we ought to think of them as distributions over optimizers and domains.

Observation is a function which depends on X and maps one distribution of A to another. (In principle it can be defined in terms of individual A and X values, but the function can be extended to probability distributions over A and X the same way other functions can.)

$Obs_{X} : A_{old} \to A_{new}$

It also ought to have the property of increasing mutual information between A and X, in order to be a decent learning process.

1: $I(A; X) < I(Obs_{X}(A); X)$
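As a toy illustration of condition 1, here is a minimal sketch that computes mutual information for small discrete joint distributions. The two joint distributions (`before` and `after`) are made-up assumptions for illustration, not anything derived from a real system: one where the agent's state is independent of a fair binary domain, and one where the agent's state is a perfect copy of it.

```python
import math


def mutual_information(joint):
    """I(A;X) in bits, given a joint distribution as a dict {(a, x): probability}."""
    p_a, p_x = {}, {}
    for (a, x), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_x[x] = p_x.get(x, 0.0) + p
    return sum(
        p * math.log2(p / (p_a[a] * p_x[x]))
        for (a, x), p in joint.items()
        if p > 0
    )


# Hypothetical "before observation" case: agent state independent of the domain.
before = {(a, x): 0.25 for a in (0, 1) for x in (0, 1)}
# Hypothetical "after observation" case: agent state is an exact copy of x.
after = {(0, 0): 0.5, (1, 1): 0.5}

mi_before = mutual_information(before)  # 0 bits
mi_after = mutual_information(after)    # 1 bit
print(mi_before, mi_after)
```

In this toy case the observation takes the mutual information from 0 bits to 1 bit, so inequality 1 holds. A real observation function would usually land somewhere in between.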

The next part is an output (action) function:

$Act_{A} : X_{old} \to X_{new}$

Which should have the property of compressing domain-space:

2: $H(X) > H(Act_{A}(X))$

And we should also add another condition: that the more an optimizer learns, the better it is at optimizing. First let us define an "improvement function" which will help avoid notational confusion. (Note that $Imp$ should generally be positive thanks to inequality 2.)

$Imp(A, X) = H(X) - H(Act_{A}(X))$
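To make the improvement function concrete, here is a rough sketch for a discrete domain. It assumes the action is a deterministic map on domain states (a simplification), and the four-state domain and the `clamp` action are hypothetical choices made purely for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def push_forward(dist, act):
    """Distribution of act(x) when x is drawn from dist."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[act(x)] += p
    return dict(out)


def imp(dist, act):
    """Imp = H(X) - H(Act(X)) for a deterministic action on a discrete domain."""
    return entropy(dist) - entropy(push_forward(dist, act))


# Hypothetical domain: four equally likely states, so H(X) = 2 bits.
x_dist = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}


def clamp(x):
    # Hypothetical action: push every state above 1 down to 1.
    return min(x, 1)


improvement = imp(x_dist, clamp)
print(improvement)
```

Here the action squeezes four states into two, so the improvement is positive. In this simplified setting a deterministic map can never increase entropy, so inequality 2 can only fail by equality, for example when the action merely permutes the states.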

This is the optimizing power of a given action. Now we can formalize our notion of learning:

3: $Imp(A, X) < Imp(Obs_{X}(A), X)$

Expanding $Imp$, this is equivalent to:

$H(X) - H(Act_{A}(X)) < H(X) - H(Act_{Obs_{X}(A)}(X))$

Cancelling $H(X)$ from both sides and flipping the sign gives another equivalent form:

$H(Act_{A}(X)) > H(Act_{Obs_{X}(A)}(X))$

This last one matches the framing "once we have observed the environment, our action results in more compression of the environment".
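A toy check of this last form can be sketched as follows. Everything here is an assumption made for illustration: the domain has four states, the agent's state `a` is a "belief" about the domain state, and the hypothetical policy `act` shifts the domain down by that belief, wrapping around mod 4.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def entropy_after_acting(joint, act):
    """H(Act_A(X)) for a joint distribution {(a, x): p} and a rule act(a, x)."""
    out = defaultdict(float)
    for (a, x), p in joint.items():
        out[act(a, x)] += p
    return entropy(out)


def act(a, x):
    # Hypothetical policy: shift the domain down by the agent's belief a,
    # wrapping around mod 4. If a == x this always lands on state 0.
    return (x - a) % 4


# Before observing: belief a is uniform and independent of x.
unobserved = {(a, x): 1 / 16 for a in range(4) for x in range(4)}
# After observing: belief a is an exact copy of x.
observed = {(x, x): 1 / 4 for x in range(4)}

h_before = entropy_after_acting(unobserved, act)  # 2 bits: still uniform
h_after = entropy_after_acting(observed, act)     # 0 bits: always state 0
print(h_before, h_after)
```

The same action rule compresses nothing when the agent knows nothing about the domain, and compresses the domain to a single state once the agent has observed it, which is the pattern condition 3 asks for.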

Most of these inequalities will not hold for all possible cases (and therefore distributions) of A and X, but for the majority of reasonable distributions they ought to, otherwise we wouldn't have a very useful optimizer.

How does this apply to already-existent discussion points on what an optimizer is?

A discussion point on optimization is "Does my liver optimize my bank account?". Within this framework, the answer is no. Since I have stolen the idea of optimization as compression, I have sort of cheated. However, even without this, there is no mutual information between my liver and my bank account, and if there were, getting more information from the bank account to the liver would be unlikely to change how the liver acted towards the bank account in a meaningful way. With other examples I will also avoid leaning on the compression criterion if possible.

Thermostats are another talking point. They have 1 bit of mutual information with the outside world: "is the temperature greater than some threshold?". They act on this information with one bit of output: "is the heating on or off?". They also compress the range of potential future temperatures under general conditions; if they didn't, we wouldn't use them!

However, whether or not this range is narrower than would occur without the thermostat actually depends on the external world. If the thermostat is in a spaceship in deep space, there is probably a much wider range of temperatures reached with it on than off: in the long run, fluctuating between 20 and 22 degrees has a lot more entropy (in the sense of uncertainty over temperatures, not just in the thermodynamic sense) than an eternal 2.725 kelvin. If it is in a typical house, then preventing the house from ever dropping below 20 degrees but doing nothing otherwise does compress the space of potential futures.
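The house and spaceship cases can be compared with a small sketch. The four long-run temperature distributions below are invented purely for illustration (they are assumptions, not measurements); the only point is the sign of the entropy difference in each environment.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Assumed long-run temperature distributions (degrees C unless noted).
house_off = {14: 0.2, 16: 0.2, 18: 0.2, 20: 0.2, 22: 0.2}
house_on = {20: 0.8, 22: 0.2}   # everything below 20 is heated up to 20
space_off = {"2.725 K": 1.0}    # settles at the background temperature
space_on = {20: 0.5, 22: 0.5}   # fluctuates between 20 and 22

# Entropy with the thermostat off minus entropy with it on.
imp_house = entropy(house_off) - entropy(house_on)
imp_space = entropy(space_off) - entropy(space_on)
print(imp_house, imp_space)
```

Under these assumed numbers the thermostat compresses the house's temperature distribution (positive improvement) but adds a bit of uncertainty in deep space (negative improvement), even though the device itself is identical in both cases.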

This perhaps shows an inconsistency in the model. On the other hand, we might expect some systems to perform like an optimizer in some cases and not others: A chess-playing AI could well be said to observe the board and optimize towards the set of positions which are good for it, but it won't act as an optimizer on a go board. Then again the thermostat on the spaceship does seem to still be doing something optimizer-ey, and beyond the

This description also helps distinguish between the optimizer and the optimized, and also identifies mesa-optimizers, with the classic example being humans. Human brains do not have much mutual information with their reproductive success, only proxies for it (status, frequency of orgasm, even number of children we appear to have is not exactly reproductive success) and generally optimize for those.

Conclusions

In this framework, we have three criteria on which to evaluate optimizers. Hopefully they will correlate in meaningful ways, but perhaps they won't. Also as with many mathematical formulations of AI-like theory, this might not be as mathematically rigorous as the notation would lead you to believe.

There is also possibly some ambiguity as to how widely we should draw the boundaries of the optimization domain X. It makes sense (or seems to) to say that something the optimizer does not observe, even indirectly, so that it has no information about it at all, cannot be part of the optimization domain. It also makes sense (or seems to) that variables which are not optimized could be part of the observed domain (for example "upstream causes", like observing someone's genome to optimize for low frequency of a disease with a genetic component). Possibly we need two categories: X, which is optimized, and a larger observation domain Y, which consists of things with some sort of mutual information with A.

The biggest problem is that we still have to observe the optimizer (or non-optimizer) in action to determine if it is indeed an optimizer. This fails the goal of making an optimizer detector. To detect optimizers using this framework we might need to understand the very basics of what learning algorithms look like, which could be pretty much GAI-complete (i.e. by the time we can do this, we can build a general AI, and it is crunch time, and worrying about optimizers is not important anymore). This has helped me clarify my own thoughts on optimizers if nothing else. I am open to any and all feedback.
