A new definition of "optimizer"

by Chantiel · 9 min read · 9th Aug 2021


Optimization · AI · World Modeling

Here, I provide a new definition of "optimizer", and better explain a previous one I gave. I had previously posted a definition, but I now think it was somewhat wrong. It also didn't help that I accidentally wrote that definition out incorrectly, only realizing the mistake days later (and the corrected version then contained a new mistake). The way I presented the definition in the last article was also overly confusing, so I'll try to explain it better here.

First, I'll present someone else's proposed definition of "optimizer" and explain how my definitions are intended to improve upon it. Then, I'll summarize the definition I (meant to) give in my previous post, point out some potential problems, and outline changes that could fix them. These modifications make the definition more complicated, however, so I then provide a new, though informal, definition of "optimizer" that more elegantly avoids the issues with the original. Finally, I explain why defining "optimizer" doesn't matter much, anyways.

Alex Flint's definition

The article "The ground of optimization" defines "optimizer", saying:

An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
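To make this concrete, here is a minimal sketch (my own illustrative code, with all names and constants chosen for illustration) of a system that fits the definition: from any starting configuration in a wide basin, the state tends towards the target configuration, and keeps doing so despite perturbations:

```python
import random

def evolve(x, steps=1000, target=0.0):
    """Toy 'optimizing system': from any starting configuration in a wide
    basin of attraction, the state tends towards the target configuration,
    despite periodic external perturbations."""
    for t in range(steps):
        x += 0.5 * (target - x)          # systematic tendency towards target
        if t % 100 == 0:
            x += random.uniform(-1, 1)   # external perturbation
    return x

# Very different starting configurations all end up near the target.
print(evolve(37.0), evolve(-250.0))
```

As the following sections argue, plenty of systems we wouldn't normally call optimizers satisfy this kind of description.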

Though this definition has some value, it is extremely broad and includes things that wouldn't normally be considered optimizers. For example, it considers toasters to be optimizers. Specifically, consider a toaster left unattended toasting a piece of bread. No matter how you reposition the bread within the toaster, it will still end up toasty or burnt. So the bread robustly evolves towards being toasty or burnt, and the system is thus, by the above definition, an optimizer.

Similarly, consider a rocket traveling through space that's set to burn through all its fuel. Take the target set of configurations to be "has no fuel left". The rocket robustly evolves towards that state: no matter where you place the rocket, or how you rearrange its contents, it ends up without any more fuel.

As another example, consider a container with two chemicals, A and B, that together react to form a new chemical, C (with no by-products). After the chemicals have been mixed, from a wide range of configurations the system tends towards the configuration "everything is C". So from the large configuration space of "some combination of A, B, and C", it tends towards the smaller target configuration, and would thus be considered an optimizer.

For a final, more jarring example, consider an animal starving to death in an area without any food. Take the target configuration space to be "has ~0% body fat", and take the broader configuration space to be the animal at any weight. If the animal starts anywhere in the broader configuration space, it will evolve towards being extremely thin. So, according to this definition, the animal, desperate to find food, is actually a low-body-mass optimizer.

More generally, pretty much any process that tends to change in some systematic manner over time would be considered, by this definition, to be an optimizer.

Normally, people don't consider toasters, chemical mixtures, and rockets to be optimizers. I think a definition so broad that it includes them may be impractical. The main use of formalizing "optimizer" may be to detect mesaoptimization, but if you were designing a system to detect mesaoptimization in code, I doubt you would want to flag a class of systems so broad that it includes toasters and starving animals.

My definitions, by contrast, don't consider any of these things to be optimizers. The class of systems my definitions count as optimizers is much narrower, and I think it aligns much more closely with intuitions about what counts as an optimizer.

My previous definition

In the previous post, I (intended to) define "optimizer" as "something that scores 'unusually' highly on some objective function, such that for a very wide variety of other objective functions there exists a 'reasonably concrete' change of 'reasonably' short description length that would turn the system into one that scores 'unusually' highly on the other function."

'Reasonably short description length' means 'significantly' shorter than specifying a system that scores unusually highly on the other objective function from scratch (that is, without referencing any other system). 'Unusually highly' roughly means that if you were to browse around thingspace and see how other things score on the objective function, very few would score as highly as the optimizer. 'Reasonably concrete' just means you can't say things like "a toaster, except instead it's an awesome paperclip maximizer!" Formalizing all this completely precisely might be a bit tricky.
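As an illustrative sketch of the 'short description length' intuition (all names here are hypothetical, my own choices): in a generic optimizer, the objective function is an explicit, swappable component of the system, so the change that retargets it is far shorter than a from-scratch specification of a system for the new objective:

```python
import random

def hill_climb(objective, x=0.0, steps=5000, width=0.1):
    """Generic local search. The objective is an explicit, swappable part
    of the system rather than being baked into its mechanism."""
    for _ in range(steps):
        candidate = x + random.uniform(-width, width)
        if objective(candidate) > objective(x):
            x = candidate
    return x

# "Retargeting" the optimizer is a one-line change: swap the objective.
near_three = hill_climb(lambda v: -(v - 3) ** 2)
near_seven = hill_climb(lambda v: -(v - 7) ** 2)
```

A toaster, by contrast, has no such swappable component: there is no short edit to its mechanism that makes it pursue a different objective.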

Problems and changes for my previous definition

Anyways, a potential problem with the above definition is that optimizers don't always score 'unusually highly' on their objective functions. Sometimes things are very counterproductive with respect to what they try to optimize for; for example, someone might orchestrate a war intended to make things better that actually makes them much worse.

To fix this potential problem, I will just omit the requirement in the definition that the system score unusually highly on some objective function. The definition's simpler this way, anyways.

I'm not actually entirely sure this change is necessary; someone who does something that backfires horribly could normally still be seen as optimizing for some instrumental goal that is intended to accomplish what they actually care about but doesn't. Still, I don't know whether this applies to every optimizer, since some might completely fail at everything, and the requirement added complexity to the definition, so I omitted it.

So, to justify my definition: if something is an optimizer, that means it must be an implementation, analog or virtual, of some sort of optimization algorithm. And so you can modify the system to optimize for something else by describing a change to what is effectively its objective function.

For example, humans are optimizers, because if you wanted to turn a human into a paperclip/cardboard box/whatever optimizer, you could just describe a change to their psychology that makes them obsessed with those things. And AIs are optimizers because you could just change their utility function, goals, or rewards to make them optimize for whatever.

Systems that evolve are optimizers, because you could create some modification to change what is reproductively successful in order to make creatures evolve towards something else.
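A toy sketch of this point (my own illustrative code): in a minimal selection-plus-mutation loop, the fitness function is the modifiable part that determines what the population evolves towards:

```python
import random

def evolve_population(fitness, gens=300, size=50):
    """Toy evolution: the fitness function defines what counts as
    'reproductively successful', and so what the population evolves towards."""
    pop = [random.uniform(-10, 10) for _ in range(size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: size // 2]                 # selection
        pop = [p + random.gauss(0, 0.3)              # mutation
               for p in survivors for _ in range(2)]
    return max(pop, key=fitness)

# Modifying what is reproductively successful retargets the whole process.
best = evolve_population(lambda x: -abs(x - 5.0))    # evolves towards 5
```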

Toasters aren't optimizers, because though they have remarkable bread-toasting abilities, I don't think you could specify a reasonably concrete change to a toaster that would turn it into a paperclip optimizer with a significantly shorter description length than just describing a paperclip optimizer from scratch.

There's another potential problem. I considered trying to find clever ways to subvert the definition by defining an optimization process in which the objective function is so tightly intertwined with the optimization mechanism that it's really complicated to change the objective function without breaking the mechanism. I couldn't think of any way to do this, but there might be one. So I'll describe a change to deal with it.

The basic idea is that if there's some clever strategy for making such an optimizer, the most succinct way to describe the result is probably as the output of some optimization process that tried to make a system that optimizes for something while also being really hard to change to optimize for anything else. However, that generating system is itself an optimization process. If its optimization ability is also "obfuscated" in the above way, then it in turn is probably most succinctly described as the output of yet another optimization process optimizing for such an obfuscated system with the right objective function. And I think that at some point a "non-obfuscated" optimizer needs to appear in a terse description of such a system.

So, the potential problem could be dealt with, perhaps, by extending the definition so that anything most succinctly described as the output of a system optimizing for making an "obfuscated optimizer" also counts as an optimizer.

This feels a bit ad-hocy, but I guess it works. Maybe there's a more parsimonious way to deal with this. Or maybe we can just ignore this.

Another potential problem is that you could define some enormous system that features ad-hoc methods of accomplishing any objective function you can think of, as well as a system for choosing which ad-hoc method to use. My given definition would, mistakenly I guess, classify it as an optimizer.

This doesn't seem like it really matters; if a system really is so complicated and general-purpose in its abilities, it sounds potentially very powerful and dangerous, and detecting if something is powerful or dangerous seems to be the main reason to define something as an optimizer anyways.

Still, if you want to modify the definition, you could specify, somehow, that you can't just describe the system as a giant collection of parts, each of which uses methods to score highly on one given objective function without being useful for specifying something that scores well on some other objective function.

A new definition

Now that I've added changes that make the definition concerningly complicated and ad hoc, I want to provide a more elegant, though informal, definition: "an optimizer is something that can perform well on some objective function, for a reason that generalizes". To be clear, "a reason that generalizes" means that the explanation can be generalized to create or understand systems that score highly on other functions, for a very wide variety of functions.

To justify the definition: an optimizer is an implementation, physical or virtual, of an optimization algorithm, and optimization algorithms can be used with other objective functions. And can you even imagine an optimizer that uses only completely ad-hoc methods? I can't.

Humans are good at things for reasons that can generalize to other goals (because they have intelligence), and AIs are like this, too. Toasters aren't considered optimizers under this definition, because the mechanism behind their toasting abilities would not help with general goals.

The definition could potentially be formalized by having some formal language for describing "why the system works", and then showing that there is a "reasonably simple" way of translating this explanation into an explanation of why some other system scores well on some other function.

There's another way to potentially formalize the definition. Note that if the reason generalizes, then an intelligent agent, even one with insufficient intelligence to come up with the system on its own, could use its understanding of the system to create other systems that score well on other functions. So you could potentially take some agent of rather limited intelligence, give it an explanation of how an optimizer works, then ask it to make optimizers for other things. If it succeeds, then whatever you explained to it was an optimizer.

It doesn't matter

Anyways, I want to point out that I don't think it matters much whether something's an optimizer. I think the main reason optimizers matter in AI is that they have the potential to become remarkably useful or dangerous, so you could just look for systems with that property, even if they aren't actually optimizers.

For example, some highly advanced alien drone that operates on mere reflex to take control of worlds isn't an optimizer, but it's still just as dangerous as one, due to its remarkably high ability to score on objective functions like "make the world taken over by aliens". And by not talking specifically about optimizers, you can avoid loopholes in any definition of them.
