
Crossposted from my blog.

One thing I worry about sometimes is people writing code with optimisers in it, without realising that that's what they were doing. An example of this: suppose you were doing deep reinforcement learning, doing optimisation to select a controller (that is, a neural network that takes a percept and returns an action) that generated high reward in some environment. Alas, unknown to you, this controller actually did optimisation itself to select actions that score well according to some metric that so far has been closely related to your reward function. In such a scenario, I'd be wary about your deploying that controller, since the controller itself is doing optimisation which might steer the world into a weird and unwelcome place.

In order to avoid such scenarios, it would be nice if one could look at an algorithm and determine if it was doing optimisation. Ideally, this would involve an objective definition of optimisation that could be checked from the source code of the algorithm, rather than something like "an optimiser is a system whose behaviour can't usefully be predicted mechanically, but can be predicted by assuming it near-optimises some objective function", since such a definition breaks down when you have the algorithm's source code and can compute its behaviour mechanically.

You might think about optimisation as follows: a system is optimising some objective function to the extent that that objective function attains much higher values than would be attained if the system didn't exist, or were doing some other random thing. This type of definition includes those put forward by Yudkowsky and Oesterheld. However, I think there are crucial counterexamples to this style of definition.
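To make this behavioural style of definition concrete, here is a minimal sketch of how one might operationalise it (an illustration of my own; run_env, system_policy, objective, and actions are hypothetical stand-ins): compare the value the objective attains with the system in place against the value it attains when the system is replaced by random behaviour.

```python
import random

def optimising_power(run_env, system_policy, objective, actions, n_trials=100):
    """Crude counterfactual measure: how much higher the objective comes out
    with the system in place than with a randomly-behaving replacement.
    run_env(policy) is assumed to run the environment under the given policy
    and return a final state; objective(state) scores that state; actions is
    the list of actions a random replacement could take."""
    def random_policy(percept):
        return random.choice(actions)

    with_system = sum(objective(run_env(system_policy)) for _ in range(n_trials))
    without_it = sum(objective(run_env(random_policy)) for _ in range(n_trials))
    return (with_system - without_it) / n_trials
```

Note that a measure like this only looks at behaviour and counterfactuals, never at the internals of the system, which is exactly what the counterexamples below put pressure on.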

Firstly, consider a lid screwed onto a bottle of water. If not for this lid, or if the lid had a hole in it or were looser, the water would likely exit the bottle via evaporation or being knocked over, but with the lid, the water stays in the bottle much more reliably than otherwise. As a result, you might think that the lid is optimising the water remaining inside the bottle. However, I claim that this is not the case: the lid is just a rigid object designed by some optimiser that wanted water to remain inside the bottle.

This isn't an incredibly compelling counterexample, since it doesn't qualify as an optimiser according to Yudkowsky's definition: it can be more simply described as a rigid object of a certain shape than as an optimiser, so it isn't an optimiser. I am somewhat uncomfortable with this move (surely systems that are sub-optimal in complicated ways that are easily predictable from their source code should still count as optimisers?), but it's worth coming up with another counterexample to which this objection won't apply.

Secondly, consider my liver. It's a complex physical system that's hard to describe, but if it were absent or behaved very differently, my body wouldn't work, I wouldn't remain alive, and I wouldn't be able to make any money, meaning that my bank account balance would be significantly lower than it is. In fact, subject to the constraint that the rest of my body works in the way that it actually works, it's hard to imagine what my liver could do which would result in a much higher bank balance. Nevertheless, it seems wrong to say that my liver is optimising my bank balance, and more right to say that it "detoxifies various metabolites, synthesizes proteins, and produces biochemicals necessary for digestion"---even though that gives a less precise account of the liver's behaviour.

In fact, my liver's behaviour has something to do with optimising my income: it was created by evolution, which was sort of an optimisation process for agents that reproduce a lot, which has a lot to do with me having a lot of money in my bank account. It also sort of optimises some aspects of my digestion, which is a necessary sub-process of me getting a lot of money in my bank account. This explains the link between my liver function and my income without having to treat my liver as a bank account funds maximiser.

What's a better theory of optimisation that doesn't fall prey to these counterexamples? I don't know. That being said, I think that it should involve the internal details of the algorithms implemented by those physical systems. For instance, I think of gradient ascent as an optimisation algorithm because I can tell that at each iteration, it improves on its objective function a bit. Ideally, with such a definition you could decide whether an algorithm was doing optimisation without having to run it and see its behaviour, since one of the whole points of a definition of optimisation is to help you avoid running systems that do it.
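For concreteness, here is a minimal gradient ascent sketch (my own illustration; the objective in the usage line is made up). The point is that its optimiser-ness is visible in the source: the update rule moves x uphill on the objective at every iteration, so you can classify it without running the loop.

```python
def gradient_ascent(grad_f, x0, step_size=0.01, n_steps=1000):
    """Repeatedly nudge x in the direction of the gradient of f. For a smooth
    objective and a small enough step size, the update below does not decrease
    f at any iteration, and that fact is checkable from the source alone."""
    x = x0
    for _ in range(n_steps):
        x = x + step_size * grad_f(x)  # move uphill on f
    return x

# Example: maximise f(x) = -(x - 3)**2, whose gradient is -2 * (x - 3).
best_x = gradient_ascent(lambda x: -2 * (x - 3), x0=0.0)  # tends towards 3
```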

Thanks to Abram Demski, who came up with the bottle-cap example in a conversation about this idea.

Comments

Daniel Filan's bottle cap example was featured prominently in "Risks from Learned Optimization" for good reason. I think it is a really clear and useful example of why you might want to care about the internals of an optimization algorithm and not just its behavior, and helped motivate that framing in the "Risks from Learned Optimization" paper.

Daniel Filan's bottle cap example

Note that Abram Demski deserves a large part of the credit for that specific example (somewhere between 'half' and 'all'), as noted in the final sentence of the post.

Raemon:

A reminder, since this looks like it has a few upvotes from AF users: posts need 2 nominations to proceed to the review round. 

Review by the author:

I continue to endorse the contents of this post.

I don't really think about the post that much, but the post expresses a worldview that shapes how I do my research - that agency is a mechanical fact about the workings of a system.

To me, the main contribution of the post is setting up a question: what's a good definition of optimisation that avoids the counterexamples of the post? Ideally, this definition would refer or correspond to the mechanistic properties of the system, so that people could somehow statically determine whether a given controller was an optimiser. To the best of my knowledge, no such definition has been developed. As such, I see the post as not having kicked off a fruitful public conversation, and its value if any lies in how it has changed the way other people think about optimisation.

Yes, I kind of agree with you.

I'm surprised nobody has yet replied that the two examples are both products of significant optimizers with relevant optimization targets, and that the naive definition seems to work with one modification:

A system is downstream from an optimizer of some objective function to the extent that that objective function attains much higher values than would be attained if the system didn't exist, or were doing some other random thing.

I'm surprised nobody has yet replied that the two examples are both products of significant optimizers with relevant optimization targets.

Yes, this seems pretty important and relevant.

That being said, I think that that definition suggests that natural selection and/or the earth's crust are downstream from an optimiser of the number of Holiday Inns, or that my liver is downstream from an optimiser of my income, both of which aren't right.

Probably it's important to relate 'natural subgoals' to some ideal definition - which offers some hope, since 'subgoal' is really a computational notion, so maybe investigation along these lines would offer a more computational characterisation of optimisation.

[EDIT: I made this comment longer and more contentful]

Okay, so another necessary condition for being downstream from an optimizer is being causally downstream. I'm sure there are other conditions, but the claim still feels like an important addition to the conversation.

I think what we need is some notion of mediation. That is, a way to recognize that your liver's effects on your bank account are mediated by effects on your health and it's therefore better thought of as a health optimizer.

This has to be counteracted by some kind of complexity penalty, though, or else you can only ever call a thing a [its-specific-physical-effects-on-the-world]-maximizer.

I wonder if we might define this complexity penalty relative to our own ontology. That is, to me, a description of what specifically the liver does requires lots of new information, so it makes sense to just think of it as a health optimizer. But to a medical scientist, the "detoxifies..." description is still pretty simple and obviously superior to my crude 'health optimizer' designation.

The model of the bank account compresses the target function of the brain, even when expressed in terms of specific physical effects on the world. Further, the model of health compresses the target function of the liver better than the bank account.

Let me see if I got it right:

  1. Defining optimizers as unpredictable processes maximizing an objective function breaks down for algorithms whose behaviour we can compute from their source code

  2. Satisfying the property P "gives the objective function higher values than a nonexistence baseline" is not sufficient:

  • the lid satisfies (P) with "water quantity in bottle" but is just a rigid object that some optimizer put there. However, it is not the best counterexample because it is not a Yudkowskian optimizer.
  • if a liver didn't exist or did other random things then humans wouldn't be alive and rich, so it satisfies (P) with "money in bank account" as the objective function. However, the better way to account for its behaviour (cf. Yudkowskian definition) is to see it as a sub-process of an income maximizer created by evolution.
  3. One property that could work: have a step in the algorithm that provably increases the objective function (e.g. gradient ascent).

Properties I think are relevant:

  • intent: the lid did not "choose" to be there, humans did
  • doing something that the outer optimizer cannot do "as well" without using the same process as the inner optimizer: it would be very tiring for humans to use our hands as lids, and humans cannot play Go as well as AlphaZero without actually running the algorithm.

I think my syntax/semantics idea is relevant to this question - especially the idea of different sets of environments. https://www.lesswrong.com/posts/EEPdbtvW8ei9Yi2e8/bridging-syntax-and-semantics-empirically

For example, suppose we have a super-intelligent bottle cap, dedicated to staying on the bottle (and with some convenient arms and manufacturing capability). This seems to be exactly an optimiser, one that we mere humans cannot expect to be able to get off the bottle.

In contrast, the standard bottle cap will only remain on the bottle in a much narrower set of circumstances (though the super-intelligent bottle cap will also remain on in those circumstances).

So it seems that what distinguishes the standard bottle cap from a genuine optimiser is that the genuine optimiser will accomplish its role in a much larger set of (possibly antagonistic) environments, while the standard bottle cap will only do so in a much smaller set of circumstances.
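One crude way to operationalise this (a sketch of my own; run_env_and_check and the choice of test environments are stand-ins, and how to weight the environments is left open): score a system by the fraction of test environments in which it still achieves its target.

```python
def robustness(run_env_and_check, environments):
    """Fraction of test environments in which the system still achieves its
    target. run_env_and_check(env) is assumed to place the system in env and
    return True iff the target condition holds afterwards (e.g. the cap is
    still on the bottle). A genuine optimiser should score highly across a
    wide, possibly antagonistic, set of environments; a standard bottle cap
    should not."""
    return sum(1 for env in environments if run_env_and_check(env)) / len(environments)
```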

A larger set of circumstances... how are you counting circumstances? How are you weighting them? It's not difficult to think of contexts and tasks where boulders outperform individual humans under the realistic distribution of probable circumstances.

It's helped me hone my thinking on what is and isn't an optimiser (and a wireheader, and so on, for associated concepts).

Can you define it in terms of "sensory", "motor", and "processing"? That is, in order to be an optimizer, you must have some awareness of the state of some system; at least two options for behavior that affect that system in some way; and a connection from awareness to action that tends to increase some objective function.

Works for bottle cap: no sensory, only one motor option.

Works for liver: senses blood, does not sense bank account. Former is a proxy for latter but a very poor one.

For bubbles? This definition would call bubbles optimizers of finding lower pressure areas of liquid, iff you say that they have the "option" of moving in some other direction. I'm OK with having a fuzzy definition in this case; in some circumstances, you might *want* to consider bubbles as optimizers, while in others, it might work better to take them as mechanical rule-followers.
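For concreteness, a minimal sketch of this proposed sensory/motor/processing criterion (an illustration only; the thermostat rule and names here are made up): a system qualifies if it senses some state, has at least two available actions, and connects the two in a way that tends to increase an objective.

```python
def thermostat_step(sensed_temperature, target=20.0):
    """Sensory: reads the temperature. Motor: two options (heat on / heat off).
    Processing: the rule tends to increase the objective -abs(temperature - target),
    i.e. it pushes the temperature towards the target."""
    return "heat_on" if sensed_temperature < target else "heat_off"

def bottle_cap_step(_percept=None):
    """No sensing is used and only one behaviour is available, so the
    'at least two options for behavior' clause fails."""
    return "stay_on_bottle"
```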

Elo:

Discernment seems to be part of the definition. Choosing A and not B. And then having amplified potential to optimise optimising. Choosing the self choice of what that thing is.

There is no such thing as an optimizer except in the mind of a human anthropomorphizing that entity. I wrote about it some time ago. Let me quote, sorry it is long. One can replace "agent" with "optimizer" in the following.

... Something like a bacterium. From the human point of view, it is alive, and has certain elements of agency, like the need to feed, which it satisfies by, say, moving up a sugar gradient toward richer food sources so it can grow. It also divides once it is mature enough, or reaches a certain size. It can die eventually, after multiple generations, and so on.
The above is a very simplified black-box description of bacteria, but still enough to make at least some humans care to preserve it as a life form, instead of coldly getting rid of it and reusing the material for something else. Where does this compassion for life come from? I contend that it comes from the lack of knowledge about the inner workings of the “agent” and consequently lack of ability to reproduce it when desired.
I give a simple example to demonstrate how lack of knowledge makes something look “alive” or “agenty” to us and elicits emotional reactions such as empathy and compassion. Enter
Bubbles!
Let's take a… pot of boiling water. If you don't have an immediate intuitive picture of it in mind, here is a sample video. Those bubbles look almost alive, don't they? They are born, they travel along a gradient of water pressure to get larger, while changing shape rather chaotically, they split apart once they grow big enough, they merge sometimes, and they die eventually when reaching the surface. Just like a bacterium.
So, a black-box description of bubbles is almost indistinguishable from a black-box description of something that is conventionally considered alive. Yet few people feel compelled to protect bubbles, say, by adding more water and keeping the stove on, and have no qualms whatsoever to turn off the boiler and letting the bubbles “die”. How come?
There are some related immediate intuitive explanations for it:
  • We know "how the bubbles work" — it's just water vapor after all! The physics is known, and the water boiling process can be numerically simulated from the relevant physics equations.
  • We know how to make bubbles at will — just pour some water into a pot and turn the stove on.
  • We don't empathize with bubbles as potentially experiencing suffering, something we might when observing, say, a bacterium writhe and try to escape when encountering an irritant.
  • We see all bubbles as basically identical, with no individuality, so a disappearing bubble does not feel like a loss of something unique.
Thus something whose inner workings we understand down to the physical level and can reproduce at will without loss of anything "important" no longer feels like an agent. This may seem rather controversial. Say, you poke a worm and it wriggles and squirms, and we immediately anthropomorphize this observation and compare it to human suffering in similar circumstances. Were we to understand the biology and the physics of the worm, we might conclude that the reactions are more like those of a wiggling bubble than those of a poked human, assuming the brain structure producing the quale "suffering" does not have an analog in the worm's cerebral ganglion. Alternatively, we might conclude that worms do have a similar structure, producing suffering when interacted with in a certain way, and end up potentially extending human morals to cover worms, or maybe also bacteria. Or even bubbles.
TAG:

There is no such thing as an optimizer except in the mind of a human anthropomorphizing that entity.

Is there some other set of concepts that don't exist only in the human mind?

I claim that this is wrong: I can understand down to the physical level and reproduce at will something which implements the UCB algorithm, and it still seems like an optimisation algorithm to me.

Hmm, I don't have a good understanding of this algorithm; from your link I gather that this is still an agent who follows the algorithm, not a physical system without an agent anywhere in there, like, say, a chess bot. But it could be my misunderstanding.

Is there a difference between what you call optimizer and what Paul Christiano calls daemon?

I think that everything that Paul would call a daemon, I would call an optimiser.

Things that I would call optimisers that Paul would (probably?) not call daemons:

  • A program that ran gradient descent in order to solve a linear regression problem.
  • The UCB algorithm that optimises payoffs in multi-armed bandit problems (a minimal sketch follows below).
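For concreteness, here is a minimal UCB1 sketch (my own illustration; pull_arm is a stand-in for whatever bandit is being played). The optimisation is visible in the source: each round explicitly picks the arm that maximises an "estimated mean reward plus exploration bonus" score.

```python
import math

def ucb1(pull_arm, n_arms, n_rounds):
    """UCB1 for a multi-armed bandit. pull_arm(i) is assumed to return a
    reward in [0, 1]. Each round, play the arm with the highest upper
    confidence bound on its mean reward; the max below is the optimisation
    step."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms

    # Play each arm once to initialise the estimates.
    for i in range(n_arms):
        totals[i] += pull_arm(i)
        counts[i] = 1

    for t in range(n_arms, n_rounds):
        ucb = [totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(n_arms)]
        best = max(range(n_arms), key=lambda i: ucb[i])
        totals[best] += pull_arm(best)
        counts[best] += 1

    return counts  # how often each arm ended up being played
```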