Optimization Provenance

[-]Wei Dai6y*Ω580

I'm having trouble building an intuitive picture of what kind of thing an optimization provenance is. Would it be possible give some toy examples of optimizers and what their optimization provenance might look like?

[-]Vanessa Kosoy6yΩ350

The idea of explicit world models reminds me of my research direction concerning cellular decision processes. Note that the "grid" of such a process can be replaced by an arbitrary graph, which can also evolve dynamically, and that makes it quite close to the notion of, representing the world as a collection of objects and relationships/interactions. I did not know about the sheaf theoretic angle (added to reading list, thank you!), it might be interesting to see whether these two combine in a natural way.

[-]Steven Byrnes6y20

Couple things:

(1) You might give some thought to trying to copy (or at least understand) the world model framework of the human brain. There's uncertainty in how that works, but a lot is known, and you'll at least be working towards something that we know for sure is capable of getting built up to a human level world-model within a reasonable amount of time and computation. As best as I can tell (and I'm working hard to understand it myself), and grossly oversimplifying, it's a data structure with billions of discrete concepts, and transformations between those concepts (composition, cause-effect, analogy, etc...probably all of those are built out of the same basic "transformation machinery" with different contexts acting as metadata). All these concepts are sitting in the top layer of some kind of loose hierarchy, whose lowest layer consists of (higher-level-context-dependent) probability distributions over spatiotemporal sequences of sensory inputs. See my Jeff Hawkins post for one possible point of departure. I've found a couple other references that are indirectly helpful, and like I said, I'm still trying to figure it out. I'm still trying to understand the "sheaves" approach , so I won't comment on how these compare.

(2) "This conception will be the result of an optimizer, and so this should be in the optimization provenance" - this seems to be important and I don't understand it. Better understanding the world consists (in part) of chunking sequences of events and actions, suppressing intermediate steps. Thus we say and think "I'll put some milk in my coffee," leaving out the steps like unscrewing the top of the jug. The process of "explore the world model, chunking sequences of events when appropriate" is (I suspect) essential to making the world-model usable and powerful, and needs to be repeated millions of times in every nook and cranny of the world model, and thus this is a process that an overseer would have little choice but to approve in general, I think. But this process can find and chunk manipulative causal pathways just as well as any other kind of pathway. And once manipulation is packaged up inside a chunk, you won't need optimization per se to manipulate, it will just be an obvious step in the process of doing something, just like unscrewing the top of the jug is an obvious step in putting-milk-into-coffee. I'm not sure how you propose to stop that from happening.

[-]Ofer6yΩ110

The optimization processes used in the agent must all be capable of controlling whether it creates a mesa-optimizer.

I'm confused about this sentence - my understanding is that the term mesa-optimizer refers to the agent/model itself when it is doing some optimization. I think the term "run-time optimization" (which I've seen in this slide, seemingly from a talk by Yann LeCun) refers to this type of optimization.

4. Does this apply to other forms of optimization daemons?

Isn't every optimization daemon a mesa-optimizer?

I was under the impression that the term "optimization daemon" was used to describe a mesa-optimizer that is a "consequentialist" (I don't know whether there's a common definition for the term "consequentialist" in this context; my own tentative fuzzy definition is "something that has preferences about the spacetime of the world/multiverse".)

[-]Matthew Barnett6yΩ110

In particular, the goal is to create a notion of transparency strong enough that an attempted deception would be completely transparent.

Is the idea here that

Creating a version of transparency this strong would enable us to mitigate less extreme forms of mesa misalignment ie. This is a strong enough form of transparency that it 'covers' the other cases automatically.
Deception should be treated separately from other forms of mesa misalignment.

LESSWRONG
LW

LESSWRONG
LW

38

Optimization Provenance

38

Ω 19

38

Ω 19

Background notions

Legibility

World model

Optimizer

Provenance

Motivations

Treacherous turn

Goodhart's curse

The optimization provenance agenda

Legible optimizers

Mesa-optimizer control

Provenance accountability

Recursive uses