Aspiring monastic and AI safety researcher


Search versus design

Yes I agree with this. Another example is the way a two-by-four length of timber is a kind of "interface" between the wood mill and the construction worker. There is a lot of complexity in the construction of these at the wood mill, but the standard two-by-four means that the construction worker doesn't have to care. This is also a kind of factorization that isn't about decomposition into parts or subsystems.

Search versus design

Nice post, very much the type of work I'd like to see more of.

Thank you!

I'm not sure I'd describe this work as "notorious", even if some have reservations about it.

Oops, terrible word choice on my part. I edited the article to say "gained attention" rather than "gained notoriety".

I think this is incorrect - for example, "biological systems are highly modular, at multiple different scales". And I expect deep learning to construct minds which are also fairly modular. That also allows search to be more useful, because it can make changes which are comparatively isolated.

Yes I agree with this, but modularity is only a part of what is needed for comprehensibility. Chris Olah's work on circuits in convnets suggests that convnets trained on image recognition tasks are somewhat modular, but it's still very very difficult to tease them apart and understand them. Biological trees are modular in many ways, but we're still working on understanding how trees work after many centuries of investigation.

You might say that comprehensibility = modularity + stories. You need artifacts that decompose into subsystems, and you need stories about that decomposition and what the pieces do so that you're not left figuring it out from scratch.

Search versus design

And thus the wheel of the Dharma was set in motion once again, for one more great turning of time.

Search versus design

Ah this is a different Ben.

Search versus design

I think this is a very good summary

Search versus design

I think my real complaint here is that your story is getting its emotional oomph from an artificial constraint (every output must be 100% correct or many beings die) that doesn't usually hold, not even for AI alignment

Well OK I agree that "every output must be 100% correct or many beings die" is unrealistic. My apologies for a bad choice of toy problem that suggested that I thought such a stringent requirement was realistic.

But would you agree that there are some invariants that we want advanced AI systems to have, that we really want to be very confident our AI systems satisfy before we deploy them, and that these invariants really must hold at every time step?

To take an example from ARCHES, perhaps it should be the case that, for every action output at every time step, the action does not cause the Earth's atmospheric temperature to move outside some survivable interval. Or perhaps you say that this invariant is not a good safety invariant -- ok, but surely you agree that there is some correct formulation of some safety invariants that we really want to hold in an absolute way at every time step? Perhaps we can never guarantee that all actions will have acceptable consequences because we can never completely rule out some confluence of unlucky conditions, so then perhaps we formulate some intent alignment invariant that is an invariant on the internal mechanism by which actions are generated. Or perhaps intent alignment is misguided and we get our invariants from some other theory of AI safety. But there are going to be invariants that we want our systems to satisfy in an absolute way, no?

And if we want to check whether our system satisfies some invariant in an absolute way then I claim that we need to be able to look inside the system and see how it works, and convince ourselves based on an understanding of how the thing is assembled that, yes, this python code really will sort integers correctly in all cases; that, yes, this system really is structured such that this intent alignment invariant will always hold; that yes, this learning algorithm is going to produce acceptable outputs in all cases for an appropriate definition of acceptability.

When we build sophisticated systems and we want them to satisfy sophisticated invariants, it's very hard to use end-to-end testing alone. And we are forced to use end-to-end testing alone whenever we are dealing with systems that we do not understand the internals of. Search produces systems that are very difficult to understand the internals of. Therefore we need something beyond search. This is the claim that my integer sorting example was trying to be an intuition pump for. (This discussion is helping to clarify my thinking on this a lot.)
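To make the integer-sorting example concrete: here is a toy sketch (not from the original post) of the kind of artifact we can convince ourselves about piece by piece. The loop invariant in the comment is the "story" about why the code works, and the checker function expresses the end-to-end property that the story is meant to establish.

```python
def insertion_sort(xs):
    """Sort a list of integers, structured so the invariant is easy to argue."""
    out = list(xs)
    for i in range(1, len(out)):
        # Story/invariant: out[:i] is sorted at the start of each outer iteration,
        # and the inner loop restores it for out[:i+1] by sinking out[i] leftward.
        j = i
        while j > 0 and out[j - 1] > out[j]:
            out[j - 1], out[j] = out[j], out[j - 1]
            j -= 1
    return out

def satisfies_sort_invariant(xs, ys):
    """End-to-end property: ys is the sorted permutation of xs."""
    return sorted(xs) == ys
```

The point of the sketch is the contrast: for this artifact we can verify the invariant by reading the construction, whereas for a trained network computing the "same" function we would be limited to end-to-end testing.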

Search versus design

Obviously a regular sorting algorithm would be better, but if the choice were between the neural net and a human, and you knew there wasn't going to be any distributional shift, I would pick the neural net.

Well, sure, but this is a pretty low bar, no? Humans are terrible at repetitive tasks like sorting numbers.

Better than any of these solutions is to not have a system where a single incorrect output is catastrophic.

Yes very much agreed. It is actually incredibly challenging to build systems that are robust to any particular algorithm failing, especially at the granularity of a sorting algorithm. Can I trust the function that appends items to arrays to always work? Can I trust that the command line arguments I receive are accurate to what the user typed? Can I trust the max function? Can I trust that arithmetic is correctly implemented? Do you know of any work that attempts to understand/achieve robustness at this level? I'd be fascinated to read more about this.

Search versus design

Well said, friend.

Yes when we have a shared understanding of what we're building together, with honest and concise stories flowing in both directions, we have a better chance of actually understanding what all the stakeholders are trying to achieve, which at least makes it possible to find a design that is good for everyone.

The distinction you point out between a design error and missing information seems like a helpful distinction to me. Thank you.

It reminds me of the idea of interaction games that CHAI is working on. Instead of having a human give a fully-specified objective to a machine right up front, the idea is to define some protocol for ongoing communication between human and machine about what the objective actually is. When I design practical things I notice this kind of thing happening between me and the world as I start with a vague sense of what my objective is and gradually clarify it through experimentation.

I'm curious to hear about your experience designing and building things, and how it matches up with the model we're talking about here.

Search versus design

One key to this whole thing seems to be that "helpfulness" is not something that we can write an objective for. But I think the reason that we can't write an objective for it is better captured by inaccessible information than by Goodhart's law.

By "other-izer problem", do you mean the satisficer and related ideas? I'd be interested in pointers to more "other-izers" in this cluster.

But isn't it the case that these approaches are still doing something akin to search in the sense that they look for any element of a hypothesis space meeting some conditions (perhaps not a local optimum, but still some condition)? If so then I think these ideas are quite different from what humans do when we design things. I don't think we're primarily evaluating whole elements of some hypothesis space looking for one that meets certain conditions, but are instead building things up piece-by-piece.

Search versus design

Hey thank you for your thoughts on this post, friend

overall "design" is being used as a kind of catch-all that is probably very complicated

Yes it may be that "automating design" is really just a rephrasing of the whole AI problem. But I'm hopeful that it's not. Keep in mind that we only have to be competitive with machine learning, which means we only have to be able to automate the design of artifacts that can also be produced by black box search. This seems to me to be a lower bar than automating all human capacity for design, or automating design in general.

In fact you might think the machine learning problem statement of "search a hypothesis space for a policy that performs well empirically" is itself a restatement of the whole AI problem, and perhaps a full solution to this problem would in fact be a general solution to AI. But in practice we've been able to make incremental progress in machine learning without needing to solve any AI-complete parts of the problem space (yet).

does the artifact work because of the story (as in "design"), or does the artifact work because of the evaluation (as in search)?

Interesting. I wasn't thinking of the story as playing any causal role in making the artifact work. (Though it's very important that the story convey how the artifact actually works, rather than being optimized for merely being convincing.)

Can you say more about what it would mean for an artifact to work because of the story?

This isn't so clean, since [...] Most artifacts work for a combination of the two reasons---I design a thing then test it and need a few iterations

Yup very much agreed. But the trial-and-error part of design seems to me very different from the evaluate-and-gradient-step part of search. When I write some code and test it and it doesn't work, I almost always get some insight into some general failure mode that I failed to anticipate before, and I update both my story and my code. This might be realizing that some API sometimes returns null instead of a string for a certain field, or noticing that negative numbers on the command line look like flags to the argument parser. Perhaps we could view these as gradient steps in story space. Or perhaps that's too much of a stretch.
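The argument-parser failure mode mentioned above can be reproduced with Python's argparse (a minimal sketch; the parser and its option names here are hypothetical, not from the original comment): once any registered option string looks like a negative number, argparse stops treating bare negative numbers as positional values.

```python
import argparse

# Hypothetical parser: the "-1" option string looks like a negative number,
# which switches argparse into treating ALL bare negative numbers as flags.
parser = argparse.ArgumentParser()
parser.add_argument("-1", dest="legacy_mode", action="store_true")
parser.add_argument("value", type=int)  # a positional integer

rejected = False
try:
    parser.parse_args(["-5"])  # intended as the positional "value"
except SystemExit:
    # argparse reports an error (it reads "-5" as an unknown flag) and exits
    rejected = True
```

Discovering this behavior through a failing test is exactly the kind of event that updates both the code and the story about the code, rather than being a blind gradient step.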

There seem to be many other reasons things work (e.g. "it's similar to other things that worked" seems to play a super important role in both design and search).

Yes when we look at the evolution of, say, a particular product made by a particular company over a timespan of years, it seems that there is a local search happening. Same thing if we look at, say, manufacturing processes for semiconductors. I'm keen to spend more time thinking about this.

A story seems like it's the same kind of thing as an artifact, and we could also talk about where it comes from. A story that plays a role in a design itself comes from some combination of search and design.

Yeah this seems important. If we view a story as a particular kind of artifact then it would be good to get clear on what exactly identifies an artifact as a story. There is a section in the Genesis manifesto (audacious GOFAI manifesto from some cog sci folks at MIT) that talks about internal and external stories, where an internal story is a story represented in mental wetware, and an external story is any object that spontaneously gives rise to an internal story upon examination.

During design it seems likely that humans rely very extensively on searching against mental models, which may not be introspectively available to us as a search but seems like it has similar properties.

In my very limited experiment I was continuously surprised about how non-search-like the practical experience of design was to me. But yes there is a lot happening beneath conscious awareness so I shouldn't update too heavily on this.

if you do jointly search for a model+helpful story about it, the story still isn't the reason why the model works, and from a safety perspective it might be similarly bad

Well when I buy an air conditioner that comes with an instruction manual that includes some details on how the air conditioner is constructed, it's not literally the case that the air conditioner works because of the physical instruction manual. The physical instruction manual is superfluous to the correct functioning of the physical air conditioner. And it's also quite possible that the physical instruction manual was constructed after the physical air conditioner was already constructed, and this doesn't rule out the helpfulness of the instruction manual.

What's important is that the instruction manual conveys a true account of how the air conditioner really works.

Now if we really nailed down what it means for a story to give a true account of how an artifact really works then perhaps we could search over (artifact, story) pairs. But this seems like deeply inaccessible information to me, so I agree this kind of solution can't work. If I'm a search algorithm searching over (artifact, story) pairs then I have no understanding whatsoever about how the artifacts I'm constructing work. I'm just searching. On what basis could I possibly discern whether a certain story faithfully captures how it is that a certain artifact really works?

What we actually need is a process for constructing artifacts that builds them up piece by piece, so that a story can be constructed piece by piece in parallel. Or something else entirely. But it just doesn't seem that search is enough here. I might try to formalize this argument at some point.
