Does it need to be either "pure search" or "no search"?
My expectation would be that in the limit it learns a ton of heuristics about what usually works, and learns to do a much more efficient search using those heuristics. This would especially be the case if e.g. capabilities researchers give the nets extra options to speed up the search (for instance, it's totally possible to embed a few steps of gradient descent into the inference of a neural network, since gradient descent is differentiable - I don't know whether that will eventually be shown to improve capabilities, but if it does, capabilities researchers would presumably do it).
Agreed that "search" is not a binary but more like a continuum, where we might call a program more "search-like" if it is enumerating possible actions and evaluating their consequences, and less "search-like" if it is directly mapping representations of inputs to actions. The argument in this post is that gradient descent (unlike evolution, and unlike human programmers) doesn't select much for "search-like" programs. If we take depth-first search as a central example of search, and a thermostat as the paradigmatic non-search program, gradient descent will select for something more like the thermostat.
it's totally possible to embed a few steps of gradient descent into the inference of a neural network, since gradient descent is differentiable
Agreed, and networks may even be learning something like this already! But in my ontology I wouldn't call an algorithm that performs, say, 5 steps of gradient descent over a billion-parameter space and then outputs an action very "search-like"; the "search" part is generating a tiny fraction of the optimization pressure, relative to whatever process sets up the initial state and the error signal.
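For concreteness, here is a minimal sketch in PyTorch of what "a few steps of gradient descent embedded in inference" could look like - the names (`inner_obj`, `decoder`, the toy objective) are purely illustrative, not anything from this discussion. The point is just that the inner steps are themselves differentiable, so the outer training loop can backpropagate through them:

```python
import torch

def forward_with_inner_gd(x, z_init, inner_obj, decoder, steps=5, lr=0.1):
    """Unroll a few gradient steps on a latent z inside the forward pass."""
    z = z_init
    for _ in range(steps):
        loss = inner_obj(x, z)
        # create_graph=True keeps these steps differentiable, so outer
        # training can backpropagate through the inner optimization.
        (grad_z,) = torch.autograd.grad(loss, z, create_graph=True)
        z = z - lr * grad_z
    return decoder(z)

# Toy usage: the inner objective just pulls z toward the input x.
x = torch.randn(4, 8)
z0 = torch.zeros(4, 8, requires_grad=True)
inner_obj = lambda x, z: ((z - x) ** 2).mean()
decoder = torch.nn.Linear(8, 2)
out = forward_with_inner_gd(x, z0, inner_obj, decoder)
out.sum().backward()  # gradients reach decoder and, via the unrolled steps, z0
```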
Maybe this is just semantics, because for high levels of capability search and control are not fundamentally different (what you're pointing to with "much more efficient search" - an infinitely efficient search is just optimal control, you never even consider suboptimal actions!). But it does seem like for a fixed level of capabilities search is more brittle, somehow, and more likely to misgeneralize catastrophically.
Mostly orthogonal:
Other relevant differences are
See my answer to tailcalled:
a program is more "search-like" if it is enumerating possible actions and evaluating their consequences
I'm curious if you mean something different by search when you say that we're likely to find policies that look like an "explicit search process + simple objective(s)"
Yeah, that's definitely not what I mean by search (nor what I think others mean by search, in the context of AI and inner agents).
Roughly speaking, a general search process is something which takes in a specification of some problem or objective (from a broad class of possible problems/objectives), and returns a plan which solves the problem or scores well on the objective. For instance, a gradient descent algorithm takes in an objective, and returns a point which scores well on the objective, for a very broad class of possible objectives; gradient descent is therefore a search method.
Enumerating possible actions and evaluating their consequences is one way to do general search, but it's wildly inefficient; I would typically refer to that as "brute force search". Gradient descent does better by leveraging backprop and gradients; approximately none of the algorithmic work done by gradient descent comes from direct evaluation of the consequences of actions. And there are many other tricks one can use too - like memoization on subsearches, or A*-style heuristic search, or (one meta-level up from A*) relaxation-based methods to discover heuristics. The key point is that these tricks are all very general purpose: they work on a very wide variety of search problems, and therefore produce general-purpose search algorithms which are more efficient than brute force (at least on realistic problems).
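As a toy sketch of that type signature (illustrative code, not from the original comment): gradient descent qualifies as a general search process because it accepts (the gradient of) nearly any smooth objective and returns a point that scores well on it - and, notably, it never directly evaluates the objective on candidate points at all:

```python
import numpy as np

def gradient_descent_search(grad, x0, steps=200, lr=0.1):
    """General-purpose search: take (the gradient of) an objective,
    return a point that scores well on it."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Works for any smooth objective we hand it, e.g. a shifted quadratic;
# note that no candidate is ever scored against the objective directly.
grad = lambda x: 2.0 * (x - 3.0)  # gradient of sum((x - 3)**2)
print(gradient_descent_search(grad, np.zeros(5)))  # ~[3. 3. 3. 3. 3.]
```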
More advanced general-purpose search methods seem to rely relatively little on enumerating possible actions and evaluating their consequences. By the time we get to human-level search capabilities, we see human problem-solvers spend most of their effort on nontrivial problems thinking about subproblems, abstractions and analogies rather than thinking directly about particular solutions.
I agree that A* and gradient descent are central examples of search; for realistic problems these algorithms typically evaluate the objective on millions of candidates before returning an answer.
In contrast, human problem solvers typically do very little state evaluation - perhaps evaluating a few dozen possibilities directly, and relying (as you said) on abstractions and analogies instead. I would call this type of reasoning "not very search-like".
On the far end we have algorithms like Gauss-Jordan elimination, which just compute the optimal solution directly without evaluating any possibilities. Calling them "search algorithms" seems quite strange.
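To make the contrast concrete, here is a toy sketch (illustrative only) of the two ends of the spectrum on the same problem - a search-like method that scores many candidates versus a direct method that computes the answer without evaluating any:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(4, 4)), rng.normal(size=4)

# Search-like: propose many candidate solutions, keep the best-scoring one.
best, best_err = None, np.inf
for _ in range(100_000):
    x = rng.normal(size=4)
    err = np.linalg.norm(A @ x - b)
    if err < best_err:
        best, best_err = x, err

# Not search-like: compute the solution directly (Gaussian elimination
# under the hood), without evaluating a single candidate.
exact = np.linalg.solve(A, b)
print(best_err, np.linalg.norm(A @ exact - b))
```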
a general search process is something which takes in a specification of some problem or objective (from a broad class of possible problems/objectives), and returns a plan which solves the problem or scores well on the objective.
This appears to be a description of any optimization process, not just search - in particular it would include Gauss-Jordan elimination. I guess your ontology has "optimization" and "search" as synonyms, whereas mine has search as a (somewhat fuzzy) proper subset of optimization. Anyways, to eliminate confusion I'll use Abram Demski's term selection in future. Also added a terminology note to the post.
My ontology indeed has search and a narrow notion of optimization as approximately synonyms; they differ only somewhat in type signature and are easily interchangeable. Conceptually, both take in an objective, and return something which scores highly on the objective. (This is narrower than e.g. Flint's notion of "optimization"; in that ontology it might be called a "general-purpose optimizer" instead.)
Anyway, insofar as any of this is relevant to the arguments for mesa-optimization, it's the notion of search/optimization as general problem solving which applies there.
Roughly speaking, a general search process is something which takes in a specification of some problem or objective (from a broad class of possible problems/objectives), and returns a plan which solves the problem or scores well on the objective.
This description of “search” seems far too broad. E.g., it seems to include things like lookup tables, and AFAICT, literally every way to solve problems or satisfy objectives?
Seems like “search” should mean something more specific than “problem solving”.
It excludes methods specific to a small number of problems. Search is about general problem solving.
Anyway, IIUC this is how the term "search" has historically been used in AI. It is also the notion of "search" relevant to the arguments for mesa-optimization in Risks From Learned Optimization, to general intelligence being A Thing, to the possibility of an ML system suddenly grokking "general search" and thereby undergoing a rapid increase in capabilities, etc.
we will select for "explicit search processes with simple objectives"
The actual argument is that small descriptions give higher parameter space volume, and so the things we find are those with short descriptions (low Kolmogorov complexity). The thing with a short description is the whole mesa-optimizer, not just its goal. This is misleading for goals because low Kolmogorov complexity doesn't mean low "complexity" in many other senses, so an arbitrary goal with low Kolmogorov complexity would actually be much more "complicated" than the intended base objective. In particular, it probably cares about the real world outside an episode and is thus motivated to exhibit deceptively aligned behavior.
I think "explicit search" is similarly misleading, because most short programs (around a given behavior) are not coherent decision theoretic optimizers. Search would only become properly explicit after the mesa-optimizer completes its own agent foundations alignment research program and builds itself a decision theory based corrigible successor. A mesa-optimizer only needs to pursue the objective of aligned behavior (or whatever it's being selected for), and whether that tends to be its actual objective or the instrumental objective of deceptive alignment is a toss-up, either would do for it to be selected. But in either case, it doesn't need to be anywhere ready to pursue a coherent goal of its own (as aligned behavior is also not goal directed behavior in a strong sense).
Agreed on "explicit search" being a misleading phrase, I'll replace it with just "search" when I'm referring to learned programs.
small descriptions give higher parameter space volume, and so the things we find are those with short descriptions
I don't think I understand this. GPT-3 is a thing we found, and it has 175B parameters - what is the short description of it?
I mean relatively short, as in the argument for why overparametrized models generalize. They still do get to ~memorize all training data, but anything else comes at a premium, reducing the probability of selection for models whose behavior depends on those additional details. (This use of "short" as meaning "could be 500 gigabytes" was rather sloppy/misleading of me, in a comment about sloppy/misleading use of words...)
I'm thinking of a setting where shortest descriptions of behavior determine sets of models that exhibit matching behavior (possibly in a coarse-grained way, so distances in behavior space are relevant). This description-model relation could be arbitrarily hard to compute, so it's OK for shortest descriptions to be shortest programs or something ridiculous like that. This gives a partition of the model/parameter space according to the mapping from models to shortest descriptions of their behavior. I think shorter shortest descriptions (simpler behaviors) fill more volume in the parameter/model space, have more models whose behavior is given by those descriptions (this is probably the crux; e.g. it's false if behaviors are just models themselves and descriptions are exact).
Gradient descent doesn't interact with descriptions or the description-model relation in any way, but since it selects models ~based on behavior, and starts its search from a random point in the model space, it tends to select behaviors from larger elements of the partition of the space of models that correspond to simpler behaviors with shorter shortest descriptions.
This holds at every step of gradient descent, not just when it has already learned something relevant. The argument is that whatever behavior is selected, it is relatively simple, compared to other behaviors that could've been selected by the same selection process. Further training just increases the selection pressure.
Yeah I think you need some additional assumptions on the models and behaviors, which you're gesturing at with the "matching behaviors" and "inexact descriptions". Otherwise it's easy to find counterexamples: imagine the model is just a single N x N matrix of parameters, then in general there is no shorter description length of the behavior than the model itself.
Yes there are non-invertible (you might say "simpler") behaviors which each occupy more parameter volume than any given invertible behavior, but random matrices are almost certainly invertible so the actual optimization pressure towards low description length is infinitesimal.
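A quick numerical check of that last claim (a toy sketch, not from the original comment): sampling random matrices and counting how many are even close to singular gives essentially zero, i.e. almost no parameter volume sits on the "simpler" non-invertible behaviors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 10_000
near_singular = 0
for _ in range(trials):
    M = rng.normal(size=(n, n))
    # The condition number blows up only for (nearly) non-invertible matrices.
    if np.linalg.cond(M) > 1e12:
        near_singular += 1
print(near_singular / trials)  # ~0.0: random matrices are essentially always invertible
```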
I think you overestimate the importance of the genomic bottleneck. It seems unlikely that humans would have been as successful as we are if we were... the alternative to the kind of algorithm that does search, which you don't really describe.
Performing search to optimize an objective seems really central to our (humans') capabilities, and if you want to argue against that I think you should say something about what an algorithm is supposed to look like that is anywhere near as capable as humans but doesn't do any search.
I disagree that performing search is central to human capabilities relative to other species. The cultural intelligence hypothesis seems much more plausible: humans are successful because our language and ability to mimic allow us to accumulate knowledge and coordinate at massive scale across both space and time. Not because individual humans are particularly good at thinking or optimizing or performing search. (Not sure what the implications of this are for AI).
You're right though, I didn't say much about alternative algorithms other than point vaguely in the direction of hierarchical control. I mostly want to warn people not to reason about inner optimizers the way they would about search algorithms. But if it helps, I think AlphaStar is a good example of an algorithm that is superhuman in a very complex strategic domain but is very likely not doing anything like "evaluating many possibilities before settling on an action". In contrast to AlphaZero (with rollouts), which considers tens of thousands of positions before selecting an action. AlphaZero (just the policy network) I'm more confused about... I expect it still isn't doing search, but it is literally trained to imitate the outcome of a search so it might have similar mis-generalization properties?
(Note that I'm not making a claim about how search is central to human capabilities relative to other species; I'm just saying search is useful in general. Plausibly also for other species, though it is more obvious for humans)
From my POV, the "cultural intelligence hypothesis" is not a counterpoint to importance of search. It's obvious that culture is important for human capabilities, but it also seems obvious to me that search is important. Building printing presses or steam engines is not something that a bundle of heuristics can do, IMO, without gaining those heuristics via a long process of evolutionary trial-and-error. And it seems important that humans can build steam engines without generations of breeding better steam-engine-engineers.
Re AlphaStar and AlphaZero: I've never played Starcraft, so I don't have good intuitions for what capabilities are needed. But on the definitions of search that I use, the AlphaZero policy network definitely performs search. In fact out of current systems it's probably the one that most clearly performs search!
...Now I'm wondering whether our disagreement just comes from having different definitions of search in mind. Skimming your other comments above, it seems like you take a more narrow view of search = literally iterating through solutions and picking a good one. This is fine by me definitionally, but I don't think the fact that models will not learn search(narrow) is very interesting for alignment, or has the implications that you list in the post? Though ofc I might still be misunderstanding you here.
Yeah it's probably definitions. With the caveat that I don't mean the narrow "literally iterates over solutions", but roughly "behaves (especially off the training distribution) as if it's iterating over solutions", like Abram Demski's term selection.
AlphaZero (just the policy network) I'm more confused about... I expect it still isn't doing search, but it is literally trained to imitate the outcome of a search so it might have similar mis-generalization properties?
This suggests that the choice of decision theory that amplifies a decision making model (in the sense of IDA/HCH, or just the way MCTS is used in training AlphaZero) might influence robustness of its behavior far off-distribution, even if its behavior around the training distribution is not visibly sensitive to choice of decision theory used for amplification.
Though perhaps this sense of "robustness" is not very appropriate, and a better one should be explicitly based on reflection/extrapolation from behavior in familiar situations, with the expectation that all models fail to be robust sufficiently far off-distribution (in the crash space), and new models must always be prepared in advance of going there.
My thinking is that one of the biggest reasons humans managed to dominate is basically 3x more brainpower, combined with a way to dissipate the heat that extra brainpower generates - namely sweating all over the body.
Essentially it's the scaling hypothesis applied to biological systems.
And since intelligence can be used for any goal, it's not surprising that intelligence's main function was cultural.
TL;DR: Gradient descent won't select for inner search processes because they're not compute & memory efficient.
Slightly longer TL;DR: A key argument for mesa-optimization is that as we search over programs, we will select for "search processes with simple objectives", because they are simpler or more compact than alternative less dangerous programs. This argument is much weaker when your program search is restricted to programs that use a fixed amount of compute, and you're not optimizing strongly for low description length - e.g. gradient descent in modern deep learning systems. We don't really know what shape of programs gradient descent selects for in realistic environments, but they are much less likely to involve search than commonly believed.
Note on terminology (added in response to comments): By "search" I mean here a process that evaluates a number of candidates before returning the best one; what Abram Demski calls "selection" in Selection vs Control . The more candidates considered, the more "search-like" a process is - with gradient descent and A* being central examples, and a thermostat being a central counter-example.
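In toy code (an illustrative sketch, not part of the original post), the two ends of that continuum look roughly like this:

```python
# Selection ("search-like"): evaluate many candidate actions, return the best.
def select_action(candidates, evaluate):
    return max(candidates, key=evaluate)

# Control (thermostat-like): map the observation directly to an action,
# never evaluating alternatives.
def thermostat(temperature, setpoint=20.0):
    return "heat_on" if temperature < setpoint else "heat_off"
```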
Recap: compression argument for inner optimizers
Here's the argument from Risks From Learned Optimization: [emphasis mine]
and even more forceful phrasing from John Wentworth:
Compactness, Complexity, and Compute
At face value, it does seem like we're selecting programs for simplicity. The Deep Double Descent paper showed us that gradient descent training in the overparametrized regime (i.e. the regime of all modern deep models) favors simpler models. But is this notion of simplicity the same as "compactness" or "complexity"? Evan seems to think so; I'm less sure. Let's dive into the different notions of complexity here.
The most commonly used notion of program complexity is Kolmogorov complexity (or description length), basically just "length of the program in some reference programming language". This definition seems natural... but, critically, it assumes away all computational constraints. K-complexity doesn't care if your program completes in a millisecond or runs until the heat death of the universe. This makes it a particularly perverse notion of complexity to use when interacting with the real, computationally constrained world.
Why do we use Kolmogorov complexity at all, then? One reason it's so intuitive is that it's the primary notion of complexity that human programmers use in their work. Programs that humans write are selected to be short, hence easy to write and analyze. Memory and compute use are often secondary concerns, because in most programming contexts human time is so much more valuable than computer time.
A simplistic model for how humans do program search is "What's the shortest program I can implement that solves the problem exactly?". The answer is usually some form of explicit search with a simple objective, because with no computational constraints that's always the shortest description length program (as an extreme example, bogosort is probably the lowest-description-length sorting procedure).
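For illustration (a sketch - obviously not something to run on large inputs): the entire "algorithm" is a search loop plus a trivial objective check, which is why its description length is so low:

```python
import random

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogosort(xs):
    # Explicit search with a simple objective: shuffle until the check passes.
    while not is_sorted(xs):
        random.shuffle(xs)
    return xs

print(bogosort([3, 1, 2]))
```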
In contrast, the way modern deep learning does program search is "I can tweak the parameters of a massively parallel computation involving at most 50 serial steps, 100 million multiply-adds per step, and 100 kilobytes of memory to store intermediate results; what parameters get me the best approximate solution given this compute budget?". Here it seems much less clear that the resulting program is going to resemble search; it might well be a diffuse bundle of heuristics, or something else that humans don't have good intuitions for.
What's going on here? Once we stop assuming compute is infinite and instead fix a budget, you can no longer rely on the "compression trick" that lets you replace an actual policy with a search for the policy. Search isn't free; it costs you compute and memory that could have been spent running a better policy, and gradient descent doesn't select for low Kolmogorov complexity nearly enough (if at all) to compensate.
Some caveats to this conclusion:
(Speculative) Implications for interpretability and safety
If most of the danger of inner optimization comes from search and planning[1], this line of reasoning has some unexpected implications. In AI safety we generally think of large black-box models as dangerous, and small interpretable models as safer. But the bias towards explicit search with simple objectives turns out to result from a human bias towards mechanistically interpretable (i.e. low-Kolmogorov-complexity) programs, a bias that often comes at significant computational cost.
If what we really care about when we talk about interpretability is predicting generalization errors, mechanistic interpretability may not be what we want. There is hideous, unpredictable complexity lurking in mechanistically simple programs such as the 3n+1 algorithm and the Rule 110 cellular automaton. The black-box models that gradient descent converges on may turn out to be simpler and more predictable in the sense that really matters. An "agent AI" trained end-to-end with gradient descent may be safer in practice than a "tool AI" that generates C code given a problem description, because the latter shares human programmers' bias towards explicit search over simple, brittle objectives. Which is where much of the danger lies.
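As a concrete illustration of that hidden complexity (a toy sketch): the 3n+1 iteration is a two-line program, yet its runtime is wildly unpredictable even between neighboring inputs:

```python
def collatz_steps(n):
    """Mechanistically trivial, yet nobody can predict its runtime in general."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print([collatz_steps(n) for n in (26, 27)])  # neighbors: 10 steps vs 111 steps
```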
Unclear how true this is - in the limit of capabilities, both search and control processes could be lethal. But intuitively, for a fixed level of capabilities, it seems like search processes are more brittle and likely to mis-generalize catastrophically than control processes.