On one hand, it does not have to be a black box. If we consider one of the programming formalisms that allow taking linear combinations of programs, one can consider decompositions of programs into series, and one can retain quite a bit of structure this way.
I think that, capability-wise, something close to “glass boxes” (in terms of structure, but not necessarily behavior) can be done.
But the implications for safety are uncertain. On one hand, even very simple dynamical systems can have complex and unpredictable behavior, and even more so when we add self-modification to the mix. So the structure can be transparent, but this might not always translate into transparency of behavior.
On the other hand, the ability to understand these systems is a double-edged sword: it is a strong capability booster, and it makes it much easier to improve those AI systems.
Yes, any “neuromorphic formalism” would do (basically, one considers stream-oriented functional programs, asks for the streams in question to admit linear combinations, and the programs end up being fairly compact high-end neural machines with a small number of weights).
I can point you to a version I’ve done, but when people translate small specialized programs into small custom-synthesized Transformers, that’s in the same spirit. Or when people craft compact neural cellular automata with a small number of parameters, that is also in the same spirit.
Basically, as long as the programs themselves end up being expressible as some kind of sparse connectivity tensors, you can consider their linear combinations and series.
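To make the "sparse connectivity tensors" idea concrete, here is a minimal toy sketch in Python (my own illustration with made-up names and shapes, not any particular published formalism): tiny stream-processing "programs" represented as small weight matrices, which can then be mixed linearly.

```python
import numpy as np

# Toy "stream-oriented programs": each one is a small linear state update,
# fully described by a (mostly zero) connectivity matrix W.
# The program maps an input stream x_1, x_2, ... to an output stream via
#   state <- W @ state + x_t ;  y_t = state
def run_program(W, xs):
    state = np.zeros(W.shape[0])
    outputs = []
    for x in xs:
        state = W @ state + x
        outputs.append(state.copy())
    return outputs

# Two tiny programs with sparse connectivity.
W_a = np.array([[0.0, 1.0],
                [0.0, 0.0]])   # routes the second state component into the first
W_b = np.array([[0.5, 0.0],
                [0.0, 0.5]])   # exponential decay on both components

# Because programs are just tensors, a linear combination of programs
# is again a program of the same kind.
alpha, beta = 0.3, 0.7
W_mix = alpha * W_a + beta * W_b

xs = [np.array([1.0, 1.0])] + [np.zeros(2)] * 4   # an impulse, then silence
for y in run_program(W_mix, xs):
    print(y)
```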
Would an elegant implementation of AGI be safe? What about an elegant and deeply understood implementation?
My impression is that it's possible for an elegant and deeply understood implementation to block particular failure modes. An AI with a tightly bounded probability distribution can't get Pascal-mugged.
Being well understood isn't sufficient. Often it just means you know how the AI will kill you.
And "Safe" implies blocking all failure modes.
Looked up Blum's speedup theorem.
Simplified structure of the proof: consider a listing of all Turing machines, T_1, T_2, ...
The problem to which the speedup theorem applies involves simulating the first N Turing machines, but applying MUCH more compute to those that come first in the listing.
So you're simulating Turing machine i for 2^^(N-i) (note the double up-arrow) steps.
By caching the outputs of the first few Turing machines, you can get a faster algorithm.
But, at some point, you get a Turing machine that never halts, but can't be proved never to halt in ZFC.
At this point, your slower algorithm has to actually run the Turing machine, and your faster algorithm just magically knows that it never halts.
Blum's speedup comes from replacing a long-running computation with a lookup table full of uncomputable data. Computability theory's "there exists an algorithm" allows you to magically know any finite amount of data.
Given Blum's infinite sequence of ever-faster algorithms, there is a fastest algorithm that can be proved to work in ZFC.
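A schematic Python illustration of the caching trick described above (`dummy_simulate` and the table contents are stand-ins; in the real construction the cached facts are not provable in ZFC, so no verifier can hand you the table):

```python
# The "fast" algorithm differs from the "slow" one only by a hard-coded
# lookup table of facts about the first few machines (e.g. "machine 3
# never halts"). Each larger table gives a faster algorithm, but the
# entries themselves may be unprovable, so no single fastest algorithm
# can be exhibited and verified.

KNOWN_NON_HALTERS = {3, 7}   # illustrative hard-coded data, not computed

def slow_answer(i, budget, simulate):
    # Actually run machine i for `budget` steps.
    return simulate(i, budget)

def fast_answer(i, budget, simulate):
    # Consult the lookup table first; otherwise fall back to simulation.
    if i in KNOWN_NON_HALTERS:
        return "does not halt"
    return simulate(i, budget)

def dummy_simulate(i, budget):
    # Stand-in for running Turing machine i for `budget` steps.
    return "ran for %d steps without halting" % budget

print(fast_answer(3, 10**6, dummy_simulate))   # answered from the table
print(fast_answer(4, 10**6, dummy_simulate))   # falls back to simulation
```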
Yeah I don't understand where Eliezer is coming from in that tweet (and he's written similar things elsewhere).
I don't know as much about algorithms as you; my argument centers instead on the complexity of the world.
Like, here's a question: is "tires are usually black", and stuff like that, encoded directly in the source code?
If yes, then (1) that's unrealistic, and (2) even if it happened, the source code would wind up horrifically complicated and inscrutable, because it's a complicated world.
If no, then the source code must be defining a learning algorithm of some sort, which in turn will figure out for itself that tires are usually black. Might this learning algorithm be simple and legible? Yes! But that was true of GPT-3 too, which Eliezer has always put in the inscrutable category.
So what is he talking about? He must have some alternative in mind, but I'm unsure what.
I suppose the learning process could work in a more legible way - for instance, it's currently not clear why neural networks generalize successfully. But this seems more closely related to the theoretical understanding of learning algorithms than to their knowledge representation.
For instance, there is no computable predictor which always learns to do as well as every other computable predictor on all sequences.
What's going on here is quite interesting. For any number N, a version of AIXI with time and length bound N is in some sense optimal for its compute.
So (in the limit), what you want is a really big N.
One way to do that is to pick a probability distribution over Turing machines, then store Chaitin's constant (the halting probability, p(halt)) to some number of digits. Start running more and more Turing machines for longer and longer until the machines that have halted account for that much probability mass; the number of steps taken is your N.
If we take this probability distribution over Turing machines as our prior, then the Turing machines that this algorithm does badly on are the ones that don't halt in time N. And each extra digit of Chaitin's constant, on average, halves the measure of such Turing machines.
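A conceptual sketch of that construction (not genuinely runnable: the digits of Chaitin's constant are uncomputable, and `machines`, `weight`, and `step` are placeholders; it just mirrors the recipe described above):

```python
def time_bound_from_omega(omega_prefix, machines, weight, step):
    """Dovetail the given machines until the prior weight of those that have
    halted exceeds the truncated halting probability; return the step count N.

    omega_prefix: Chaitin's constant truncated to some number of binary digits.
    machines:     an enumeration of machine ids (finite, for this sketch).
    weight(m):    the prior probability assigned to machine m.
    step(m, t):   advance machine m to step t; return True iff it has just halted.
    """
    halted, halted_mass, t = set(), 0.0, 0
    while halted_mass < omega_prefix:
        t += 1
        for m in machines:
            if m not in halted and step(m, t):
                halted.add(m)
                halted_mass += weight(m)
    # Any machine still running at step t either never halts or has measure
    # below the precision of the truncation.
    return t
```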
Epistemic status: intuitive speculation with scattered mathematical justification.
My goal here is to interrogate the dream of writing a beautiful algorithm for intelligence and thereby ensuring safety. For example:
I don't know precisely what alternative he had in mind, but I seem to remember reading only clean functional programs from MIRI, so that is one possibility. Whether or not anyone else endorses it, that is the prototypical picture I have in mind as the holy grail of glass-box AGI implementation. In my mind, it has careful comments explaining the precisely chosen, recursive prior and decision rules that make it go "foom."
Is the "inscrutable" complexity of deep neural networks unavoidable? There is has been some prior discussion of the desire to avoid it as map-territory confusion, and I am not sure if that is true (though I have some fairly subtle suspicions). However, I want to approach this question from a different angle.
And most importantly, I want to ask what this has to do with AI safety. Would an elegant implementation of AGI be safe? What about an elegant and deeply understood implementation?[1]
In a previous essay, I considered the distinction between understanding or discovering core algorithms of intelligence and searching for optimality guarantees. It occurs to me that the former is really a positive, creative exercise while the latter is more negative. Or, perhaps it is more accurate to say that an optimal method must sit at the boundary of the positive and the negative; it is exactly as good as possible, and anything better would be impossible.[2]
Sometimes, there is no meaningfully optimal element in a class. For instance, there is no computable predictor which always learns to do as well as every other computable predictor on all sequences.[3] This is a failure of a relative notion of optimality. But perhaps it is more important to ask for good performance on some sequences than others. I think rationalists tend to (hyper?)focus on consistency-based conditions like "Bayes optimality," which I am not really sure can be viewed as optimality conditions at all (though they are certainly desirable to satisfy, all else held equal). Perhaps it is useful to think of the relative notions as external-facing and the consistency notions as internal-facing. I don't want to be too specific here about what makes for a good optimality condition for intelligence, because I think that if I tried to be very specific, I would probably be wrong. But I think some kind of best-in-class property is probably what we mean by optimality, and I suspect that this ultimately (that is, terminally) wants to be externally oriented.
You can approach an optimal method from below or from above. From below, by inventing better and better methods. Sometimes, the class of methods under consideration has structure that makes it easy to construct an optimal method. Sometimes it does not. For instance, there are many algorithms for SAT, but it is not so easy to find an asymptotically fastest one. Each faster algorithm invented is a demonstration of what is possible. Optimality can also be approached from above, by trying to prove limits on the possible (exercise for the reader: prove that SAT cannot be solved in polynomial time).
Sometimes, the attitudes of the two different directions are sufficiently different that they form fields with different names - for instance, algorithms and complexity. In this case, the two often appear together with an "and" between them, for obvious reasons.
In general, there is no reason to believe that the set of possible things is "closed" - there may be an infinite sequence of slight improvements above any method. The possible and impossible do not need to touch. Existence proof: Blum's speedup theorem.
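For reference, one standard formulation of the theorem (glossing over details of the complexity measure): for every total computable function $f$ there is a computable $0/1$-valued function $g$ such that for every program $i$ computing $g$, there is another program $j$ computing $g$ with

$$ f\bigl(n, C_j(n)\bigr) \le C_i(n) \quad \text{for all but finitely many } n, $$

where $C_i(n)$ is the running time of program $i$ on input $n$ (or any other Blum complexity measure). In other words, above any program for $g$ there is another that is faster by as large a margin as you like, so there is no fastest one.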
Sometimes they do touch, at least in some sense; Solomonoff's universal distribution is an example, optimal among the lower semi-computable distributions. But this is arguably[4] a contrived class of methods.
I think that as intelligent algorithms approach the boundary, they tend to get really complicated. But I don't think they have to start out really complicated - a simple "kernel" should do. I have several independent reasons for believing this. Some of the reasons connect more or less directly / rigorously, and some of them apply more or less widely. Since the relationship between these two virtues seems to be inverse here, I don't think the case is closed.
Deep learning and more generally the bitter lesson. This is more of an intuitive argument. It seems that (large) neural nets become very, very complicated as they learn. In fact if I recall correctly, Neel Nanda recently described them as "cursed" during a talk at LISA[5]. This seems to be reflective of a general trend towards increasing pragmatism (some might say pessimism) about truly understanding neural nets mechanistically.[6] It's notable that the algorithms for transformer training and inference are not actually very complicated (though they do not qualify as elegant either).
No elegant theory of universal prediction. A predictor that can learn sequences up to some complexity must be about that complex itself, as proven by Shane Legg: https://arxiv.org/abs/cs/0606070. It's interesting to interpret this result in light of transformers. The transformer learning algorithm is fairly simple, but we don't actually care about a transformer's performance during training. When trained on a massive amount of text, transformers seem to be able to (learn to) learn a lot of things in context (particularly, during deployment) - in fact, it is their in-context learning which is often compared to Solomonoff induction. But by deployment, it is no longer a raw transformer that is learning; it is an LLM. And an LLM is a highly complicated object.
A maximally compressed object looks like noise, in the sense that it (by definition) cannot be compressed further. This is a standard "result" in algorithmic information theory (see the short compression demo below). Insofar as understanding is compression, I visualize acquiring deep knowledge as packing your brain with white noise - or what looks like it, from the outside. To the right machine, it is executable noise. See this related talk about the singularity by Marcus Hutter.
...and Blum's speedup theorem, as mentioned above.
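To make the "looks like noise" point tangible, here is a toy demo using zlib as a stand-in for an ideal compressor (an illustration, not a proof of anything):

```python
import zlib

# Highly structured input: very compressible.
text = b"the tire is black. " * 1000

once = zlib.compress(text, 9)    # compress the structured text
twice = zlib.compress(once, 9)   # try to compress the compressed bytes again

print(len(text), len(once), len(twice))
# Typical result: the first pass shrinks the data dramatically; the second
# pass gains nothing (and usually adds a few bytes of overhead), because the
# compressed stream already looks like noise to the compressor.
```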
I think it is possible to write a fairly short algorithm for intelligence. I think that algorithm learns and self-modifies, and eventually it looks like noise.
It's not clear that starting from a short algorithm actually buys you anything in terms of safety, once it has self-modified into noise.
Safety must be a design property of the algorithm. Presumably, a safe algorithm has to be optimizing a thing that is safe to optimize. However, it may well be that a complicated algorithm, optimizing the same thing, is about as safe. It's not even clear that the simplicity of an algorithm makes it easier to determine what it is optimizing; there is a sense in which transformers are just optimizing their loss function, which is (during pretraining) exactly log-loss on next-token prediction. But that is not quite right, for reasons that can be roughly summed up as "inner optimization." However, I suspect that the simplest algorithms which eventually unfold into intelligence also have inner optimization problems.
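To be concrete about "exactly log-loss on next-token prediction," here is a minimal numpy sketch of that objective (the random logits stand in for a transformer's outputs; this illustrates the loss itself, not anyone's training code):

```python
import numpy as np

def next_token_log_loss(logits, targets):
    """Average negative log-probability assigned to the true next token.

    logits:  array of shape (seq_len, vocab_size), one row per position.
    targets: array of shape (seq_len,), the actual next token at each position.
    """
    # log-softmax over the vocabulary at each position
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # pick out the log-probability of each true next token
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy example: 3 positions, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
targets = np.array([2, 0, 4])
print(next_token_log_loss(logits, targets))
```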
I think we shouldn't expect to be able to implement an optimal algorithm for intelligence at all, and an elegant core engine for intelligence is not necessarily safe. Still, a deeper understanding of intelligence probably can lead to somewhat more elegant and hopefully much safer algorithms. The former, because some epicycles can be removed, leading to cleaner analysis. The latter, because the optimization target can be chosen wisely.
Still, it is important to be clear that AGI code golf is not inherently productive for AI safety.
[1] Does the answer seem obvious to you? Of the people who think the answer is obvious, I weakly expect disagreement about both the answer and the reason(s). If you have a strong opinion, I'd appreciate comments both before and after reading the rest of the post.
[2] I think I may have read this phrasing somewhere, but I can't recall where.
[3] See Carnap's quest for a universal "confirmation rule" (in the modern context, usually viewed as a sequence prediction method) and Putnam's diagonal argument for impossibility.
[4] See Professor Tom Sterkenburg's excellent thesis: Universal Prediction
[5] I am currently (as of writing) working out of LISA for ARENA 5.0.
[6] I feel slightly vindicated by this, which I have to admit is comforting since I have changed my mind about a lot of other things lately!