[Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development

Thanks yourself!

when people use a model (aka query a trained model on validation set)

You say “aka”, but those seem different to me. For example, in regards to GPT-3, we can consider:

Training: Weights are updated by self-supervised learning
Evaluation: OpenAI staff use the trained model on data that wasn't part of training, in order to estimate things like perplexity, performance on benchmarks, etc.
Use / deployment: Some random author buys access to the OpenAI API and uses GPT-3 to help them to brainstorm how to advance the plot of a short story that they're writing.

Could it be the case that the "evolution from scratch" model is learned in the Learned Content of the "ML code" approach? Is that what the mesa-optimization line suggests?

We're talking about the diagram in Section 8.3, right side. I interpret your comment as saying: What if the “Learned content” box was sufficiently powerful that it could, say, implement any computable function? If so, then a whole second, separate model-based RL system could appear inside that “Learned content” box. (Is that what you're saying?)

If so, I agree in principle. But in practice I expect the “Leaned content” box to not be able to implement any computable function, or (more specifically) to run all the machinery of an entire separate “mesa” model-based RL system. Instead I expect it to be narrowly tailored to performing an operation that we might describe as “querying and/or updating a probabilistic world-model”. (And value function and so on.)

So I think “mesa-optimizers”, as the term is normally used today, are really specific to the “evolution from scratch” model, and not a useful thing to talk about in the context of the “genome = ML code” model.

^{^}

For example, here’s a random neural architecture search (NAS) paper: “The evolved transformer”. The authors brag about their “large search space”, and it is a large search space by the standards of NAS. But searching through that space still yields only 385 bits of information, and the end result fits in one easily-human-legible diagram in the paper. By contrast, the weights of an ML trained model may easily comprise millions or billions of bits of information, and the end result requires heroic effort to understand. We can also compare those 385 bits to the number of bits of information in the human-created parts of the learning algorithm source code, such as the code for matrix multiplication, softmax, autograd, shuttling data between the GPU and the CPU, and so on. The latter parts comprise orders of magnitude more than 385 bits of information. This is what I mean when I say that things like hyperparameter tuning and NAS contribute a tiny proportion of the total “design work” in a learning algorithm.

(The most outer-loop-search-reliant paper that I know of is AutoML-Zero, and even there, the outer-loop search contributed effectively 16 lines of code, which the authors had no trouble understanding.)

^{^}

If you’re curious for some ballpark estimates of how much time and money would it take to perform an amount of computation equivalent to the entire history of animal evolution on Earth, see the “Evolution anchor” discussion in Ajeya Cotra’s 2020 draft report on biological anchors. Obviously, this is not exactly the same as the amount of computation required for evolution-from-scratch AGI development, but it’s not entirely irrelevant either. I won’t talk about this topic more; I don’t think it’s important, because I don’t think evolution-from-scratch AGI development will happen anyway.

[-]christos4y40

Hey Steven, im new in the LW community so please excuse my formatting.

Case #1 would involve changing the model weights, while Case #2 would not. Instead, Case #2 would solely involve changing the model activations.

I am confused about the deployment part of offline training. Is it not the case that when people use a model (aka query a trained model on validation set), they seek to evaluate and not fit the new examples? So would it not be about changing weights in online learning vs using the relevant activations in offline mode?

Two models for AGI development. The one on the left is directly analogous to how evolution created human brains. The one on the right involves an analogy between the genome and the source code defining an ML algorithm, as spelled out in the next subsection.

Thanks!

[-]Steven Byrnes4y30

“Genome = ML code” analogy
Human intelligence	Today’s machine learning systems
Human genome	GitHub repository with all the PyTorch code for training and running the Pac-Man-playing agent
Within-lifetime learning	Training the Pac-Man-playing agent
How an adult human thinks and acts	Trained Pac-Man-playing agent
Evolution	Maybe the ML researchers did an outer-loop search for a handful of human-legible adjustable parameters—e.g., automated hyperparameter tuning, or neural architecture search.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

57

[Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development

57

Ω 23

57

Ω 23

8.1 Post summary / Table of contents

8.2 “One Lifetime” turns into “One training run”

8.2.1 How long does it take to train a model?

8.2.2 Online learning implies no fundamental training-versus-deployment distinction

8.2.3 …Nevertheless, the conventional ML wisdom that “training is more expensive than deployment” still more-or-less applies

8.2.4 Online learning is bad for safety, but essential for capabilities

8.3 Evolution-like outer-loop automated searches: maybe involved, but not the “lead designer”

8.3.1 The “Genome = ML code” analogy

8.3.2 Why I think “evolution from scratch” is less likely (as an AGI development method) than “genome = ML code”

8.3.3 Why “evolution from scratch” is worse than “genome = ML code” (from a safety perspective)

8.3.3.1 Is it a good idea to build human-like social instincts by evolving agents in a social environment?

8.4 Other non-hand-coded things that might go in a future brain-like-AGI Steering Subsystem

8.4.1 Pre-trained image classifiers, etc.

8.4.2 A tower of AGIs steering AGIs?

8.4.3 Humans steering AGIs?

Changelog