[Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

Jan (4y)

Hey Steve! Thanks for writing this, it was an interesting and useful read! After our discussion in the LW comments, I wanted to get a better understanding of your thinking and this sequence is doing the job. Now I feel I can better engage in a technical discussion.

I can sympathize well with your struggle in section 2.6. A lot of the "big picture" neuroscience is in the stage where it's not even wrong. That being said, I don't think you'll find a lot of neuroscientists who nod along with your line of argument without raising objections here and there (neuroscientists love their trivia). They might be missing the point, but I think that still makes your theory (by definition) controversial. (I think the term "scientific consensus" should be used carefully and very selectively).

In that spirit, there are a few points that I could push back on:

Cortical uniformity (and by extension canonical microcircuits) are extremely useful concepts for thinking about the brain. But they are not literally 100% accurate. There are a lot of differences between different regions of the cortex, not only in thickness but also in the developmental process (here or here). I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.
In section 2.5.4.1 you are being a bit dismissive of biologically-"realistic" implementations of backpropagation. I used to be pretty skeptical too, but some of the recent studies are beginning to make a lot of sense. This one (a collaboration between DeepMind and some of the established neuroscience bigshots) is really quite elegant and offers some great insights on how interneurons and dendritic branches might interact.
A more theoretical counter: If evolution could initialize certain parts of the cortex so that they are faster "up and running" why wouldn't it? (Just so that we can better understand it? How nice!) From the perspective of evolution, it makes a lot of sense to initialize the cortex with an idea of what an oriented edge is because oriented edges have always been around since the inception of the eye.
Or, in terms of computation theory, learning from scratch is computationally intractable. Strong, informative priors over hypothesis space might just be necessary to learn anything worthwhile at all.

But perhaps I'm missing the point with that nitpicking. I think the broader conceptual question I have is: What does "randomly initialized" even mean in the brain? At what point is the brain initialized? When the neural tube forms? When interneurons begin to migrate to the cortex? When the first synapses are established? When the subplate is gone? When the pruning of excess synapses and the apoptosis of cells is over? When the animal/human is born? When all the senses begin to transmit input? After college graduation?

Perhaps this is the point that the "old-timer" also wanted to make. It doesn't really make sense to separate the "initialization" from the "refinement". They happen at the same time, and whether you put a certain thing into one category or the other is up to individual taste.

All of this being said, I'm very curious to read the next parts of this sequence! :) Perhaps my points don't even affect your core argument about AI Safety.

Steven Byrnes (4y)

Thanks!

I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.

Not even him! Jeff Hawkins: "Mountcastle’s proposal that there is a common cortical algorithm doesn’t mean there are no variations. He knew that. The issue is how much is common in all cortical regions, and how much is different. The evidence suggests that there is a huge amount of commonality."

I mentioned "non-uniform neural architecture and hyperparameters". I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyperparameters".

If evolution could initialize certain parts of the cortex so that they are faster "up and running" why wouldn't it?

If you buy the "locally-random pattern separation" story (Section 2.5.4), that would make it impossible for evolution to initialize the adjustable parameters in a non-locally-random way.

in terms of computation theory, learning from scratch is computationally intractable. Strong, informative priors over hypothesis space might just be necessary to learn anything worthwhile at all.

I'm very confused by this. I have coded up a ConvNet with random initialization. It was computationally tractable; in fact, it ran on my laptop!

I guess maybe what you're claiming is: we can't have all three of {learning from scratch, general intelligence, computational tractability}. If so, well, that's a possible thing to believe, although I happen not to believe it. My question would be: why do you believe it? "Learning-from-scratch algorithms" consist of an astronomically large number of algorithms, of which an infinitesimal fraction have ever been even conceived of by humans. I think it's difficult to make blanket statements about the whole category.

I don't see the relevance of Solomonoff Induction here. "Generally intelligent" is a much lower bar than "just as intelligent as a Solomonoff Inductor", right?

I'm also confused about why you think "strong, informative priors over hypothesis space" are not compatible with learning-from-scratch algorithms. The famous example everyone talks about is how ConvNets disproportionately search for patterns that are local (i.e. involve neighboring pixels) and translation-invariant.

What does "randomly initialized" even mean in the brain? At what point is the brain initialized?

Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?

Jan (4y)

Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?

Hmm, I see. If this is the crux, then I'll put all the remaining nitpicking at the end of my comment and just say: I think I'm on board with your argument. Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain. The trajectory of how the program ends up there over training probably looks different (and might take a bit longer if it doesn't use the shortcuts that the brain got from evolution), but I don't think the stuff that evolution put in the cortex is strictly necessary.

A caveat: I'm not sure how much weight the similarity between the program and the brain can support before it breaks down. I'd strongly suspect that certain aspects of the cortex are not logically implied by the statistics of the environment, but rather represent idiosyncratic quirks that were adapted at some point during evolution. Those idiosyncratic quirks won't be in the learning-from-scratch program. But perhaps (probably?) they are also not relevant in the big scheme of things.

I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyperparameters".

Fair! Most people in computational neuroscience are also very happy to ignore those differences, and so far nothing terribly bad has happened.

If you buy the "locally-random pattern separation" story (Section 2.5.4), that would make it impossible for evolution to initialize the adjustable parameters in a non-locally-random way.

You point out yourself that some areas (e.g. the motor cortex) are agranular, so that argument doesn't work there. But ignoring that, and conceding the cerebellum and the Drosophila mushroom body to you (not my area of expertise), I'm pretty doubtful about postulating "locally-random pattern separation" in the cortex. I'm interpreting your thesis to cash out as "Given a handful of granule cells from layer 4, the connectivity with pyramidal neurons in layer 2/3 is (initially) effectively random, and therefore layer 2/3 neurons need to learn (from scratch) how to interpret the signal from layer 4". Is that an okay summary?

Because then I think this fails at three points:

One characteristic feature of the cortex is the presence of cortical maps. They exist in basically all sensory and motor cortices, and they have a very regular structure that is present in animal species separated by as much as 64 million years of evolution. These maps imply that if you pick a handful of granule cells from layer 4 that are located nearby, their functional properties will be somewhat similar! Therefore, even if connectivity between L4 and L2/3 is locally random it doesn't really matter since the input is somewhat similar in any case. Evolution could "use" that fact to pre-structure the circuit in L2/3.
Connectivity between L4 and L2/3 is not random. Projections from layer 4 are specific to different portions of the postsynaptic dendrite, and nearby synapses on mature and developing dendrites tend to share similar activation patterns. Perhaps you want to argue that this non-randomness only emerges through learning and the initial configuration is random? That's a possibility, but ...
... when you record activity from neurons in the cortex of an animal that had zero visual experience prior to the experiment (lid-suture), they are still orientation-selective! The topographic arrangement of retinal inputs and the segregation of eye-specific inputs are likewise already in place. At the point of eye-opening, the animals are already pretty much able to navigate their environment.

Obviously, there are still a lot of things that need to be refined and set up during later development, but defects in these early stages of network initialization are pretty bad (a lot of neurodevelopmental disorders manifest as "wiring defects" that start in early development).

I'm very confused by this. I have coded up a ConvNet with random initialization. It was computationally tractable; in fact, it ran on my laptop!

Okay, my claim there came out a lot stronger than I wanted and I concede a lot of what you say. Learning from scratch is probably not computationally intractable in the technical sense. I guess what I wanted to argue is that it appears practically infeasible to learn everything from scratch. (There is a lot of "everything" and not a lot of time to learn it. Any headstart might be strictly necessary and not just a nice-to-have).

(As a side point: your choice of a convnet as the example is interesting. People came up with convnets because fully-connected, randomly initialized networks were not great at image classification and we needed some inductive bias in the form of a locality constraint to learn in a reasonable time. That's the point I wanted to make.)

I guess maybe what you're claiming is: we can't have all three of {learning from scratch, general intelligence, computational tractability}.

Interesting, I haven't thought about it like this before. I do think it could be possible to have all three - but then it's not the brain anymore. As far as I can tell, evolutionary pressures make complete learning from scratch infeasible.

Steven Byrnes (4y*)

Thanks for your interesting comments!

People came up with convnets because fully-connected, randomly initialized networks were not great at image classification and we needed some inductive bias in the form of a locality constraint to learn in a reasonable time. That's the point I wanted to make.

I'm pretty confused here. To me, that doesn't seem to support your point, which suggests that one of us is confused, or else I don't understand your point.

Specifically: If I switch from a fully-connected DNN to a ConvNet, I'm switching from one learning-from-scratch algorithm to a different learning-from-scratch algorithm.

I feel like your perspective is that {inductive biases, non-learning-from-scratch} are a pair that go inexorably together, and you are strongly in favor of both, and I am strongly opposed to both. But that's not right: they don't inexorably go together. The ConvNet example proves it.

I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.

Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain.

I think you're misunderstanding me. Random chunks of matter do not learn language, but the neocortex does. There's a reason for that—aspects of the neocortex are designed by evolution to do certain computations that result in the useful functionality of learning language (as an example). There is a reason that these particular computations, unlike the computations performed by random chunks of matter, are able to learn language. And this reason can be described in purely computational terms—"such-and-such process performs a kind of search over this particular space, and meanwhile this other process breaks down the syntactic tree using such-and-such algorithm…", I dunno, whatever. The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations.

Whatever that explanation is, it's a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results.

In particular, our code will be just as data-efficient as the neocortex is, and it will make the same types of mistakes in the same types of situations as the neocortex does, etc. etc.

when you record activity from neurons in the cortex of an animal that had zero visual experience prior to the experiment (lid-suture), they are still orientation-selective

is that true even if there haven't been any retinal waves?

Jan (4y)

I'm pretty confused here.

Yeah, the feeling's mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!

I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.

A couple of thoughts:

Yes, I agree that the inductive bias (/genetically hardcoded information) can live in different components: the learning rule, the network architecture, or the initialization of the weights. So learning-from-scratch is logically compatible with inductive biases - we can just put all the inductive bias into the learning rule and the architecture and none in the weights.
- But from the architecture and the learning rule, the hardcoded info can enter into the weights very rapidly (e.g. first step of the learning rule: set all the weights to the values appropriate for an adult brain. Or, more realistically, a ConvNet architecture can be learned from a DNN by setting a lot of connections to zero). Therefore I don't see what it could buy you to assume the weights to be free of inductive bias.
- There might also be a case that in the actual biological brain the weights are not initialized randomly. See e.g. this work on clonally related neurons.
Something that is not appreciated a lot outside of neuroscience: "Learning" in the brain is as much a structural process as it is a "changing weights" process. This is particularly true throughout development but also into adulthood - activity-dependent learning rules do not only adjust the weights of connections, but they can also prune bad connections and add new connections. The brain simultaneously produces activity, which induces plasticity, which changes the circuit, which produces slightly different activity in turn.
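To make the "ConvNet as a DNN with a lot of connections set to zero" remark concrete, here is a minimal sketch in plain Python. It assumes a 1D signal and the ML convention of "convolution" (really cross-correlation); the function names and the toy numbers are illustrative and not taken from the post. It shows that a convolution is the same computation as a dense layer whose weight matrix is mostly zeros (locality) with the same few values repeated in every row (weight sharing).

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` (no padding), one output per position."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv_as_dense_matrix(kernel, n_inputs):
    """Build the dense weight matrix that computes the same convolution.

    Each row is all zeros except for a copy of the kernel, shifted one
    step to the right per row. The zeros encode locality; reusing the
    same kernel values in every row encodes translation invariance.
    """
    k = len(kernel)
    rows = []
    for i in range(n_inputs - k + 1):
        row = [0] * n_inputs
        row[i:i + k] = kernel
        rows.append(row)
    return rows


def dense_forward(weights, signal):
    """Plain fully-connected layer without bias or nonlinearity."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]


signal = [3, 1, 4, 1, 5, 9, 2, 6]   # toy input (hypothetical values)
kernel = [1, 0, -1]                 # toy edge-like filter (hypothetical)

dense_weights = conv_as_dense_matrix(kernel, len(signal))
conv_out = conv1d_valid(signal, kernel)
dense_out = dense_forward(dense_weights, signal)

free_dense = len(dense_weights) * len(signal)  # unconstrained dense layer
free_conv = len(kernel)                        # tied, local weights
```

In this toy case the unconstrained dense layer has 48 adjustable weights and the convolutional one has 3, yet both can start from random values. The difference is in which weight settings are reachable, which is one way of reading the disagreement above about where the inductive bias lives.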

The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations.

That sounds a lot more like cognitive science than neuroscience! This is completely fine (I did my undergrad in CogSci), but it requires a different set of arguments from the ones you are providing in your post, I think. If you want to make a CogSci case for learning from scratch, then your argument has to be a lot more constructive (i.e. literally walk us through the steps of how your proposed system can learn all/a lot of what humans can learn). Either you take a look at what is there in the brain (subplate, synapses, ...), describe how these things interact, and (correctly) infer that it's sufficient to produce a mind (this is the neuroscience strategy); or you propose an abstract system, demonstrate that it can do the same thing as the mind, and then demonstrate that the components of the abstract system can be identified with the biological brain (this is the CogSci strategy). I think you're skipping step two of the CogSci strategy.

Whatever that explanation is, it's a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results.

I'm on board with that. I anticipate that the design spec will contain (the equivalent of) a ton of hardcoded genetic stuff also for the "learning subsystem"/cortex. From a CogSci perspective, I'm willing to assume that this genetic stuff could be in the learning rule and the architecture, not in the initial weights. From a neuroscience perspective, I'm not convinced that's the case.

is that true even if there haven't been any retinal waves?

Blocking retinal waves messes up the cortex pretty substantially (same as if the animal were born without eyes). There is the beta-2 knockout mouse, which has retinal waves but with weaker spatiotemporal correlations. As a consequence beta-2 mice fail to track moving gratings and have disrupted receptive fields.

Jon Garcia (4y)

Great summary of the argument. I definitely agree that this will be an important distinction (learning-from-scratch vs. innate circuitry) for AGI alignment, as well as for developing a useful Theory of Cognition. The core of what motivates our behavior must be innate to some extent (e.g., heuristics that evolution programmed into our hypothalamus that tell us how far from homeostasis we're veering), to act as a teaching signal to the rest of our brains (e.g., learn to generate goal states that minimize the effort required to maintain or return to homeostasis, goal states that would be far too complex to encode genetically).

However, I think that characterizing the telencephalon+cerebellum as just a memory system + learning algorithm is selling it a bit short. Even at the level of abstraction that you're dealing with here, it seems important to recognize that the neocortex is a dynamic generative simulator. It has intrinsic dynamics that generate top-down spatiotemporal patterns, from regions that represent the highest levels of abstraction down to regions that interface directly with the senses and motor system.

At the beginning, yes, it is simulating nonsense, but the key is that it is more than just randomly initialized memory slots. The sorts of hierarchical patterns that it generates contain information about the dynamical priors that it expects to deal with in life. (Cortical waves, for instance, set priors for spatiotemporally local causal interactions. Retinal waves are probably doing something similar.)

The job of the learning algorithm is, then, to bind the dynamics of experience to the intrinsic dynamics of the neural circuitry, so that the top-down system is better able to simulate real-world dynamics in the future. It probably does so by propagating prediction errors bottom-up from the sensorimotor areas up to the regions representing abstract concepts, making adjustments and learning from them along the way.
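As a rough illustration of the error-driven loop described above, here is a toy sketch in Python. It is not a model of cortex and not a full predictive coding scheme: it assumes a single scalar latent, a fixed linear top-down "generative" weight, and plain gradient steps on the squared prediction error. All names and numbers are hypothetical.

```python
def infer_latent(observation, weight, steps=20, rate=0.1):
    """Settle a latent estimate by descending the squared prediction error."""
    latent = 0.0  # starts out "simulating nonsense"
    errors = []
    for _ in range(steps):
        prediction = weight * latent        # top-down guess about the input
        error = observation - prediction    # bottom-up prediction error
        errors.append(abs(error))
        latent += rate * weight * error     # adjust the latent to shrink the error
    return latent, errors


# Toy numbers: with weight 2.0, the latent that explains 4.0 is 2.0.
latent, errors = infer_latent(observation=4.0, weight=2.0)
```

Here only the latent estimate is adjusted (inference). The same error signal could in principle also drive changes to `weight` (learning), which is closer to what the comment has in mind, but that is left out to keep the sketch short.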

In my opinion, some sort of predictive coding scheme using some sort of hierarchical, dynamic generative simulator will be necessary (though not sufficient) to building generally intelligent systems. For learning-from-scratch to be effective (i.e., not require millions of labelled training examples before it can make useful generalizations), it needs to have such a head start.

Steven Byrnes (4y)

Thanks! I'm not sure we have much disagreement here. Some relevant issues are:

Memory ≠ Unstructured memory (and likewise, locally-random ≠ globally-random): There's certainly a neural architecture, with certain types of connections between certain macroscopic regions.
"just" a memory system + learning algorithm—with a dismissive tone of voice on the "just": Maybe you didn't mean it that way, but for the record, I would suggest that to the extent that you feel wowed by something the neocortex does, I claim you should copy that feeling, and feel equally wowed by "what learning-from-scratch algorithms are capable of". The things that ML people are doing right now are but a tiny corner of a vast universe of possible learning-from-scratch algorithms.
Generative models can be learned from scratch—obviously, e.g. StyleGAN. I imagine you agree, but you mentioned "generative", so I'm just throwing this in.
Dynamics is not unrelated to neural architecture: For example, there's a kinda silly sense in which GPT-3 involves a "wave of activity" that goes from layer 1 through layers 2, 3, 4, …, ending in layer 96. I'm not saying anything profound here—it's just that GPT-3 happens to be feedforward, not recurrent. But anyway, if you think it's important that the neocortex has a bias for waves that travel in certain directions, I'd claim that such a bias can likewise be built out of a (not-perfectly-recurrent) neural architecture in the brain.
Lottery ticket hypothesis: I was recently reading Brain From The Inside Out by Buzsáki, and Chapter 13 had a discussion which to my ears sounded like the author was proposing that we should think of brain learning as being like the lottery ticket hypothesis. E.g. "Rich brain dynamics can emerge from the genetically programmed wiring circuits and biophysical properties of neurons during development even without any sensory input or experience." Then "experience [is] a matching process" between data and this "preexisting dictionary of nonsense words combined with internally generated syntactical rules" (i.e., "preexisting neuronal patterns that coincide with the attended unexpected event are marked as important"). That all sounds to me like the lottery ticket hypothesis. And I have no argument there. What I'm not so sure about is who or what he was arguing against. Obviously the lottery ticket hypothesis is compatible with learning-from-scratch with locally-random initialization; indeed, that's the context in which that term is universally used. Maybe Buzsáki is arguing against globally-random, unstructured networks, or something? (Does anyone believe that? I dunno.) Anyway, that's Buzsáki not you, but I bring it up because I felt like I was maybe getting similar vibes from your comment.

Jon Garcia (4y)

Memory ≠ Unstructured memory (and likewise, locally-random ≠ globally-random): [...]

Agreed. I didn't mean to imply that you thought otherwise.

"just" a memory system + learning algorithm—with a dismissive tone of voice on the "just": [...]

I apologize for how that came across. I had no intention of being dismissive. When I respond to a post or comment, I typically try to frame what I say as much for a typical reader as for the original poster. In this case, I had a sense that a typical reader could get the wrong impression about how the neocortex does what it does if the only sorts of memory systems and learning algorithms that came to mind were things like a blank computer drive and stochastic gradient descent on a feed-forward neural network.

You are absolutely right that the neocortex is equipped to learn from scratch, starting out generating garbage and gradually learning to make sense of the world/body/other-brain-regions/etc., which can legitimately be described as a memory system + learning algorithm. I just wanted anyone reading to appreciate that, at least in biological brains, there is no clean separation between learning algorithm and memory, but that the neocortex's role as a hierarchical, dynamic, generative simulator is precisely what makes learning from scratch so efficient, since it only has to correlate its intrinsic dynamics with the statistically similar dynamics of learned experience.

I'm sure that there are vastly more ways of implementing learning-from-scratch, maybe some much better ways in fact, and I realize that the exact implementation is probably not relevant to the arguments you plan to make in this sequence. I just feel that a basic understanding of what a real learning-from-scratch system looks like could help drive intuitions of what is possible.

Generative models can be learned from scratch [...]

Indeed, but of course including their own particular structural priors.

Dynamics is not unrelated to neural architecture: [...]

Well, what is a recurrent neural network after all but an arbitrarily deep feed-forward neural network with shared weights across layers? My comment on cortical waves was just to point out a clever way that the brain learns to organize its cortical maps and primes them to expect causality to operate (mostly) locally in space and time. For example, orientation columns in V1 may be adjacent to each other because similarly oriented edges (from moving objects) were consistently presented to the same part of the visual field close in time, such that traveling waves of excitation would teach pyramidal cell A to learn orientation A at time A and then teach neuron B to learn orientation B at time B.
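The "recurrent network as a deep feed-forward network with shared weights" point can be checked directly. The sketch below assumes a scalar hidden state and a tanh nonlinearity purely for brevity; the names and weights are illustrative. Unrolling the recurrent cell over time gives one feed-forward layer per time step, and the two computations agree exactly when every layer reuses the same weights.

```python
import math


def rnn_final_state(inputs, w_in, w_rec, h0=0.0):
    """Scalar recurrent net: one cell applied once per time step."""
    h = h0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h


def feedforward_final_state(inputs, layers, h0=0.0):
    """Deep feed-forward net: layer t has its own (w_in, w_rec) pair and
    reads input t plus the previous layer's activation."""
    h = h0
    for x, (w_in, w_rec) in zip(inputs, layers):
        h = math.tanh(w_in * x + w_rec * h)
    return h


inputs = [0.5, -1.0, 0.25, 2.0]   # toy sequence (hypothetical values)
w_in, w_rec = 0.8, -0.3           # toy weights (hypothetical values)

tied_layers = [(w_in, w_rec)] * len(inputs)   # same weights at every depth
untied_layers = [(0.8, -0.3), (0.1, 0.9), (0.8, -0.3), (0.8, -0.3)]

rnn_out = rnn_final_state(inputs, w_in, w_rec)
tied_out = feedforward_final_state(inputs, tied_layers)
untied_out = feedforward_final_state(inputs, untied_layers)
```

With tied weights the feed-forward stack reproduces the recurrent result; once one layer gets its own weights (`untied_layers`), it is an ordinary deep network and the outputs diverge.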

Lottery ticket hypothesis: [...]

"Lottery tickets" (i.e., subnetworks with random initializations that just so happen to give them the right inductive bias to generalize well from the training data for a particular task) probably occur in the brain as much as in current deep learning architectures. However, the issue in DL is that the rest of the network often fails to contribute much to test performance beyond what the lottery ticket subnetwork was able to achieve, as though there was a chasm in model space that the other subnetworks were unable to cross to reach a solution. Evolution seems to have found a way around this problem, at least by the time the neocortex came along, in the sense that the brain seems adept at giving every subnetwork a useful job.

Again, I think the nature of the neocortex as a generative simulator is what makes this feasible with sparse training data. The structure and dynamics of the neocortex have enough in common statistically with the structure and dynamics of the natural world that it is easy for it to align with experience. In contrast, the structure and (lack of) dynamics of current DL systems makes them more brittle when trying to make sense of the natural world.

All that being said, I realize that my comments might be taking things down a rabbit hole. But I appreciate your feedback and look forward to seeing your perspective fleshed out more in the rest of this sequence.

Tapatakt (4y)

Russian translation

Eli Tyre (21d)

He talks about it here (30:00)

This is a dead link.

Steven Byrnes (20d)

Fixed, thanks

Eli Tyre (21d)

The single most important question in AGI safety is: Is the AGI trying to do something that we didn’t intend for it to be trying to do?

Thinking aloud: Is this right?

It seems like "maybe, for a narrow notion of 'intend'."

Like, if we build a superintelligence to solve problems that are eluding humans, the superintelligence will be doing things that we didn't intend (because we didn't think of them) all the time.

Steven Byrnes (20d)

I guess I’d say “the thing we intended for the AGI to be trying to do” can be vague, or described at a meta-level, as opposed to very specific.

I didn’t mean for that sentence to be making a specific controversial claim about alignment targets. I generally see “alignment targets” as an open question (see a footnote in post 10).

Lucius Bushnaq (3y)

By contrast, I’m going to talk relatively little about the nuts-and-bolts of how the learning algorithms work. That would be a complicated story which is not particularly relevant for AGI safety.

Disagree. If someone could figure out what these algorithms are, I think that could be a useful component in the tech tree needed to produce training setups, architectures and learning algorithms that make networks converge towards learning human-like primitive goals and desires ("Shards").

Whether a network placed in an information environment similar to that of a human baby will learn circuits similar in function to those the baby brain learns seems like it could potentially depend on differences in the learning algorithms (Adam vs whatever brains use) as well as architecture.

At the very least, it would be good to confirm that the algorithms don't have any particularly nasty surprises for us. Like, say, they turn out to operate "non-locally" in parameter space, unlike Gradient Descent or genetic algorithms or anything else we're using, and so the structures they find are totally unlike the structures our algorithms find, so it's nigh impossible to evolve human-like motivations with GD variants.

Steven Byrnes (3y)

I mean, there’s a sense in which every aspect of developing AGI capabilities is “relevant for AGI safety”. If nothing else, for every last line of AGI source code, we can do an analysis of what happens if that line of code has a bug, or if a cosmic ray flips a bit, and how do we write good unit tests, etc.

So anyway, there’s a category of AGI safety work that we might call “Endgame Safety”, where we’re trying to do all the AGI safety work that we couldn’t or didn’t do ahead of time, in the very last moments before (or even after) people are actually playing around with the kind of powerful AGI algorithms that could get out of control and destroy the future.

My claims are:

If there’s any safety work that requires understanding the gory details of the brain’s learning algorithms, then that safety work is in the category of “Endgame Safety”, because as soon as we understand those gory details, we’re within spitting distance of a world in which hundreds of actors around the world are able to build very powerful and dangerous superhuman AGIs. My argument for that claim is §3.7–§3.8 here. (Plus here for the “hundreds of actors” part.)
The following is a really bad argument: “Endgame Safety is really important, so let’s try to make the endgame happen ASAP, so that we can get to work on Endgame Safety.” It’s a bad argument because, What’s the rush? There’s going to be an endgame sooner or later, and we can do Endgame Safety Research then! Bringing the endgame sooner is basically equivalent to having all the AI alignment and strategy researchers hibernate for some number N years, and then wake up and get back to work. And that, in turn, is strictly worse than having all the AI alignment and strategy researchers do what they can during the next N years, and also continue doing work after those N years have elapsed. I claim that there is plenty of safety work that we can do right now that is not in the category of “Endgame Safety”, and in particular that posts #12–#15 are in that category (and they have lots more open questions!).

Lucius Bushnaq (3y)

I don’t need the gory details, but “the brain is doing some variant of gradient descent” or “the brain is doing this crazy thing that doesn’t seem to depend on local information in the loss landscape at all” would seem like particularly valuable pieces of information to me, compared to other generic information about the AGI we have, for things I am working on right now.


I keep saying that “learning from scratch” implies “unhelpful for behavior at birth”. This is an oversimplification, because it’s possible for “within-lifetime learning” to happen in the womb. After all, there should already be plenty of data to learn from in the womb—interoception, sounds, motor control, etc. And maybe retinal waves too—those could be functioning as fake sensory data for the learning algorithm to learn from.

Minor technicality: Why did I say the input data and supervisory signals for the cortex (for example) come from outside the cortex? Can’t one part of the cortex get input data and/or supervisory signals from a different part of the cortex? Yes, of course. However, I would instead describe that as “part of the cortex’s neural architecture”. By analogy, in ML, people normally would NOT say “ConvNet-layer-12 gets input data from ConvNet-layer-11”. Instead, they would be more likely to say “The ConvNet (as a whole) gets input data from outside the ConvNet”. This is just a way of talking; it doesn’t really matter.

I’m framing this as a “made-up example” because I’m trying to make a simple conceptual point, and don’t want to get bogged down in complicated uncertain empirical details. That said, the bird song thing is not entirely made up—it’s at least “inspired by a true story”. See discussion here of Gadagkar 2016, which found that a subset of dopamine neurons in the songbird brainstem send signals that look like RL rewards for song quality, and those signals go specifically to the vocal motor system, presumably training it to sing better. The missing part of that story is: what calculations are upstream of those particular dopamine neurons? In other words, how does the bird brain judge its own success at singing? For example, does it match its self-generated auditory inputs to an innate template? Or maybe the template is generated in a more complicated way—say, involving listening to adult birds of the same species? Or something else? I’m not sure the details here are known—or at least, I don’t personally know them.

Why is it called “pattern separation”? It’s kinda related to the fact that a pattern-separator has more output lines than input lines. For example, you might regularly encounter five different “patterns” of sensory data, and maybe all of them consist of activity in the same set of 30 input lines, albeit with subtle differences—maybe one pattern has such-and-such input signal slightly stronger than in the other patterns, etc. So on the input side, we might say that these five patterns “overlap”. But on the output side, maybe these five patterns would wind up activating entirely different sets of neurons. Hence, the patterns have been “separated”.
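To make the “more output lines than input lines” idea concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration, not something from the post or from neuroscience data: the fixed random weights, the 2,000 output lines, the top-20 “winners” rule, and all function names are made up. It builds five patterns that all use the same 30 input lines with small differences, pushes them through a random expansion, and keeps only the most strongly driven output lines.

```python
import random

def make_patterns(n_patterns=5, n_inputs=30, jitter=0.3, seed=0):
    """Five patterns on the same 30 input lines, differing only slightly."""
    rng = random.Random(seed)
    base = [rng.uniform(0.5, 1.0) for _ in range(n_inputs)]
    return [[b + rng.uniform(-jitter, jitter) for b in base]
            for _ in range(n_patterns)]

def make_separator(n_inputs=30, n_outputs=2000, seed=1):
    """Fixed random weights: many more output lines than input lines."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_outputs)]

def separate(pattern, weights, k=20):
    """Return the indices of the k most strongly driven output lines."""
    drive = [sum(w * x for w, x in zip(row, pattern)) for row in weights]
    ranked = sorted(range(len(drive)), key=drive.__getitem__, reverse=True)
    return set(ranked[:k])

def jaccard(a, b):
    """Overlap of two sets of active lines (1.0 means identical sets)."""
    return len(a & b) / len(a | b)

def mean_pairwise_overlap(sets):
    pairs = [(i, j) for i in range(len(sets)) for j in range(i + 1, len(sets))]
    return sum(jaccard(sets[i], sets[j]) for i, j in pairs) / len(pairs)

patterns = make_patterns()
weights = make_separator()

# On the input side, every pattern activates all 30 lines.
input_sets = [{i for i, x in enumerate(p) if x > 0} for p in patterns]
# On the output side, only the top-k lines count as active.
output_sets = [separate(p, weights) for p in patterns]

in_overlap = mean_pairwise_overlap(input_sets)
out_overlap = mean_pairwise_overlap(output_sets)
print(f"mean overlap of active input lines:  {in_overlap:.2f}")
print(f"mean overlap of active output lines: {out_overlap:.2f}")
```

In this toy version the input patterns overlap completely in which lines are active, while the output patterns share only some of their active lines. How cleanly the outputs separate depends on the made-up choices above (the jitter size, the number of output lines, and `k`), so this should be read as an illustration of the idea rather than a model of any real circuit.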

In other examples, I think pattern separation is serving other purposes too, e.g. sparsifying the neuron activations, which turns out to be very important for various reasons, including not getting seizures.

If you want to dive into the rapidly-growing literature on biologically-plausible backprop-ish algorithms, a possible starting point would be References #12, 14, 34–38, 91, 93, and 94 of A deep learning framework for neuroscience.

There is a field of “machine learning interpretability”, dedicated to interpreting the innards of learned-from-scratch “trained models”—example. I (along with pretty much everyone else working on AGI safety) strongly endorse efforts to advance that field, including tackling much bigger models, and models trained by a wider variety of different learning algorithms. Also on this topic: I sometimes hear an argument that a brain-like AGI using a brain-like learning algorithm will produce a relatively more human-interpretable trained model than alternatives. This strikes me as maybe true, but far from guaranteed, and anyway “relatively more human-interpretable” is different than “very human-interpretable”. Recall that the cortex has ≈100 trillion synapses, and an AGI could eventually have many more than that.

	Learning-from-scratch algorithms	Individual innate adjustable parameters
Stereotypical example to keep in mind:	Every deep learning paper: there’s a learning algorithm that gradually builds a trained model by adjusting lots of parameters.	Some connection in the rat brain that strengthens when the rat wins a fight—basically, it’s a counter variable, tallying how many fights the rat has won over the course of its life. Then this connection is used to implement the behavior “If you’ve won lots of fights in your life, be more aggressive.” (ref)
Number of parameters that change based on input data (i.e. how many dimensions is the space of all possible trained models?)	Maybe lots—hundreds, thousands, millions, etc.	Probably few—even as few as one
If you could scale it up, would it work better after training?	Yeah, probably.	Huh?? WTF does “scale it up” mean?
