TL;DR: We showed how Hebbian learning with weight decay could enable a) feedforward circuits (many-to-one) to extract the first principal component of a barrage of inputs and b) recurrent circuits to amplify signals which are present across multiple input streams and suppress signals which are likely spurious.

Short recap

In the last post, we introduced the following idea:

  • We don’t have a way to formalize concepts and transfer them so that an alien agent understands them. Think back to the tree conversation - how would you describe a tree to an AGI?
  • Yet, we aren’t facing the same issues when communicating concepts between humans.
  • We concluded that this must have something to do with how the brain learns.


In this post, we introduce a learning rule that is (presumably) used by biological brains and connect it with the type of circuits that emerge in the brain under different types of input. This connection will serve as the mathematical foundation for exploring how the brain forms natural abstractions.

We will consider two scenarios in the brain: (a) how the brain learns in a "many-to-one" setup, where several neurons project onto a single neuron in another layer, and (b) how the brain learns in an "all-to-all" setup, where several neurons in a layer are connected to each other.

The current post tries to make the derivations accessible but still focuses on mathematics. In the next post, we will discuss the implications of our derivations by delving into neuroscience and related topics.

Definitions and notation:

The notation used in this post:

  • $\langle x \rangle$ refers to the statistical average of a variable $x$.
  • $x_i$ and $x_j$ refer to the activation of neuron $i$ and neuron $j$. Without loss of generality, we will assume that the activity of all neurons centres around 0 so that $\langle x_i \rangle = 0$.
  • $w_{ij}$ refers to the strength of the synapse connecting neuron $i$ to neuron $j$. A higher weight implies a higher likelihood of neuron $i$ triggering neuron $j$.
  • We define the output of a single neuron as $y = \sum_i w_i x_i$.
  • We will use the delta notation for derivatives, i.e. $\Delta w = \frac{dw}{dt}$. This means that from $t$ to $t+1$, the weight changes with: $w(t+1) = w(t) + \Delta w$.
  • For the derivations, we will switch to vector notation in some places. When we do that, we remove indices and make the letter bold.

Definition of Natural abstractions: Abstractions are lower-dimensional but high-level summaries of the things out there. Often, we can find relevant abstractions ‘further ahead (causally, but also in other senses)’ for prediction. They are natural in the sense that we expect a wide selection of intelligent agents to converge on them (Wentworth, 2021).

Definition of Hebbian learning: While there is a plethora of learning rules, with varying degrees of biological relevance and plausibility, most rules are derivatives of the Hebbian principle: neurons that fire together wire together (attributed to Carla Shatz). Thus, an increase in synaptic efficacy arises from a presynaptic cell's repeated and persistent stimulation of a postsynaptic cell. The naive implementation of this principle ($\Delta w_{ij} \propto \langle x_i x_j \rangle$) is unstable, i.e. the numerical values of the weights grow indefinitely (see Appendix). To resolve this instability, variations of Hebbian learning have been proposed[1].

Definition of Hebbian learning with weight decay: For the sake of simplicity, we will focus on one of the simplest variants of Hebbian learning, Hebbian learning with linear weight decay:

$$\tau \, \Delta w_{ij} = \langle x_i x_j \rangle - \lambda w_{ij} \qquad (1)$$

Here, $\tau$ represents a time constant that regulates the speed of learning. $\lambda$ is a scalar decay factor that controls the speed of weight decay. This rule has several key advantages:

  • the rule is stable, i.e. it avoids Hebbian runaway dynamics (see Appendix).
  • the rule is biologically plausible in that homeostatic downregulation of large synapses appears as a key mechanism for memory consolidation (Torrado Pacheco et al., 2021).
  • weight decay is important for training deep neural networks, where it acts as a natural regularizer for improved generalization (Xie et al., 2021).

The structural simplicity of the rule also allows for clean derivations in the rest of this article. We can also derive equivalent results with other rules[2].

The Setup

A classic framework for understanding information processing in the biological brain is the hierarchical processing framework. In this framework, sensory information enters the brain through the sensory organs (eyes, ears, taste buds, olfactory system, somatosensory system, …). This information then progresses through layers of the cortical hierarchy. At each step of the hierarchy, neural circuits process the sensory information and integrate it into a coherent whole with prior information. The further up in the cortical hierarchy a neuron lives, the higher-order the information that neuron processes will be.

Hierarchical processing in the ventral stream. (Manassi et al 2013)

While the classic framework has limitations[3], it still provides a useful approximation of information processing in the biological brain. We focus on two abstract circuits that are ubiquitous throughout the classic framework:

Feedforward circuit: Given two distinct populations of neurons, how does the brain learn the appropriate neural projections from one population to the other? Generally, the feedforward circuit is a "many-to-one" setup, where several neurons project onto a single neuron in another layer.

Recurrent circuit: Given a population of neurons, how does the brain learn the appropriate connections of neurons within the circuit? Generally, we interpret a recurrent circuit as an "all-to-all" setup, where several neurons in a layer are connected to each other. In practice, not all neurons connect with all other neurons, but we can still apply the all-to-all setup by setting the strengths of the absent connections to zero (see Ko et al., 2011 for some biological background).

‘Many-to-one’: Feedforward circuits - analysis

In this scenario, individual neurons receive multiple inputs from another population of neurons (as is the case for pyramidal neurons in layer 2/3 of the cortex). Speaking in terms of information processing, each neuron faces the task of extracting “relevant” information from a barrage of synaptic inputs. So, the neuron has to prioritize some inputs over others, depending on its role in the circuit. This role emerges during early brain development in an activity-dependent fashion through the flexible self-organization of neural circuits (Kirchner, 2022).

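We model the activation of the target neuron, $x_j$, as the weighted sum over all presynaptic inputs, $x_k$:

$$x_j = \sum_k w_{kj} x_k$$

Given this characterization, we can plug the equation for the activation of the target neuron into equation (1) to arrive at a formulation of the weight dynamics in terms of presynaptic activity:

$$\tau \, \Delta w_{ij} = \sum_k \langle x_i x_k \rangle w_{kj} - \lambda w_{ij}$$

As we care about the connectivity of the circuit at the end of development, we can assume that the system is in a steady state, i.e. $\Delta w_{ij} = 0$, to arrive at

$$\sum_k \langle x_i x_k \rangle w_{kj} = \lambda w_{ij}$$

or, in vector notation (dropping the index $j$ of the target neuron),

$$\langle \mathbf{x} \mathbf{x}^\top \rangle \mathbf{w} = \lambda \mathbf{w}$$

Let’s equate $\langle \mathbf{x} \mathbf{x}^\top \rangle$ with the covariance matrix of the inputs, $\mathbf{C}$, under the assumption that the average activity of the inputs centres around zero[4]:

$$\mathbf{C} \mathbf{w} = \lambda \mathbf{w}$$

We recognize that this equation is the eigenvector equation, with the decay factor $\lambda$ playing the role of the eigenvalue, i.e. we learn that the vector of synaptic weights, $\mathbf{w}$, should be an eigenvector of the covariance matrix, $\mathbf{C}$. Under reasonable assumptions, we can furthermore derive that $\mathbf{w}$ will be proportional to the eigenvector of the covariance matrix with the largest eigenvalue (Oja, 1983; 1992).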
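
The claim about the largest eigenvalue can be made plausible with a short calculation. This is only a sketch, assuming continuous-time averaged dynamics $\tau \frac{d\mathbf{w}}{dt} = \mathbf{C}\mathbf{w} - \lambda \mathbf{w}$ (with input covariance $\mathbf{C}$, decay factor $\lambda$ and time constant $\tau$) and a unique largest eigenvalue. The symbols $\mathbf{v}_k$, $\mu_k$ and $c_k$ are introduced here for illustration only.

```latex
% Expand the weight vector in the eigenbasis of C,
% where C v_k = \mu_k v_k and \mu_1 > \mu_2 \ge \dots
\mathbf{w}(t) = \sum_k c_k(t)\, \mathbf{v}_k

% Each coefficient evolves independently of the others:
\tau \frac{d c_k}{dt} = (\mu_k - \lambda)\, c_k
\quad\Rightarrow\quad
c_k(t) = c_k(0)\, e^{(\mu_k - \lambda)\, t / \tau}

% Relative to the leading component, every other component shrinks:
\frac{c_k(t)}{c_1(t)}
  = \frac{c_k(0)}{c_1(0)}\, e^{-(\mu_1 - \mu_k)\, t / \tau}
  \;\longrightarrow\; 0
  \qquad (k \neq 1,\; c_1(0) \neq 0)
```

Under these assumptions, the direction of $\mathbf{w}$ converges to $\mathbf{v}_1$, the eigenvector with the largest eigenvalue, as long as the initial weights have some overlap with it.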

In summary, a neuron in the feedforward circuit will learn to extract a principal component of its input:

Each red arrow indicates the extracted principal component that emerged through Hebbian learning of input weights.

We have arrived at the principal component by deriving it as the eigenvector of the covariance matrix. Interestingly, principal component analysis identifies the eigenvector corresponding to the largest eigenvalue of the covariance matrix as the projection of the input space that retains the largest amount of variance. In our case, this is the weight vector $\mathbf{w}$ at the point of convergence.
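
The many-to-one result can be checked numerically. The following is a minimal sketch, not the simulation behind the figure above: it assumes three hypothetical input neurons driven by one shared signal plus independent noise, uses the averaged form of the learning rule (covariance matrix instead of sample-by-sample updates), and picks the values of `tau`, `lam`, `mixing` and the noise level purely for illustration. It compares only the direction of the weight vector with the first principal component, because the norm of the weights depends on how the decay factor relates to the leading eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: three presynaptic neurons driven by one shared
# zero-mean signal plus independent noise (all values are assumptions).
n_samples = 5000
mixing = np.array([1.0, 0.8, 0.6])        # how strongly each input carries the shared signal
shared = rng.normal(size=(n_samples, 1))
x = shared * mixing + 0.5 * rng.normal(size=(n_samples, 3))
x -= x.mean(axis=0)                        # enforce <x_i> = 0

C = x.T @ x / n_samples                    # covariance matrix <x x^T>

tau, lam = 10.0, 1.0                       # assumed time constant and decay factor
w = rng.normal(size=3)                     # random initial weights

for _ in range(300):
    delta_w = (C @ w - lam * w) / tau      # averaged rule: tau * dw = C w - lam * w
    w = w + delta_w

# Leading eigenvector of C (np.linalg.eigh returns eigenvalues in ascending order).
eigenvalues, eigenvectors = np.linalg.eigh(C)
first_pc = eigenvectors[:, -1]

# Absolute cosine of the angle between the learned weights and the first PC.
alignment = abs(w @ first_pc) / np.linalg.norm(w)
print(f"|cos(angle between w and first PC)| = {alignment:.6f}")
```

The absolute value is taken because an eigenvector is only defined up to sign.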

In later posts, we will expand on these observations and show the connection between maximizing projected variance and maximizing mutual information. This connection will be crucial for understanding how natural abstractions can arise in the biological brain.

‘All-to-All’: Recurrent circuit - analysis

The second important component of the hierarchical processing framework is the recurrent circuit. In this circuit, individual neurons within a population interconnect, creating a network that allows information to pass back and forth between neurons. This type of circuit is important for tasks like memory and pattern recognition, where information from multiple sources is integrated and processed over time.

In the recurrent circuit, we have to consider a quadratic number of possible connections, $N^2$, rather than the linear number of connections from the feedforward circuit, $N$. In particular, the rule for Hebbian learning with weight decay now applies to every pair of neurons $i$ and $j$ within the population:

$$\tau \, \Delta w_{ij} = \langle x_i x_j \rangle - \lambda w_{ij}$$

Here we assume that the feedforward input into neurons $i$ and $j$ dominates the activity variables, $x_i$ and $x_j$. This connects to the insight we gained in the last section: each neuron in the population receives a barrage of inputs and has to prioritize. Thus, the activity variables $x_i$ and $x_j$ depend on the inputs received from the previous layers. This assumption is biologically plausible, as sensory stimulation indeed dominates neural activity (Alenda et al., 2010; Stringer et al., 2019), and recurrent connections become more important in the absence of sensory stimulation (Litwin-Kumar et al., 2014).

When writing the Hebbian learning rule thus, we can again introduce a steady state assumption, i.e. $\Delta w_{ij} = 0$, to investigate the circuit connectivity at the end of development:

$$w_{ij} = \frac{1}{\lambda} \langle x_i x_j \rangle$$

or in matrix notation, $\mathbf{W} = \frac{1}{\lambda} \mathbf{C}$. In the case of a recurrent circuit, the structure of the circuit ends up mirroring the correlational structure of the input. Each synaptic connection between two neurons corresponds to an entry in the covariance matrix, scaled by $1/\lambda$. This means that the recurrent activity of the circuit will amplify signals that are present across multiple input streams and suppress signals that are not (and are thus likely spurious).
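
The steady state of the recurrent rule can also be sketched numerically. The example below is an illustration under stated assumptions, not the authors' simulation: the activities are taken to be fully determined by hypothetical feedforward input (one shared signal plus independent noise), the averaged rule is iterated in discrete time, self-connections are kept for simplicity, and `tau`, `lam`, `mixing` and `private_dir` are made-up illustrative values. It checks that the weights approach the covariance matrix divided by the decay factor, and compares how strongly the learned weights respond to a pattern shared across inputs versus a pattern orthogonal to it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical input-driven activity: one shared signal plus private noise.
n_samples = 5000
mixing = np.array([1.0, 0.8, 0.6])         # assumed loading of the shared signal
shared = rng.normal(size=(n_samples, 1))
x = shared * mixing + 0.5 * rng.normal(size=(n_samples, 3))
x -= x.mean(axis=0)                         # enforce <x_i> = 0

C = x.T @ x / n_samples                     # covariance matrix <x x^T>

tau, lam = 10.0, 0.5                        # assumed time constant and decay factor
W = np.zeros((3, 3))                        # recurrent weights, starting from zero

for _ in range(400):
    W = W + (C - lam * W) / tau             # averaged rule: tau * dW = C - lam * W

# Compare with the analytical steady state W = C / lam.
steady_state = C / lam
max_error = np.max(np.abs(W - steady_state))
print(f"largest deviation from C / lam: {max_error:.2e}")

# Response of the learned weights to a shared pattern versus a pattern
# orthogonal to it (quadratic form v^T W v for unit vectors v).
shared_dir = mixing / np.linalg.norm(mixing)
private_dir = np.array([0.8, -1.0, 0.0])    # orthogonal to `mixing` by construction
private_dir = private_dir / np.linalg.norm(private_dir)

gain_shared = shared_dir @ W @ shared_dir
gain_private = private_dir @ W @ private_dir
print(f"gain along shared pattern:  {gain_shared:.2f}")
print(f"gain along private pattern: {gain_private:.2f}")
```

In this sketch the gain along the shared pattern comes out larger than the gain along the orthogonal pattern, which is the sense in which the learned connectivity favours signals present across multiple input streams.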

Limitations and up-next

The material in this post is by no means novel and is (with many variations) well-established introductory material in computational neuroscience. Still, wrinkles in the above story continuously appear, and many PhD theses are written about those wrinkles, so we stress that we do not aim to provide a full picture but only a first-order approximation.

In the next post, we want to list and explain (some of) the empirical evidence from neuroscience that corroborates the theory we outlined in this post.


  1. ^
  2. ^

    For a similar derivation for Oja’s rule, see this article.

  3. ^

    For example, it neglects feedback connections and multimodality.

  4. ^

    Note that we assumed above that the average activity of all cells is zero, which justifies us calling $\langle \mathbf{x} \mathbf{x}^\top \rangle$ the covariance matrix. We leave it to the motivated reader to convince themselves that the derivation also works with non-zero average firing rates and an appropriate offset in the learning rule.


Why is pure Hebbian learning biologically implausible?

See this graph, which shows, for 250 iterations with a learning rate of 0.1, how the output of the Hebbian learning algorithm behaves and develops:

Here are the plotted weight vectors $\mathbf{w}$, also for 250 iterations:

This shows that with enough iterations, the weights of neurons equipped with Hebbian learning grow explosively - no decay term limits their growth.

In contrast, Hebbian learning with a linear weight decay term is relatively stable:
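
The comparison can be reproduced in a few lines. This is a hedged sketch rather than the code behind the plots: it keeps the 250 iterations and learning rate of 0.1 mentioned above, but assumes two hypothetical correlated input neurons, uses the averaged dynamics (covariance matrix instead of sample-by-sample updates), and sets the decay factor `lam` above the largest eigenvalue of the input covariance, which is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical zero-mean inputs from two correlated neurons.
cov_true = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=2000)
C = x.T @ x / len(x)                 # sample estimate of <x x^T>

learning_rate = 0.1                  # as stated for the plots above
n_iterations = 250                   # as stated for the plots above
lam = 2.5                            # assumed decay factor, above the top eigenvalue of C (about 1.8)

w_pure = np.array([0.1, -0.05])      # arbitrary small initial weights
w_decay = w_pure.copy()
norm_pure, norm_decay = [], []

for _ in range(n_iterations):
    # Pure Hebbian update: no term counteracts the growth.
    w_pure = w_pure + learning_rate * (C @ w_pure)
    # Hebbian update with linear weight decay.
    w_decay = w_decay + learning_rate * (C @ w_decay - lam * w_decay)
    norm_pure.append(np.linalg.norm(w_pure))
    norm_decay.append(np.linalg.norm(w_decay))

print(f"pure Hebbian: |w| after {n_iterations} steps = {norm_pure[-1]:.3e}")
print(f"with decay:   |w| after {n_iterations} steps = {norm_decay[-1]:.3e}")
```

With these settings the pure Hebbian weights grow by many orders of magnitude while the weights with decay stay bounded. How the norm behaves with decay depends on the value of `lam` relative to the leading eigenvalue of `C`.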


Comments

This seems related and might be useful to you, especially (when it comes to Natural Abstractions) the section 'Linking Behavior and Neural Representations': 'A mathematical theory of semantic development in deep neural networks' 

Uhhh exciting! Thanks for sharing!
