Musings on Cumulative Cultural Evolution and AI

calebo

This post might be interesting to you if you want a conceptual model of cumulative cultural evolution and/or you’re curious how cumulative cultural evolution impacts AI forecasting and development.

In particular, I’ll argue that cumulative cultural evolution makes one argument for the discontinuity of AI progress more plausible, and I’ll sketch out at least two possible paths of development that are worth further investigation.

Cumulative Cultural Evolution

Humans have altered more than one-third of the earth’s land surface. We cycle more nitrogen than all other terrestrial life forms combined and have now altered the flow of two-thirds of the earth’s rivers. Our species uses 100 times more biomass than any large species that has ever lived. If you include our vast herds of domesticated animals, we account for more than 98% of terrestrial vertebrate biomass. — Joseph Henrich

Cumulative cultural evolution is often framed as an answer to the general question: what makes humans so successful relative to other apes? Other apes do not occupy as many continents, are not as populous, and do not shape or use their environment to the extent that humans do.

There is a smorgasbord of different accounts of this, referencing:

humans’ capacity for deception, which ratcheted up human sociality and intelligence via arms-race-like effects
runaway signaling
humans’ general intelligence or improvisational intelligence
humans’ prosociality
a package of instinctual modules, such as language, intelligence, social learning, folk physics and biology
humans’ cultural learning abilities

In this section, I’ll explain the account that refers to humans’ cultural learning abilities. Cultural learning typically refers to a subset of social learning abilities such as mindreading, imitation, and teaching. I’ll use cultural and social learning interchangeably here.

Muthukrishna, Doebeli, Chudek, and Henrich’s recent paper “The Cultural Brain Hypothesis: How culture drives brain expansion, sociality, and life history” contains one of the best conceptual models of the cumulative cultural evolution story.

In the paper, Muthukrishna and co are in the business of making mathematical models which they can use to simulate the impact of brain size, sociality, mating structures, and life history. I’ll go over the conceptual features of the primary model here and motivate its plausibility.

The components of the model are:

assumptions
lifecycle
parameters

Assumptions

Muthukrishna and co make the following assumptions:

larger brains are more expensive
larger brains correspond to an increased capacity to store and manage adaptive information
adaptive information reduces the probability that its bearer will die or increases the probability that it will reproduce.

One way to think about the role of adaptive information is to think of humans as occupying “the cognitive niche.” Just as there are niches that particular species may exploit due to their biological features, a species may exploit a cognitive niche through gathering, managing, and applying information. Humans are apparently unique with regard to the range of information we can gather, the ability to apply that information during development time, and the ability to pass that information on through generations. Occupying the cognitive niche allows a species like Homo sapiens to innovate new tools and manipulate the environment, using everything from fire to smartphones.

Managing and storing information requires large brains and large brains are expensive. Large brains are harder to feed, birth, and develop. There are then two conflicting selection pressures: the costliness of large brains and the advantageousness of adaptive information.

Lifecycle

To tease out the impact of these different pressures, Muthukrishna and co move agents through a lifecycle of BIRTH, LEARNING, MIGRATION, and SELECTION. These steps are straightforward: agents are born, spend time learning asocially or socially, migrate between groups, and are then selected according to the amount of adaptive knowledge they acquired, the costliness of their brain size, and the environmental payoff.
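To make the lifecycle concrete, here is a minimal toy sketch of the four steps. It is not the paper's model: the Agent fields, the mutation size, the default values, and the fitness formula (knowledge payoff minus a quadratic brain cost) are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    brain_size: float       # capacity for adaptive knowledge (hypothetical units)
    social_bias: float      # probability of learning socially rather than asocially
    knowledge: float = 0.0  # adaptive knowledge acquired this lifetime


def birth(parents, rng, mutation=0.05):
    """BIRTH: each entry in `parents` leaves one child with slightly mutated traits."""
    return [
        Agent(
            brain_size=max(0.1, p.brain_size + rng.gauss(0, mutation)),
            social_bias=min(1.0, max(0.0, p.social_bias + rng.gauss(0, mutation))),
        )
        for p in parents
    ]


def learn(children, adults, rng, fidelity=0.9, asocial_efficiency=0.5):
    """LEARNING: copy a random adult (socially) or figure things out alone (asocially)."""
    for child in children:
        if adults and rng.random() < child.social_bias:
            gained = fidelity * rng.choice(adults).knowledge
        else:
            gained = asocial_efficiency * child.brain_size
        # Assumption: a brain cannot hold more knowledge than its size.
        child.knowledge = min(child.brain_size, gained)


def migrate(groups, rng, rate=0.1):
    """MIGRATION: each agent moves to a randomly chosen group with probability `rate`."""
    movers = []
    for group in groups:
        stay = []
        for agent in group:
            (movers if rng.random() < rate else stay).append(agent)
        group[:] = stay
    for agent in movers:
        rng.choice(groups).append(agent)


def select(group, rng, size, richness=1.0, brain_cost=0.05):
    """SELECTION: sample parents in proportion to payoff minus brain cost."""
    if not group:
        return []
    weights = [
        max(1e-6, richness * a.knowledge - brain_cost * a.brain_size ** 2)
        for a in group
    ]
    return rng.choices(group, weights=weights, k=size)


def run(generations=20, n_groups=3, group_size=10, seed=0):
    rng = random.Random(seed)
    groups = [
        [Agent(brain_size=1.0, social_bias=0.5, knowledge=0.5) for _ in range(group_size)]
        for _ in range(n_groups)
    ]
    for _ in range(generations):
        next_groups = []
        for adults in groups:
            children = birth(adults, rng)
            learn(children, adults, rng)
            next_groups.append(children)
        migrate(next_groups, rng)
        groups = [select(g, rng, group_size) for g in next_groups]
    return groups
```

Calling run() returns the final groups, and one can inspect how brain_size and social_bias drift under different defaults. The numbers it produces say nothing about the paper's actual results.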

Parameters

The model includes the following parameters:

Transmission fidelity

how accurately can an agent learn from others?

Asocial learning

how efficient is asocial learning?

Ecological richness

what is the environmental payoff for adaptive knowledge?

Reproductive skew

how much more do those with more adaptive knowledge reproduce? Groups with a pair bonding structure will have less reproductive skew. Groups with a polygynous mating structure will have more reproductive skew.

These parameters are modified for different agents and groups.
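As a rough illustration, the four parameters can be collected into a small configuration object. The field names, the validation ranges, and the power-law form used for reproductive skew are assumptions made for this sketch, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    transmission_fidelity: float  # how accurately an agent learns from others (0..1)
    asocial_efficiency: float     # how efficient asocial learning is
    ecological_richness: float    # environmental payoff for adaptive knowledge
    reproductive_skew: float      # how strongly knowledge translates into offspring

    def __post_init__(self):
        if not 0.0 <= self.transmission_fidelity <= 1.0:
            raise ValueError("transmission_fidelity must lie in [0, 1]")
        for name in ("asocial_efficiency", "ecological_richness", "reproductive_skew"):
            if getattr(self, name) < 0.0:
                raise ValueError(f"{name} must be non-negative")


def reproduction_shares(knowledge, skew):
    """Share of offspring per agent, assuming weight = knowledge ** skew.

    skew = 0 gives everyone an equal share (roughly the pair bonding end);
    larger skew concentrates reproduction on the most knowledgeable
    (roughly the polygynous end). Knowledge values must be positive.
    """
    weights = [k ** skew for k in knowledge]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, reproduction_shares([1.0, 2.0, 4.0], 0.0) splits reproduction evenly, while raising skew to 2.0 gives the most knowledgeable agent 16/21 of the offspring.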

Results

What is the result of the simulation? The simulation outputs the following causal relationships:

Larger brains allow for more adaptive knowledge. This creates a selection pressure for larger brains. This is true for both social and asocial learners.
More adaptive knowledge allows for a larger carrying capacity. As agents invest more in social learning, this creates a selection pressure for larger groups, as larger groups are richer sources of adaptive knowledge than smaller groups.
Larger groups of individuals will tend to have more adaptive knowledge. This creates pressure for longer juvenile periods for social learners.
Extended juvenile periods create selection pressures for better learning biases, in particular biases concerning whom to learn from.
Better learning biases and oblique learning “lead to the realm of cumulative cultural evolution.”

Here’s a nifty picture displaying these relationships:

These results are, for the most part, weakly supported by empirical evidence.

Brain size and social learning

This relationship appears to hold for primates and birds.

Larger groups & larger brains

This relationship holds for primates, but not for other taxa.

Brain size and juvenile period

There is a positive relationship for primates between brain size and juvenile period.

Group size and juvenile period

There is a positive relationship for primates between group size and absolute juvenile period.

See 19-22 for relationship strengths and caveats.

The general model provides a rather neat picture. Crucially, there are positive feedback loops between larger brains and social learning in the right environment. This in turn pushes towards longer juvenile periods and larger groups.

Cumulative Cultural Evolution

Under the right parameter values, Muthukrishna and co saw that a species which undergoes something like cumulative cultural evolution can be generated. Recall the parameters of the model:

transmission fidelity
reproductive skew
asocial learning
ecological richness

Let cumulative culture be the phenomenon where a group contains far more adaptive knowledge than could likely be generated by its individual members via asocial learning. Basically, a species that exhibits cumulative culture would be one where it is exceedingly unlikely that any of its members could generate its pool of adaptive knowledge via asocial learning.

What values of the model parameters would produce cumulative culture?

Before reading on, answer this question on your own.

Very high transmission fidelity.

In order for adaptive knowledge to accumulate in a species that species needs to be able to accurately transmit that knowledge through generations.
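A toy recurrence shows why fidelity matters so much. Assume (this is an illustration, not the paper's model) that each generation inherits a fraction `fidelity` of the previous generation's knowledge and adds a fixed amount `innovation` through asocial learning. The stock of knowledge then converges to innovation / (1 - fidelity), which only dwarfs a single lifetime's innovation when fidelity is close to 1.

```python
def accumulated_knowledge(fidelity, innovation, generations):
    """Iterate k <- fidelity * k + innovation, starting from k = 0."""
    k = 0.0
    for _ in range(generations):
        k = fidelity * k + innovation
    return k


def steady_state(fidelity, innovation):
    """Closed-form limit of the recurrence, valid for 0 <= fidelity < 1."""
    return innovation / (1.0 - fidelity)
```

With innovation fixed at 1 unit per lifetime, a fidelity of 0.5 caps the group's knowledge at 2 units, while a fidelity of 0.99 lets it reach 100 units, far more than any one individual could generate alone.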

Low reproductive skew.

What would happen if there were a high reproductive skew? Then individuals who learn asocially and have especially large brains would be rewarded and survive. However, populations with large-brained asocial learners eventually go extinct. This is because, as large-brained asocial learners survive, variance in social learning ability decreases. As variance in social learning ability decreases, the population is (i) unable to cheaply accumulate knowledge via social learning and (ii) unable to transition to smaller-brained social learners (remember, brains are expensive!).

Moderate asocial learning.

Social learners face a bootstrapping problem -- the adaptive knowledge must come from somewhere. This means that the species needs to be able to generate innovations asocially. However, the species cannot invest in asocial learning too much, otherwise asocial learning may be too efficient and social learning will not take off.

Ecological richness.

Finally, adaptive knowledge must pay off. Brains are too expensive to grow unless there is a significant benefit to their becoming larger. The ecology can account for that benefit.

These parameter values offer an explanation of human brain size, social learning capabilities, and general success. Moreover, they may also explain why species like humans are very low in number. In order for there to be a species that invests in social learning at a very high rate:

Ecologies must be sufficiently rich.
There must be a low reproductive skew.
A species must be in the Goldilocks zone with respect to asocial learning.
Transmission fidelity must be very high

In this environment there is significant pressure to:

increase brain size
increase social learning
increase social learning efficiency

I personally think this stuff is very exciting. We have a model for how brain sizes could 3x and how humans can emerge as a uniquely social species. There are additional insights about the value of information and importance of mating structure.

AI Forecasting and Development

Now, what, if anything, is the import of this for AI forecasting and development?

There’s an argument that’s been floating around for some time that goes something like this:

Humans are vastly more successful in certain ways than other hominids, yet in evolutionary time, the distance between them is small. This suggests that evolution induced discontinuous progress in returns, if not in intelligence itself, somewhere approaching human-level intelligence. If evolution experienced this, this suggests that artificial intelligence research may do also. — AI Impacts

One can push back on the argument above with the claims that:

Evolution wasn’t optimizing for improving human intelligence
Human evolutionary success is not due to human intelligence

In response to the first claim, there's good reason to believe, from the model above, that there are feedback loops that enable a species’ brain size and social learning capabilities to take off. Moreover, evolution was “targeting” this takeoff due to the selection pressures for adaptive information and against large brains. Large-brained social learners were significantly “rewarded” in the right ecosystems. In the relevant sense, evolutionary dynamics optimized for and pushed humans’ asocial and social learning capabilities upward.

In response to the second claim, whether or not humans are significantly better asocial learners than our primate relatives is unclear. Obviously adult humans have significantly higher intelligence than our primate relatives; however, it’s unclear to what extent this higher intelligence is a result of asocial learning rather than social learning. What is clear is that, from a very early age, humans are vastly better social learners than our primate relatives. Given this, it’s plausible to claim that human success, at least relative to other apes, derived from our asocial and social learning abilities. These abilities enabled humans to occupy the cognitive niche.

Cumulative cultural evolution renders this argument more plausible. However, before getting too excited, it’s worth noting that cumulative cultural evolution is one plausible account of human success and intelligence among an array of plausible accounts.

So much for forecasting. What of artificial intelligence development? The development of human intelligence suggested by the cumulative cultural evolution story followed the chain below:

moderate asocial learning => social learning.

One can imagine machine learning work developing sufficient asocial learning techniques and then ratcheting capabilities forward by combining previous work through social learning techniques (potentially via imitation learning or model transfer, but likely much more sophisticated techniques). On this model, asocial learning (probably made up of a number of different modules and techniques) enables social learning to become a winning strategy. However, social learning is an independent thing; it is not built on top of asocial learning modules.

Related to this, more work needs to be done determining which capacities primarily drive human social learning. Henrich suggests that it is both mindreading and imitation. Tomasello’s work stresses mindreading, in particular the ability of humans to develop joint attention. Answers to this question would provide at least some evidence about which machine learning algorithms are likely to be more successful.

Another issue brought up by this work is whether social learning is an instinct, that is (roughly) whether it is encoded by genes and not brought into existence via culture, or whether it is a gadget. A gadget is not encoded by genetic information, but is instead developed by cultural means. Suggestive work by Heyes argues that social learning capabilities could be developed from humans’ temperament, high working memory, and attention capabilities. If this is so, then the development of artificial intelligence sketched above is likely flawed. It may instead look like:

temperament + computation power + working memory + attention ⇒ social learning

On this model, not only does asocial learning enable social learning to be a winning strategy, but asocial learning capabilities also compose social learning abilities. Social learning is really not that different from asocial learning; it is just a layer built on top of lower-level intelligence systems.

Both of these models suggest that AI development is currently bottlenecked on the asocial learning step. However, once a threshold for asocial learning is reached, intelligence will increase very quickly.

There’s a lot more to do here. I hope to have persuasively motivated the cumulative cultural evolution story and the idea that it has important upshots for AI development and forecasting.

Planned summary:

A recent paper develops a conceptual model that retrodicts human social learning. They assume that asocial learning allows you to adapt to the current environment, while social learning allows you to copy the adaptations that other agents have learned. Both can be increased by making larger brains, at the cost of increased resource requirements. What conditions lead to very good social learning?

First, we need high transmission fidelity, so that social learning is effective. Second, we need some asocial learning, in order to bootstrap -- mimicking doesn't help if the people you're mimicking haven't learned anything in the first place. Third, to incentivize larger brains, the environment needs to be rich enough that additional knowledge is actually useful. Finally, we need low reproductive skew, that is, individuals that are more adapted to the environment should have only a slight advantage over those who are less adapted. (High reproductive skew would select too strongly for high asocial learning.) This predicts pair bonding rather than a polygynous mating structure.

This story cuts against the arguments in Will AI See Sudden Progress? and Takeoff speeds: it seems like evolution "stumbled upon" high asocial and social learning and got a discontinuity in reproductive fitness of species. We should potentially also expect discontinuities in AI development.

We can also forecast the future of AI based on this story. Perhaps we need to be watching for the perfect combination of asocial and social learning techniques for AI, and once these components are in place, AI intelligence will develop very quickly and autonomously.

Planned opinion:

As the post notes, it is important to remember that this is one of many plausible accounts for human success, but I find it reasonably compelling. It moves me closer to the camp of "there will likely be discontinuities in AI development", but not by much.

I'm more interested in what predictions about AI development we can make based on this model. I actually don't think that this suggests that AI development will need both social and asocial learning: it seems to me that in this model, the need for social learning arises because of the constraints on brain size and the limited lifetimes. Neither of these constraints applies to AI -- costs grow linearly with "brain size" (model capacity, maybe also training time) as opposed to superlinearly for human brains, and the AI need not age and die. So, with AI I expect that it would be better to optimize just for asocial learning, since you don't need to mimic the transmission across lifetimes that was needed for humans.

Awesome, thanks for the super clean summary.

I agree that the model doesn't show that AI will need both asocial and social learning. Moreover, there is a core difference between the growth of the cost of brain size between humans and AI (sublinear [EDIT: super] vs linear). But in the world where AI dev faces hardware constraints, social learning will be much more useful. So AI dev could involve significant social learning as described in the post.

Moreover, there is a core difference between the growth of the cost of brain size between humans and AI (sublinear vs linear).

Actually, I was imagining that for humans the cost of brain size grows superlinearly. The paper you linked uses a quadratic function, and also tried an exponential and found similar results.

But in the world where AI dev faces hardware constraints, social learning will be much more useful.

Agreed if the AI uses social learning to learn from humans, but that only gets you to human-level AI. If you want to argue for something like fast takeoff to superintelligence, you need to talk about how the AI learns independently of humans, and in that setting social learning won't be useful given linear costs.

E.g., suppose that each unit of adaptive knowledge requires one unit of asocial learning. Every unit of learning costs $\$K$, regardless of brain size, so that everything is linear. No matter how much social learning you have, the discovery of $N$ units of knowledge is going to cost $\$KN$, so the best thing you can do is put $N$ units of asocial learning in a single brain/model so that you don't have to pay any cost for social learning.

In contrast, if $N$ units of asocial learning in a single brain costs $\$KN^{2}$, then having $N$ units of asocial learning in a single brain/model is very expensive. You can instead have $N$ separate brains each with 1 unit of asocial learning, for a total cost of $\$KN$, and that is enough to discover the $N$ units of knowledge. You can then invest a unit or two of social learning for each brain/model so that they can all accumulate the $N$ units of knowledge, giving a total cost that is still linear in $N$.

I'm claiming that AI is more like the former while this paper's model is more like the latter. Tighter hardware constraints only change the value of $K$, which doesn't affect this analysis.
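The two cases can be checked with a few lines of arithmetic. This sketch assumes that a brain holding u units of learning (asocial plus social) costs K * u ** exponent, with exponent 1 for the linear case and 2 for the quadratic case. The function names and the way social learning units are charged are illustrative assumptions.

```python
def single_brain_cost(n, k, exponent):
    """Cost of putting all n units of asocial learning into one brain/model."""
    return k * n ** exponent


def many_brains_cost(n, k, exponent, social_units=1):
    """Cost of n brains, each holding 1 asocial unit plus `social_units` of social learning."""
    per_brain = k * (1 + social_units) ** exponent
    return n * per_brain


if __name__ == "__main__":
    n, k = 100, 1.0
    for label, exponent in (("linear", 1), ("quadratic", 2)):
        print(label, single_brain_cost(n, k, exponent), many_brains_cost(n, k, exponent))
```

Under linear costs the single brain is cheaper (100 versus 200 for N = 100, K = 1), since the extra social learning is pure overhead. Under quadratic costs the many small brains win (400 versus 10,000), and changing K rescales both sides without changing which option is cheaper.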

Tomasello’s work stresses mindreading, in particular the ability for humans to carry joint attention [link].

Link seems to be missing.
