Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Here are two kinds of super high-level explanation for how things turned out in the future:

  1. Someone wanted things to turn out that way.
  2. Selective forces favoured that outcome.

Which one is a better explanation?

This is a question I’m highly uncertain about, and one that often shows up as a crux when thinking about where to focus one’s efforts when aiming for a good future.

One way of labelling the alternatives is intelligence vs evolution. There’s a cluster of stuff on each side. I’ll try to point to them better by listing points in the clusters:

  • Intelligence: an agent, someone capably trying to do something, intentional deliberate action, the will of the most powerful entity, consequentialists, unipolar/singleton or few-agent scenarios
  • Evolution: Moloch (multipolar traps), natural selection, competitive pressures, the incentive landscape, game theoretic solution concepts, highly multipolar or multi-agent scenarios

On one view, both intelligence and evolution are good explanations for how things turned out: they just amount to taking different perspectives, or looking at the situation at different scales. I agree with this, but I want to set it aside here so we can focus on the distinction. So let’s try to make the alternatives mutually exclusive:

  1. Intelligence: someone was trying to produce the outcome
  2. Evolution: no one was trying to produce the outcome

This question is very similar to unipolar vs multipolar, but maybe not the same. My focus is on whether the main determinant of the outcome is “capable trying” vs anything else. This can hold for a few agents; it doesn’t require exactly one.


If you think our future will be better explained by intelligence, you might prefer to work on understanding intelligence and related things like:

  • Decision theory, anthropics, probability, epistemology, goal-directedness, embedded agency, intent alignment – understanding what agents are, how they work, how to build them
  • Moral philosophy and value learning
  • {single, multi}/single alignment, and approaches to building AGI that focus on a single agent

If you think our future will be better explained by evolution, you might prefer to work on understanding evolution and related things like:

  • Economics and sociology, especially areas like social choice theory, game theory, and topics like bargaining and cooperation
  • Biological and memetic evolution, and the underlying theory, perhaps something like formal darwinism
  • {multi, single}/multi alignment, and approaches to building AGI that involve multiple agents
  • Governance, and interventions aimed at improving global coordination


Why believe intelligence explains our future better than evolution? One argument is that intelligence is powerful. The outsized impact humans have had on the planet, and might be expected to have beyond it, is often attributed to their intelligence. The pattern of “some human or humans want X to happen” causing “X happens” occurs very frequently and reliably, and seems to happen via intelligence-like things such as planning and reasoning. 

Relatedly, the ideal of a rational agent – something that has beliefs and desires, updates beliefs towards accuracy, and takes actions thereby expected to achieve the desires – looks, almost by construction, like something that would in the limit of capability explain what outcomes actually obtain.
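The belief-desire-update-act loop described above can be sketched as a toy program. This is a hypothetical two-hypothesis example with made-up numbers, purely to make the structure of the rational-agent ideal concrete:

```python
# Toy sketch of the rational-agent ideal: beliefs, desires,
# belief updates towards accuracy, and expected-utility action choice.
# All hypothesis names, observations, and numbers are invented for illustration.

def bayes_update(prior, likelihoods, observation):
    # Update beliefs towards accuracy given an observation (Bayes' rule).
    posterior = {h: prior[h] * likelihoods[h][observation] for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

def best_action(beliefs, utility, actions):
    # Take the action thereby expected to achieve the desires.
    def expected_utility(a):
        return sum(beliefs[h] * utility[h][a] for h in beliefs)
    return max(actions, key=expected_utility)

prior = {"sunny": 0.5, "rainy": 0.5}
likelihoods = {"sunny": {"clouds": 0.2, "clear": 0.8},
               "rainy": {"clouds": 0.9, "clear": 0.1}}
beliefs = bayes_update(prior, likelihoods, "clouds")

utility = {"sunny": {"picnic": 10, "stay_in": 3},
           "rainy": {"picnic": -5, "stay_in": 5}}
action = best_action(beliefs, utility, ["picnic", "stay_in"])
# Having observed clouds, the agent believes rain is likely and stays in.
```

In the limit of capability, an agent running this loop with accurate models is, almost by construction, the thing that determines which outcomes obtain.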

Both of these considerations ignore multipolarity, possibly to their peril. Why believe evolution explains our future better than intelligence? Because it seems to explain a lot of the past and present. Evolution (biological and cultural) has much to say about the kinds of creatures and ideas that are abundant today, and the dynamics that led to this situation. The world currently looks like a competitive marketplace more than like a unified decision-maker.
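The flavour of explanation on the evolution side can be sketched with toy replicator dynamics: a fitter variant comes to dominate the population even though no one is trying to produce that outcome. Variant names and fitness numbers below are made up for illustration:

```python
# Toy replicator dynamics: "selective forces favoured that outcome"
# without anyone wanting it. Fitness values are invented for illustration.

def replicator_step(shares, fitness):
    # Each variant grows in proportion to its fitness; then renormalise
    # so the shares sum to one.
    grown = {v: shares[v] * fitness[v] for v in shares}
    total = sum(grown.values())
    return {v: g / total for v, g in grown.items()}

shares = {"cooperator": 0.5, "defector": 0.5}
fitness = {"cooperator": 1.0, "defector": 1.2}
for _ in range(50):
    shares = replicator_step(shares, fitness)
# After many generations the fitter variant dominates,
# even though nothing "wanted" that outcome.
```

The point of the sketch is that the outcome is fully explained by the fitness landscape; no agent's intentions appear anywhere in the dynamics.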

Will this continue? Will there always be many agents with similar levels of capability but different goals? To argue that it will, I think there are two types of argument one could put forward. The first is that no single entity will race ahead of the rest in capability (“foom”), rendering the rest irrelevant. The second is to argue that the trends towards greater coordination and cooperation – multicellular life, tribes, firms, civilizations – are fundamentally limited.


I don’t know all the arguments that have been made on this, and since this post is for blog post day I’m not going to go find and summarise them. But I don’t think the question is settled – please tell me if you know better. Being similar to the unipolar vs multipolar question, the intelligence vs evolution question has been explored in the AI foom debate and Superintelligence. Here is some other related work, split by which side it’s more relevant to or favourable of.

  • Section 4 of Yudkowsky’s chapter on AI and Global Risk argues that intelligence is more powerful than people tend to think.
  • Bostrom’s singleton hypothesis is very similar to expecting Intelligence over Evolution.
  • Christiano’s speculation that the Solomonoff prior is controlled by consequentialists relies on (and argues for) the power of intelligence.


  • Alexander’s Meditations on Moloch points to many examples of multipolar traps in reality.
  • Critch’s Robust Agent-Agnostic Processes argues for outcomes being better understood from something like evolution than intelligence. Critch and Krueger’s ARCHES provides the {single, multi}/{single, multi} alignment framework and their multiplicity thesis seems to me to assume that evolution matters more than intelligence.
  • Hanson’s Age of Em illustrates how a future with advanced intelligence but determined primarily by evolution-like things, such as selection for productivity, might look.
  • Henrich’s The Secret of Our Success shows how human impact and control of the planet may be better explained by evolution (of memes) than the intelligence of individuals.


Acknowledgements: Daniel Kokotajlo for running the impromptu blog post day and giving me feedback, Andrew Critch and Victoria Krakovna for one conversation where this question came up as a crux, and Allan Dafoe for another.

15 comments

I agree that both are good explanations. My question is more about which will be dominant in the long run. I tried to ask this more clearly with the mutually exclusive version in the post (someone vs no one was trying to produce the outcome).

I could view the "Why not both?" response as indicating that neither is dominant and we just have to understand how both operate simultaneously (perhaps on different timescales) and interact. I think I'd view that as actually coming down mostly on the evolution side of things - i.e., this means I should understand intelligence only in the larger evolutionary context -- no intelligence will permanently outstrip and render irrelevant the selection forces. Is that right?

Sometimes something is invented (not necessarily by just one person, and possibly changed by one or more processes you would call ‘evolution’) which makes both more powerful. Writing. Trade networks. The internet. Languages that are a mixture of earlier languages (for easier communication between speakers of one of the two). Etc.

The pattern of “some human or humans want X to happen” causing “X happens” occurs very frequently and reliably

I think we have to be very careful with this kind of reasoning because of hindsight and survivorship bias. Humans are very good at explaining their behavior as rational when it largely isn't. I would like to see a systematic review of such decisions.   

Some humans wanted X to happen. Other humans wanted to prevent X. Finally, X happened.

We can interpret this as a result of intelligence (of those humans who wanted X to happen) or evolution (some impersonal mechanism selected the winning side).

This is a good point. Could something like Shapley value help in distributing credit for X between the humans and the impersonal mechanism? I find myself also wanting to ask about how frequent these cases are -- where it could easily be viewed both ways -- and declare that if it's mostly ambiguous then 'evolution' wins.

For "some impersonal mechanism" I'm thinking "memetic fitness of X amongst humans" (which in some cases cashes out as the first group of humans being larger?). What are other ways of thinking about it?
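For concreteness, the Shapley-value idea mentioned above can be sketched for two “players” – intelligence and evolution – using a made-up characteristic function giving the probability that X happens under each coalition (a toy illustration, not a claim about how the credit would actually split):

```python
# Shapley-value credit attribution between two hypothetical "causes" of X.
# The characteristic function v maps a coalition of causes to the
# probability that X happens; all numbers are invented for illustration.
from itertools import permutations
from math import factorial

def shapley(players, v):
    # Average each player's marginal contribution over all join orders.
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

v = {
    frozenset(): 0.0,
    frozenset({"intelligence"}): 0.3,
    frozenset({"evolution"}): 0.5,
    frozenset({"intelligence", "evolution"}): 0.9,
}.get

credit = shapley(["intelligence", "evolution"], v)
# The two credits sum to v(both) = 0.9, split by marginal contribution.
```

By efficiency, the credits always sum to the value of the full coalition, so this at least gives a principled way to carve up “how much of X was trying, and how much was selection”.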

The story feels a little underspecified. When X happens because the first group of humans figured out how to thwart the second group, and anticipated them, etc. and furthermore if that group consistently does this for whatever they want, it seems a lot more like intelligence.

Darwinian evolution as such isn’t a thing amongst superintelligences. They can and will preserve terminal goals. This means the number of superintelligences running around is bounded by the number humans produce before the point where the first ASI gets powerful enough to stop any new rivals being created. Each AI will want to wipe out its rivals if it can (unless they are managing to cooperate somewhat). I don’t think superintelligences would have humans’ kind of partial cooperation: either near-perfect cooperation, or near-total competition. So this is a scenario where a smallish number of ASIs that have all foomed in parallel expand as a squabbling mess.

Do you know of any formal or empirical arguments/evidence for the claim that evolution stops being relevant when there exist sufficiently intelligent entities (my possibly incorrect paraphrase of "Darwinian evolution as such isn't a thing amongst superintelligences")?

Error correction codes exist. They are low-cost in terms of memory etc. Having a significant portion of your descendants mutate and do something you don’t want is really bad.

If error-correcting to the point where there is not a single mutation in the future only costs you 0.001% of resources in extra hard drives, then <0.001% of resources will be wasted due to mutations.
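The arithmetic behind this kind of claim can be sketched. With a block code that corrects up to t bit flips per n-bit block, the chance of an uncorrectable block is a binomial tail, and it falls off astronomically fast. The block size, correction capacity, and raw error rate below are toy numbers, not a claim about any real storage system:

```python
# Toy binomial-tail calculation: probability that a block of n bits
# suffers more than t flips, i.e. more errors than the code can correct.
# All parameters are invented for illustration.
from math import comb

def block_failure_prob(n, t, p):
    # P(more than t of the n bits flip independently with probability p)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(t + 1, n + 1))

# 255-bit blocks, a code correcting up to 16 errors (modest redundancy),
# raw per-bit error rate of one in a million.
p_fail = block_failure_prob(255, 16, 1e-6)
# p_fail is astronomically small: the tail is dominated by the k = 17 term,
# which carries a factor of (1e-6)**17.
```

Even with modest redundancy, the per-block failure probability is so small that error-free replication over cosmological numbers of copies is cheap, which is the commenter’s point.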

Evolution is kind of stupid compared to superintelligences. Mutations are not going to find improvements, because the superintelligence will be designing its own hardware, and the hardware will already be extremely optimized. If the superintelligence wants to spend resources developing better tech, it can do that better than evolution can.

So squashing evolution is a convergent instrumental goal, and easily achievable for an AI designing its own hardware.

Error correction codes help a superintelligence avoid unintended self-modification, but they don’t necessarily allow goals to stay stable as reasoning abilities change.

Firstly, this would be AIs looking at their own version of the AI alignment problem; this is not random mutation or anything like it. Secondly, I would expect there to be at most a few rounds of self-modification that risk goals (likely zero rounds). Damaging your goals loses a lot of utility, so you would only do it if a small change in goals bought a big increase in intelligence, and if you really needed to be smarter but couldn’t make yourself smarter while preserving your goals.

You don't have millions of AIs all with goals different from each other. The self-upgrading step happens once, before the AI starts to spread across star systems.

I think this was a really good post; I don't know why it didn't receive more attention. I have found myself coming back to its ideas frequently in the last few weeks.

The distinction is a shaky one. What's the difference between this 'Evolution' and a 'Market'? Lots of people might want to make lightbulbs or flying cars.

I did intend 'Evolution' here to include markets. The main contrast I'm trying to make is to a unified rational actor causing the outcomes it wants.

I thought I'd check, because at first glance markets have selection driven by intelligence (if I buy coffee from Starbucks, etc.) and seem like a mixture. But evolution arguably has the same property, at a different timescale. So I see why you'd lump them together.