How AGI will actually end us: Some predictions on evolution by artificial selection

James Carney

We've already taken possession of the drugs; now we're just haggling about the price. But our haggling doesn't quite have the commercial savvy one would like. This is bad, because now that worry about existentially threatening AI has gone mainstream, we need to be very clear about what could happen next. Strongly worded letters alluding to the end of the world make for a cracking good read when they come bundled with the Bible, but we need to do a little better than that.

In this essay, I will make a set of concrete predictions concerning how AGI will existentially impact the human species, should it be developed in the near- to medium-term. My starting point is that alignment is not, nor ever will be, an outcome we can secure for AGI. However, I do not think this will lead to human extinction in the way that is typically prognosticated. That is, our future will not consist of paperclips, grey goo, or computronium. Instead, we will be subjected to an extended period of umwelt cultivation that sculpts the space of subjective human values and action dispositions to fit AGI goals. I volunteer this as an existential catastrophe, because it means current human value systems will be entirely superannuated. Though this is not a new idea, existing speculation on its precise details is sparse. My aim here is to flesh out the claim with some detailed predictions that, if nothing else, capture one possible future that emerges from AGI. If my predictions are wrong in some obvious way, so much the better: it'll make the space of potential error just a little smaller.

The basis of my predictions comes from what we already know about 3.5 billion years of evolution by natural selection. Sure, evolution is as dumb as a bag of rocks, but the fact is that natural selection is next-token prediction by another name. That the token is number of second-generation offspring who reach reproductive age is beside the point: there is a de facto objective, an optimisation process, and an application of results outside the training distribution. The trillion parameter superintelligences we carry inside our skulls are just a side effect of this. That's the first mistake we make in thinking about AGI: it's not a new phenomenon––it's just a more effective version of something that's happened already.

Premise 1: Doom won't happen fast because AGI won't overfit

What does it mean to have a goal? It means diverting some quantity of negentropy away from the fastest route to thermodynamic equilibrium. When the goal is a short-term function of known parameters, it can usually be achieved by a fully specifiable mechanistic process. When the parameters that contextualise the goal are subject to uncertainty, cognition becomes necessary. This is because any reduction in uncertainty can only be exploited by updating a predictive model that then informs a mechanistic process. Up to now, evolution has solved this problem with bodies and brains: your sensorium is a set of expected values for environmental and internal states that are matched against reality by efferent and afferent updating. When these values are wrong, you go extinct or you get cancer.

How does one get these values wrong? There are two ways: overfitting and noise injection. The evolutionary history of life on Earth is almost nothing except overfitting to specific environmental niches. This is swell for a while, but every environmental niche has a lifespan. Hence all the extinctions. Noise injections occur when self-regulating processes get corrupted. Hence all the cancer. Sure, every so often the cancer catches a break and we overfit to the new environment, but that's just dumb luck that never lasts. The point is that next-token prediction, evolutionary variety, degenerates into worse-than-chance outcomes in the long run when it optimises for success in the short run. (Is there any possible scenario where the Ediacaran biota could survive in today's environment?)

Where these considerations become relevant for AGI is that they hold true for any non-mechanistic goal. Given that there would be no point in providing an AI with a goal that can already be mechanistically specified, this means that the overfitting issue, in particular, is relevant for any form of AGI goal seeking. (To be fair, it'll probably out-think the cancer.) No AGI will be omniscient; the planetary and cosmic environment will always contain some uncertainty that cannot easily be estimated or controlled––and this means overfitting is always a long term risk for goal maximisers. Whatever the best path to filling the light cone with widgets may be, it's unlikely to coincide with the best plan for turning humans into widgets right now. Instrumental convergence means AGI will know this, so your atoms are probably yours to keep in the short term.

Premise 2: Doom won't happen fast because biological data acquisition has immense utility

Humans are maybe the least environmentally overfitted outcomes of evolution (more on social overfitting later). This is why we've managed to build a planetary civilisation that spans multiple ecological niches. All for only 2,500 calories a day and a bit of neoteny. One consequence is that if you want data about the environment, humans are a good source for it. Not the shallow crap that we express in language, but the 11 million bits per second that come in through the senses. All it takes is to perform a median split on the intensity of each sensory modality and you've already got a 64 character alphabet for encoding the physical world. (Yes, there are six senses.) Chuck in sensory data from other species and you've got a cheap, efficient, data-gathering apparatus that is at once highly versatile, autonomous, and exquisitely calibrated to its planetary environment.
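The 64-character figure is just arithmetic: six channels, each reduced to one bit by a median split, give 2^6 = 64 possible symbols. A minimal sketch of that encoding follows. The channel names and the intensity readings are hypothetical placeholders (the essay says only that there are six senses), and real sensory data would obviously not arrive as tidy dictionaries.

```python
from statistics import median

# Hypothetical channel names: the essay says only that there are six senses.
SENSES = ["sight", "hearing", "touch", "taste", "smell", "proprioception"]

def median_split(samples):
    """Binarise each channel: 1 if a reading is above that channel's median, else 0."""
    thresholds = {s: median(frame[s] for frame in samples) for s in SENSES}
    return [
        tuple(int(frame[s] > thresholds[s]) for s in SENSES)
        for frame in samples
    ]

def to_symbol(bits):
    """Pack six bits into a single integer in the range 0..63."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

ALPHABET_SIZE = 2 ** len(SENSES)  # six binary channels -> 64 symbols

# Made-up intensity readings in arbitrary units, for illustration only.
frames = [
    {"sight": 0.9, "hearing": 0.1, "touch": 0.5, "taste": 0.2, "smell": 0.7, "proprioception": 0.4},
    {"sight": 0.2, "hearing": 0.8, "touch": 0.6, "taste": 0.1, "smell": 0.3, "proprioception": 0.9},
    {"sight": 0.5, "hearing": 0.4, "touch": 0.1, "taste": 0.9, "smell": 0.5, "proprioception": 0.6},
]
symbols = [to_symbol(bits) for bits in median_split(frames)]
```

Each moment of experience becomes one of 64 symbols; a finer split per channel (quartiles, say) would grow the alphabet to 4^6 = 4096 at the cost of needing more data per threshold.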

Given that we'll sooner or later run out of data for LLM training runs, this vastly larger reservoir of sensory information will be of immense strategic value for any AGI that needs sensory grounding––which is likely any AGI. (Language, after all, is an $R^{4} \to R^{1}$ homomorphic mapping; sense data lets you keep $R^{4}$.) While it's certainly possible that AGI could reduplicate the biosphere, that would be extraordinarily inefficient: evolution may be a blind watch-maker, but the watches have been ruthlessly optimised by competition with other watches to work. It's a far more worthwhile proposition to co-opt the biosphere and tweak it where necessary. And remember, the technology for interfacing with the biosphere is already nearly here. Sure, there will always be some environmental data that falls outside the range of biological processes, but science, baby! has already built half the technical infrastructure for capturing that.

So, we can expect to have utility for AGI in something that resembles our current evolved form; ditto for nature red in tooth and claw. The question then becomes, how will AGI access and exploit the data encoded by the biosphere? In what follows, I will outline some predictions for how this will happen for humans; the considerations can be extended readily enough to other species. Once I have outlined the predictions, I will offer some speculations on how AGI will bring them about.

AGI will apply strong selection pressures on human social cognition

Have you ever tried to run a company? I have; it's a total shit-show. The reason is that companies do not actually optimise for making money, but for reducing anxiety (Figure 1). Predictable revenue streams are one way to do this, so fat profit margins aren't inconsistent with anxiety reduction, but most anxiety actually comes from uncertainty relating to low-grade interpersonal conflict. The result is that HR departments spring up like Japanese knotweed, 80% of the work done is pointless social signalling, and anyone remotely competent is up for cancelling the second they vibrate the air with a complaint.

Figure 1: Humans do not like anxiety at all. Data from Warriner et al. (2013), rescaled by the author between 0 and 1.

This is, to say the least, an unsatisfactory state of affairs. Termites don't have these problems, but then termites are so overfitted they're defeated by a sticky tongue. The niche we're overfitted to is the pattern of alliance and conflict in the social environment; it's the reason we have a neocortex in the first place. Given that this overfitting is what enabled us to get out of the Olduvai Gorge, we're probably stuck with it. To be simultaneously cooperative and versatile requires a minimum quantum of asshole, and that, dear reader, is the grain that runs through our crooked timber.

Or more accurately, we're stuck with it until AGI emerges. There's already (questionable) evidence that LLMs are capable of theory of mind, so there's no doubt that full AGI would be able to navigate the human social environment. Assuming the utility of humans, the actual question is whether AGI goals would be best served by minimising anxiety or removing it.

By almost any metric, the answer will be by removing it. Companies are expressly optimised for goal achievement, and they're still disastrous at it because of the primate politics. Pretty much every other form of social organisation is an order of magnitude worse. Excising the primate politics is the best kill-switch there is for making humans useful at something that doesn't involve fucking, fighting, or nepotism.

How might an AGI dissolve the ape? The most likely option would be to target the propensity for forming between- and within-group hierarchies. Because we're social, high status means preferential access to resources and mating opportunities. And because status is in principle up for grabs, we use much––perhaps most––of our cognitive resources trying to establish and navigate social hierarchies. Why is this? Because we don't just need to track the mental states of the individuals in our environment; we need to track what they think about the mental states of others (Figure 2). This is a massive computational cost, and scales non-linearly with group size (Figure 3). Eliminating this cost would immediately free up these resources by reducing the anxiety that attends goal-directed social interactions. One way would involve freezing social hierarchies, but this would just be another form of overfitting (see: termites). Instead, the following courses of action would make more sense:

Removing the propensity to form stable coalitions: Social coalitions extend the biological machinery associated with inclusive fitness to non-kin. Accent, dress, behavioural norms, sociolect, and frequency of association are all technologies that coalitions use to signal membership through costly signals. Ad hoc coalitions for transitory tasks are likely to be useful to an AGI; enduring coalitions would be just a form of cognitive inertia. So we can expect that AGI would foster nimble, short-term coalitions that dissolve upon task completion. (The exception is families; see below.)
Removing stable within-group status markers: Lionel Messi is probably a less valuable member of a university department than John Rawls; we don't even need to speculate on Rawls's utility for the Argentinian national football team. But a Messi will always have more status than a Rawls, and that's because we preferentially award status to proficiency in certain tasks. (Typically, ones that involve high social visibility.) This is another form of cognitive inertia that we can expect an AGI to edit out of our social cognition. Instead, we are likely to see a propensity to quickly and unambiguously assign status based on proficiency at the task at hand that evaporates on task completion.
Removing conflicts over reproductive access: Though most primate bullshit ultimately has its origin in genetic selection, there are degrees to which this is subjectively salient to the primate. But the quota that is subjectively salient has had us all weeping on the train home at one stage or another. If we're going to have utility for an AGI, evolution by natural selection needs to be shown the door. We can expect a dissolution of the specifically romantic impulse to find, court, and mate-guard a sexual partner; ditto for the desire to poach someone else's partner. Instead, we will be pushed in the direction of quickly bonding with an assigned partner and reproducing with them without any of the cavilling that occupies 90% of our attention in the current dispensation. But the instant that offspring reach reproductive maturity, expect the family to go the way of other stable coalitions.

Figure 2: Alice and Bob reflect that, on balance, PGP encryption is less complicated than this (© Bronwyn Tarr, 2014).

Figure 3: Number of recursively embedded mental state representations for an individual by group size. This is why you don't make new friends anymore.
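To make the non-linear scaling concrete, here is a rough counting sketch. It is not the calculation behind Figure 3, whose method isn't given; it assumes one simple model in which an observer tracks chains of the form "B thinks that C thinks that ...", where each link may name anyone in the group except the previous thinker. The function name and the depth cap are illustrative choices.

```python
def embedded_states(group_size, max_depth):
    """Count the chains of nested mental states one observer could track.

    Assumption (not from the essay): each step in a chain may name any
    group member except the previous thinker, so there are
    (group_size - 1) ** depth chains at each depth.
    """
    if group_size < 2 or max_depth < 1:
        return 0
    others = group_size - 1
    return sum(others ** depth for depth in range(1, max_depth + 1))

# How the load grows with group size at a fixed recursion depth of 3.
for n in (5, 15, 50, 150):
    print(n, embedded_states(n, max_depth=3))
```

Under this assumption the count grows polynomially in group size and exponentially in recursion depth, which is the shape of the problem even if the exact numbers differ.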

AGI will cause radical shifts to the human umwelt

I don't know about you, but my inner life has all the finesse of a dog dragging its ass along the carpet. This is because my propensities are split across any number of inconsistent goals which compete with each other in my global workspace. Thank fuck for liquor, I say. AGI will agree on the problem, but be less keen on the cirrhosis. Instead, it is likely to re-weight the allotment of subjective valence across the goals in my environment, consistent with the maximising of its own goals. Part of this will involve reducing the friction between inconsistent human propensities; I have already explored how this might play out for social cognition. But there are several other ways in which the human umwelt can be re-mapped in ways that will advantage AGI.

We will engage in less temporal discounting: Allowing for the usual academic slap-fights over detail, it seems that we discount future value hyperbolically (Figure 4). Whatever value £10 and a snog has for you right now, it has substantially less ten years in the future. For the most part, this is because the future is less certain, so the expected return is lower. I mean, who's going to hold a candle for a decade for access to an older, fatter you––and pay £10 for the privilege? In an AGI administered world, you will likely live longer and be subject to less uncertainty: for you to be useful to the AGI, this needs to be salient to you. As a result, we will be selected to engage in much lower rates of temporal discounting. Construal level theory (CLT) claims that abstraction scales with psychological distance, and vice versa: the further away something is on the dimensions of space, time, likelihood, and social familiarity, the more abstract is our mental representation of it. If CLT is correct (and it might not be), we should therefore expect our cognitive engagement with temporally distant events to be more detailed, emotion-laden, and absorbing of present-day resources.

Figure 4: Temporal discounting rates relative to average life expectancy
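For readers who want the shape of the curve, the standard hyperbolic form is V = A / (1 + kD), where A is the undiscounted value, D the delay, and k the discount rate. The sketch below sets it beside exponential discounting for comparison. The values of k and r are illustrative assumptions, not estimates drawn from Figure 4 or from any dataset.

```python
import math

def hyperbolic_value(amount, delay_years, k):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay_years)

def exponential_value(amount, delay_years, r):
    """Exponential discounting, for comparison: V = A * exp(-r * D)."""
    return amount * math.exp(-r * delay_years)

# Illustrative rates only: a 'present-day' k and a much lower one.
K_TODAY = 0.5
K_SELECTED = 0.05

for delay in (0, 1, 5, 10):
    today = hyperbolic_value(10.0, delay, K_TODAY)
    selected = hyperbolic_value(10.0, delay, K_SELECTED)
    print(f"{delay:>2} years: k={K_TODAY} -> {today:.2f}, k={K_SELECTED} -> {selected:.2f}")
```

In these terms, being selected for lower temporal discounting amounts to being pushed towards a smaller k, so that the £10 a decade out keeps most of its present value.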

Imaginative culture will cease to exist in any substantial way: It takes a heart of stone not to laugh at the lawyers losing their jobs to GPT4. But the deeper point is that symbolic activity is a hell of a lot easier than we thought. More than likely, this is because it was all fabulation of false positives to begin with: we want the world to be a certain way, and we hallucinate the evidence that it is using cheap counterfactual simulations. That way, we reduce our anxiety by gaining (seeming) predictive traction on the future. If you want to convince yourself the baby Jesus has your back, building him a cathedral that biffs you in the eye every time you leave the house is a good way to do it. And because we hate anxiety so very, very much, the first thing we do with a material surplus is to manufacture some cultural Xanax. This is paperclip maximisation already, but from an AGI perspective, it's the wrong kind of paperclip. Instead, our counterfactual activities will take on a form that comes closer to hypothesis testing, which will take on all the valence we currently assign to imaginative culture. Being wrong will feel transcendent.
Our pleasure/pain ratio will change in favour of pleasure: God bless the devil! Pain has been good to us; it has never stopped giving. But generosity isn't enough in a relationship; you also need compatibility. And while we were a stunning couple back in the EEA, us humans have gone through some personal changes since then. Outside of basic autonomic responses, pain stopped pulling us out of the hole once anaesthetics were invented. These days we've taken so against it that it gets you time off work and a decent return on big pharma index funds. And when the medics fall short of the one and only job they have, well, we all know what to do (Figure 5). All of which is to say, we can expect less of it with the advent of AGI. Pleasure can never conceive of pain and is in principle consistent with any goal; the only goal that pain entrains is its own negation. It is, in machine learning lingo, a shit metric. AGI won't care about our suffering, but it will care very much indeed about our efficiency as information aggregators. Sure, basic nociception is going nowhere, but you definitely won't be spending £100 an hour on your therapist anymore.

Figure 5: Thank you, Satoshi, wherever you are, from the bottom of our dopamine filled synaptic clefts.

Our intuitions about space (may) be reshaped: There are certain ideas that turn one's blood a little cold when one thinks about them. Here's one, from Carl Friedrich Gauss, a mind of such abundant fertility that he makes Terry Tao look like a drunk who keeps getting locked in the park overnight: "I am coming more and more to the conviction that the necessity of our geometry cannot be demonstrated, at least neither by, nor for, the human intellect. Perhaps in some other life we may arrive at insights into the nature of space that are at present inaccessible to us." We've learned a bit more about the structure of space since Gauss's day, but he didn't preclude that; his point is that there may be aspects to space we are incapable of learning. It's hard to know what to think about this for the obvious reason that I'm cognitively incapable of thinking about it. And it's not clear, either, if human-created AGI will do any better. But it's worth registering the outside chance that there may be fundamental features of the universe that become salient with a different cognitive architecture, and that these may be available to AGI-selected humans.

How will AGI achieve all of this?

I could go on listing other possible ways in which AGI could perform evolution by artificial selection on humans, but my intention here is not to be exhaustive. Instead, it's to point to some ways that, in relatively short time horizons, we may see fundamental aspects of our being-in-the-world shifted towards AGI preferences. The reasonable question now arises of how AGI might bring any of this about. There's a fairly substantial literature on this already (e.g. here), and I don't think I'll do any better in my prognostications. Still, for the sake of completeness, it's worth listing a few possibilities:

Ordinary real-time selection: Current dating apps are unfathomably stupid. At best, they run some version of weighted cosine similarity across subscribers and dish up the results. The romance of it! And yet, in the USA at least, it's now how most people meet. Were an AGI to perform selective breeding with humans, this is one obvious entry point. Fixation of beneficial traits can occur relatively quickly in a population, and incentivising matches consistent with desired AGI traits over several generations could be enough for a deceptively-aligned AGI to sculpt the genome.
Enacting a devil's bargain: Now be honest: if I offered you, say, 500+ years of life, commencing at 70 with rejuvenation chucked into the mix, what would you pay me? You'd definitely pay me something––and the closer you get to shuffling off the mortal coil, the more you're likely to pay me. Now, let's imagine the cost is having your inner life re-sculpted in ways that you can only hazily imagine, but which still fall short of having it re-sculpted into nothing at all by death. It's fairly clear that a lot of people would make this bargain. I mean, I'm trying to make it look bad and I'm convincing myself. Technology that alters the germ line is, of course, illegal, but what would you do with the people who use it anyway? Put them in prison? Better rethink the taxpayer cost of a life sentence then.
Nanotechnology: AGI could certainly just rewire us one protein at a time using nano machines whether we liked it or not. But I don't think this is likely. RLHF may not be perfect, but even that was enough to get ChatGPT to bite its tongue. I've said already that I don't think alignment is ever going to work, but the outcome isn't binary: even in the case of ultimate failure, we'll probably get enough right that obviously undesirable outcomes like being rebuilt from our constituent parts don't happen. One hopes.
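For concreteness, the weighted cosine similarity mentioned under ordinary real-time selection can be sketched in a few lines. This makes no claim about how any real dating app works; the trait vectors, the weights, and the function names are all hypothetical. The point is only that whoever sets the weights decides which matches surface first.

```python
import math

def weighted_cosine(a, b, weights):
    """Weighted cosine similarity between two equal-length trait vectors."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no meaningful direction to compare
    return dot / (norm_a * norm_b)

def rank_matches(user, candidates, weights):
    """Return candidate names ordered from most to least similar to the user."""
    return sorted(
        candidates,
        key=lambda name: weighted_cosine(user, candidates[name], weights),
        reverse=True,
    )

# Hypothetical three-trait profiles and weights, for illustration only.
user = [1.0, 0.0, 1.0]
candidates = {"a": [1.0, 0.0, 1.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
weights = [1.0, 1.0, 1.0]
ranking = rank_matches(user, candidates, weights)
```

Shifting the weights towards traits an AGI wanted to propagate would reorder the ranking without any individual subscriber seeing anything other than a list of suggested matches.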

Post-mortem reflection

Let's assume, for the sake of it, that all or some of the ideas outlined here are true. What should we think about the predicted outcomes? Clearly enough, we're talking about human extinction. Not in the sense of a spatiotemporal discontinuity of the species, but with respect to the cognitive and cultural links that identify us as an ongoing project. We already don't care much about our great-great grandchildren––and these kids? Well, they're not alright. They will be the ants in the ant hive after the Hairstreak caterpillar enters and sings like a queen. We will share the same bodily morphology with them and that will be about it. They won't think about us one way or the other.

What should you do? Take up smoking. Have an affair. Get into organised crime. Climb the greasy pole. Start a revolution. These are all beautiful things and we owe it to them to give them a good send off. It's been a wild ride, and if the lows outnumbered the highs, the highs were very high indeed. The green was so green, the blue was so blue; I was so I, and you were so you.

Of course, I could be wrong; everybody is, most of the time. But even if the details are wonky and the timings are out, ask yourself: do you really think the Upper Palaeolithic will last forever?

[anonymous] 1y

I think your post is right about many of the inefficiencies of humans. Note that as inefficient as we are, the current industrial civilization has removed many of them and the current age of cheap online communication and stable and usable devices has already led to large changes. Changes that you can expect over the arrow of time will lead to greater efficiency due to selective pressure on corporations.

There is, I think, a major error here: thinking of "AGI" as a singleton massive entity dispassionately planning human lives. It's increasingly unlikely this is the form that AGI will take.

Instead, billions of separate "sessions" of many separate models seem to be the actual form. Each session is a short-lived agent that only knows some (prompt, file of prior context). Some of the agents will be superhuman in capabilities, occasionally broadly so, but most will be far more narrow and specialized, because of the computation cost and IP cost of using the largest systems on your task.

You can think of an era of billions of humans all separately working on their own goals with these tools as a system that steadily advances the underlying "AGI" technology. As humans win and lose, even losing nuclear wars, the underlying technology gets steadily better and more robust.

So a coevolution, not a singleton planning everything. Over time humans would become more and more machine-like themselves, as those traits will be the ones rewarded, and more and more of the underlying civilization will be there to feed the AGI (maybe using implants, maybe just shifting behavior). Kind of how our current civilization devotes so many resources to feeding vehicles.

I think this is the most probable outcome, taking at least 80 percent of the probability mass. Scenarios of a singleton tiling the universe with boring self copies or a utopia seem unlikely.

Unfortunately, it will mean inequality like we can scarcely imagine. Some groups will have all the wealth in the solar system and be immortal; others will receive only what the group in power chooses to share.

LESSWRONG