I appreciate this post. A list of frames/examples seems like a good way to communicate an unintuitive or easily misunderstood concept.
As far as I can tell, the biggest frame collision happens around claims like "even if we 'solve alignment', we might be screwed". I don't think you address it in this post, except perhaps implicitly in points 2 and 3. One interesting framing is Byrnes's Law of Conservation of Wisdom.
Other assorted responses:
I don’t wholeheartedly endorse such critiques because I often feel unsure of what exactly people are criticizing when they critique capitalism in this way.
Eh, yeah, "capitalism" has generally been a very common whipping boy on the left for the past few decades (e.g., for every mainstream social ill, you'll probably easily find a bunch of people loudly blaming "capitalism" for it on the internet), and a conflationary alliance term. While there are important and valid variants of some of those critiques, the noise makes it difficult for them to propagate to the "collective consciousness", on top of them often being difficult to propagate ex ante.[1]
meta-crisis
[Epistemic status: rant; partly hoping that someone shows me that I am significantly wrong here.]
As far as I can tell, from my limited but earnest engagement with this cluster, whenever I try to steer someone towards telling me what exactly those roots of the meta-crisis are, it boils down to bad decision-making, lack of coherent collective agency, etc. All of those things are definitely great to improve, but then calling it a "meta-crisis" doesn't strike me as a good framing, because it's like calling a general and broad skill issue a "(meta-)crisis". One response would be that it's about the various structures holding up the edifice of civilization collapsing/rotting/decaying, and those respective dynamics of collapse/rot/decay reinforcing one another (I mean, this also happens in the life of an individual human, but OK, let's acknowledge that the scale of the problem matters). But I've made the comment that such things have been occurring for as long as civilizations have been falling, so the meta-crisis is not as new a thing as it's made out to be (except maybe insofar as we have very different sorts of social structures that were not present for the Persians, the Maya, the Romans, etc.).
I see some ways out here and ways to make the framing more coherent and valuable, at least on the margin, but it doesn't seem to me like people who are into the meta-crisis framing are trying to make it coherent. Instead, it seems to revolve around a general feeling of "things are getting worse" + pointers to examples of categories of things getting worse + claims that those trends are interconnected. This makes me suspicious and somewhat uninterested.
But maybe I talked to the wrong people. Maybe the concept of a meta-crisis got corrupted when being transmitted from up high. I welcome links to better explanations of the concept/frame.
</rant>
They also increasingly make more and more aspects of life subject to measurement and control via optimization of metrics, which necessarily fail to capture everything that matters.
See C. Thi Nguyen's "Value Capture".
Deskilling
I would actually like to see much more unpacked discussion of this, both because of more mundane concerns (e.g., the Acemoglu et al. paper that "failed [your] smell test"; I haven't read it, but it's very unobvious to me that their concerns are unjustified; I'd probably conclude they're overly confident in their models, but my model of this has a lot of degrees of freedom, so IDK, sounds plausible?), and because of the possibility that, insofar as people are trying to "automate 'alignment research'", the automation orchestrators get WALL-E'd into cognitive enfeeblement and thus approve research directions and solutions that turn out to be slop.
About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.”
It proved to be a great success, which is terrific. A friend and colleague told me that it was the most discussed paper at DeepMind last year (selection bias, grain of salt, etc.). It spawned articles in the Economist and the Guardian.
Most importantly, it entered the lexicon. It’s now commonplace for people in AI safety circles and even outside of them to use the term, often in contrast with misalignment or rogue AI. Gradual Disempowerment tends to resonate more than Rogue AI with people outside AI safety circles.
But there’s still a lot of confusion about what it really is and what it really means. I think it’s a very intuitive concept, but I also still feel like I don’t have everything clear in my mind. For instance, I think our paper both introduces the concept and presents a structured argument that it could occur and be catastrophic, but these things seem somewhat jumbled together, both in my mind and in the discourse.
So for reasons including all of the above, I plan to write a few posts on the topic, starting with this one.
The rest of this post is a list of ten different ways of thinking about or arguing for gradual disempowerment that I’ve used or encountered.
We’re replacing people with AI. These days when I speak publicly about AI, I often find myself returning to i) the more-or-less explicit goal of many AI companies and researchers of “automating all human labor”, and ii) the fact that many people in the space view humanity as a “bootloader for AI”, as Elon Musk evocatively put it. Gradual Disempowerment is the process by which this replacement happens without AI ever rising up -- AI takes our jobs, and the people who control it and still have power are increasingly those who embrace “merging with the machines”, i.e. becoming cyborgs, but with the human bits being phased out over time until, before long, humans cease to exist entirely.
Companies and governments don’t intrinsically care about you. This is basically the main argument in the paper… You can think of companies and governments as “agents” or “beings” that are driven by goals like (e.g.) “quarterly profits” or “GDP” or “national security”. Right now, the best ways to achieve these goals make use of humans. In the future, the best ways will instead make use of AI. A relentless pursuit of such goals, powered by AI, seems likely to destroy the things humans need to survive.
It’s (“global” or “late stage”) capitalism. The previous argument bears a significant resemblance to existing arguments, popular on the left, that “capitalism” is responsible for most of the world’s present ills. This feels like a decent “80/20” version of the concern, but importantly, it’s not just companies, but also governments (whose power is often more feared by those on the right) that could end up turning against their citizens once they become useless to them. And indeed, we’ve seen “communist” countries slaughter their own people by the millions. Besides wondering what alternative the critics imagine, I don’t wholeheartedly endorse such critiques because I often feel unsure of what exactly people are criticizing when they critique capitalism in this way. But for people who already have this mental model, where our current social arrangements treat people as somewhat disposable or lacking in fundamental dignity or worth, this can be a useful starting point for discussion.
It’s another word for (or the primary symptom of) the “meta-crisis”. A few people in my circles have told me about this concept from Daniel Schmachtenberger, which I originally encountered on a podcast somewhere. The key claim is that all the crises we observe in the modern world are driven by some shared underlying factors. I view this as basically a more nuanced version of the view above, where “capitalism” is the root of all evil: The meta-crisis is still meant to be the root of all evil, but we don’t fully understand its nature. The way I like to describe the basic problem is that we are not practicing good enough methods of collective decision-making, or collective sense-making. And while I think we have some good ideas for improving on the status quo, we don’t have a proven solution.
It’s a structural consequence of the way in which information technology demands metrics, enables large scale influence campaigns, translates money into political power, and concentrates power via a recursive feedback loop. This one is maybe a bit too much to unpack in this blog post, but basically, society is increasingly “standardized” not only in terms of products, but also in terms of processes (e.g. restrictive customer service scripts or standard operating procedures) that have the benefit of being cheap, scalable, and reliable (often by eliminating “human error”, i.e. limiting human decision-making power and otherwise encouraging compliance). They also increasingly make more and more aspects of life subject to measurement and control via optimization of metrics, which necessarily fail to capture everything that matters. This general issue was a prime concern of mine before I learned about deep learning in 2012, and realized we might get to Real AI quite soon -- notably, this can happen even with stupid AI.[1] In fact, you could argue that gradual disempowerment is already occurring through advertising, corporate media, and money in politics, among other things. This makes it a bit unclear how far back to go.
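As a minimal sketch of that metric-failure dynamic (a toy model with invented numbers, not anything from the paper): an optimizer that sees only a proxy metric can drive the proxy up while the true value, which also depends on something unmeasured, steadily falls.

```python
# Toy Goodhart sketch: the optimizer sees only a proxy metric, but true
# value also depends on an unmeasured component the proxy ignores.
# All numbers here are invented for illustration.

def true_value(measured, unmeasured):
    return measured + unmeasured

def optimize_step(measured, unmeasured):
    # Each step improves the measured component while quietly eroding the
    # unmeasured one (think: cutting corners that no metric tracks).
    return measured + 1.0, unmeasured - 1.5

measured, unmeasured = 0.0, 10.0
for step in range(25):
    measured, unmeasured = optimize_step(measured, unmeasured)
    print(f"step {step:2d}: proxy = {measured:5.1f}, "
          f"true value = {true_value(measured, unmeasured):6.1f}")

# The proxy climbs monotonically while the true value eventually goes
# negative: optimizing the metric "succeeds" even as what matters degrades.
```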
It’s evolution, baby! Maybe gradual disempowerment is best viewed as part of a much larger trend, going quite far back: evolution. People like to say “AI is the next stage in evolution” as if that means it’s okay if humanity goes extinct. But whether it’s OK or not, it may be that “Natural Selection Favors AIs over Humans”. At the end of the day, if AI becomes much better than humans at everything, it does seem a bit strange from a “survival of the fittest” point of view that humans would stick around. In such a situation, those who hand over more power and resources to AI would presumably outcompete those who don’t. So in the limit, AIs would end up with ALL the power and resources.
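Here’s a minimal sketch of that selection dynamic (the growth rates are invented for illustration, not estimates): whoever delegates more to AI compounds faster, so their share of total power tends to one.

```python
# Toy selection dynamics: actors who delegate more to AI grow faster.
# The growth rates are arbitrary illustrative numbers, not predictions.
delegators, holdouts = 1.0, 1.0  # initial "power" of each population

for generation in range(50):
    delegators *= 1.10  # delegating heavily to AI compounds faster...
    holdouts *= 1.02    # ...than keeping humans in the loop

share = delegators / (delegators + holdouts)
print(f"delegators' share of power after 50 generations: {share:.3f}")
# Prints ~0.978, and the share tends to 1.0 as generations increase: in the
# limit, essentially all power sits with whoever delegated most aggressively.
```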
…and there’s no natural limit to outsourcing decision-making to AI, even if you don’t trust it. AIs could be like underlings that are untrustworthy, but so skilled that competitive pressures still compel us to delegate to them. Consider the trope of the cowboy cop who’s “a loose cannon, but DAMMIT he’s the best we have!” Trust is important, and people are loath to use things they don’t trust. But AI seems to be becoming a tool so powerful that you almost HAVE to use it, even though it’s not secure, even though we haven’t solved alignment, even though we see evidence of scheming in tests, even though it seems to drive people crazy, etc… For me, this mostly comes up as a counter-argument to people who claim that market forces actually favor making AI aligned and trustworthy… that’s certainly true if doing so is free, but in fact, it’s impossible right now, and alignment doesn’t solve the problem of negative externalities.[2] I like to analogize AI to a button that gives you $1,000,000 when you push it, but each press also increases the temperature of the earth by a fraction of a degree. Or each press has a 1% chance of destroying the world.
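To put rough numbers on the second version of the button (just unpacking the analogy’s own 1% figure): if each press independently destroys the world with probability 1%, the chance the world survives n presses is 0.99^n, which drops below 50% after about 69 presses.

```python
# Survival probability after n presses of a button with an independent
# 1% chance of destroying the world per press (the analogy's own number).
p_doom_per_press = 0.01

for n in (1, 10, 69, 100, 300):
    survival = (1 - p_doom_per_press) ** n
    print(f"{n:3d} presses: P(world survives) = {survival:.3f}")

# 0.99**69 is ~0.50: past ~69 presses, destruction is more likely than not,
# even though each individual press still looks like easy money.
```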
It’s an incarnation of Moloch. One of the most famous blog posts in the history of AI safety is Meditations on Moloch. It’s often considered a parable about coordination failures, but I think of it as being about the triumph of “instrumental goals” over “terminal goals”, i.e. the pursuit of money (an “instrumental goal”) as a means to happiness having a tendency to become an end (a “terminal goal”) in itself. We might begin handing over power to AI systems because we hope they will help achieve our goals. But to avoid being outcompeted by other AIs, we might need to hand over more and more power, and the AI might need to focus more and more on simply acquiring power. This is also like an even deeper version of the evolution argument -- evolution and Moloch as described in the post both have the property that it’s unclear if they can ever really be “defeated” or are rather just part of the way the world works.
It’s on a (2D) spectrum with Rogue AI x-risk scenarios. Rogue AI scenarios are ones where “the AI suddenly seizes power”; gradual disempowerment is “we gradually hand over power”. There are lots of scenarios in the middle, where the handoff takes place in part due to recklessness or negligence rather than deliberately. One thing I don’t like about this way of talking about it is that I actually think gradual disempowerment is entirely compatible with full-blown rogue AI. In fact, I think one of the most likely outcomes is that competitive pressures simultaneously drive gradual disempowerment and reckless racing towards superintelligence, warning signs are ignored, and at some point in the reckless and chaotic exploration of the AI design space, rogue AI pops out.
Deskilling, aka “the WALL-E problem”. A lot of people these days seem to think of gradual disempowerment as largely about humans losing our own capabilities, e.g. for critical thinking, because we defer to AI so much. Professor Stuart Russell has called this the “WALL-E problem”. To be honest, I still don’t fully understand or buy into this concern, or see how it necessarily leads to total disempowerment, but it seems worth a mention given its place in the discourse.
[1] This might be less bad with smarter AI -- they can use more sophisticated judgments. But that ability also makes it tempting to put them in charge of more stuff.
[2] This point seems important enough that I almost want to make it its own item in the list.