I appreciate this post. A list of frames/examples seems like a good way to communicate an unintuitive or easily misunderstood concept.
As far as I can tell, the biggest frame collision happens around claims like "even if we 'solve alignment', we might be screwed". I don't think you address it in this post, except perhaps implicitly in points 2 and 3. One interesting framing is Byrnes's Law of Conservation of Wisdom.
Other assorted responses:
I don’t wholeheartedly endorse such critiques because I often feel unsure of what exactly people are criticizing when they critique capitalism in this way.
Eh, yeah, "capitalism" has generally been a very common whipping boy on the left for the past few decades (e.g., for every mainstream social ill, you'll probably easily find a bunch of people loudly blaming "capitalism" for it on the internet), and a conflationary alliance term. While there are important and valid variants of some of those critiques, the noise makes it difficult for them to propagate to the "collective consciousness", on top of them often being difficult to propagate ex ante.[1]
meta-crisis
[Epistemic status: rant; partly hoping that someone shows me that I am significantly wrong here.]
As far as I can tell, from my limited but earnest engagement with this cluster, whenever I try to steer someone towards telling me what exactly those roots of the meta-crisis are, it boils down to bad decision-making, lack of coherent collective agency, etc. All of those things are definitely great to improve, but then calling it a "meta-crisis" doesn't strike me as a good framing, because it's like calling a general and broad skill issue a "(meta-)crisis". One response would be that it's about the various structures holding up the edifice of civilization collapsing/rotting/decaying, and those respective dynamics of collapse/rot/decay reinforcing one another (I mean, this also happens in the life of an individual human, but OK, let's acknowledge that the scale of the problem matters). But I've made the comment that such things have been occurring for as long as civilizations have been falling, so the meta-crisis is not as new a thing as it's made out to be (except maybe insofar as we have very different sorts of social structures that were not present for the Persians, the Maya, the Romans, etc.).
I see some ways out here and ways to make the framing more coherent and valuable, at least on the margin, but it doesn't seem to me like people who are into the meta-crisis framing are trying to make it coherent. Instead, it seems to revolve around a general feeling of "things are getting worse" + pointers to examples of categories of things getting worse + claims that those trends are interconnected. This makes me suspicious and somewhat uninterested.
But maybe I talked to the wrong people. Maybe the concept of a meta-crisis got corrupted when being transmitted from up high. I welcome links to better explanations of the concept/frame.
</rant>
They also increasingly make more and more aspects of life subject to measurement and control via optimization of metrics, which necessarily fail to capture everything that matters.
See C. Thi Nguyen's "Value Capture".
Deskilling
I would actually like to see much more unpacked discussion of this, both because of more mundane concerns (e.g., the Acemoglu et al. paper that "failed [your] smell test"; I haven't read it, but it's very unobvious to me that their concerns are unjustified; I'd probably conclude they're overly confident in their models, but my model of this has a lot of degrees of freedom, so IDK, sounds plausible?), and because of the possibility that, insofar as people are trying to "automate 'alignment research'", the automation orchestrators get WALL-E'd into cognitive enfeeblement and thus approve research directions and solutions that turn out to be slop.
About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.”
It proved to be a great success, which is terrific. A friend and colleague told me that it was the most discussed paper at DeepMind last year (selection bias, grain of salt, etc.). It spawned articles in the Economist and the Guardian.
Most importantly, it entered the lexicon. It’s now commonplace for people in AI safety circles and even outside of them to use the term, often in contrast with misalignment or rogue AI. Gradual Disempowerment tends to resonate more than Rogue AI with people outside AI safety circles.
But there’s still a lot of confusion about what it really is and what it really means. I think it’s a very intuitive concept, but I also still feel like I don’t have everything clear in my mind. For instance, I think our paper both introduces the concept and presents a structured argument that it could occur and be catastrophic, but these things seem somewhat jumbled together, both in my mind and in the discourse.
So for reasons including all of the above, I plan to write a few posts on the topic, starting with this one.
The rest of this post is a list of ten different ways of thinking about or arguing for gradual disempowerment that I’ve used or encountered.
We’re replacing people with AI. These days when I speak publicly about AI, I often find myself returning to i) the more-or-less explicit goal of many AI companies and researchers of “automating all human labor”, and ii) the fact that many people in the space view humanity as a “bootloader for AI”, as Elon Musk evocatively put it. Gradual Disempowerment is the process by which this replacement happens without AI ever rising up -- AI takes our jobs, and the people who control it and still have power are increasingly those who embrace “merging with the machines”, i.e. becoming cyborgs, but with the human bits being phased out over time until, before long, humans cease to exist entirely.
Companies and governments don’t intrinsically care about you. This is basically the main argument in the paper… You can think of companies and governments as “agents” or “beings” that are driven by goals like (e.g.) “quarterly profits” or “GDP” or “national security”. Right now, the best ways to achieve these goals make use of humans. In the future, the best ways will instead make use of AI. A relentless pursuit of such goals, powered by AI, seems likely to destroy the things humans need to survive.
It’s (“global” or “late stage”) capitalism. The previous argument bears a significant resemblance to existing arguments, popular on the left, that “capitalism” is responsible for most of the world’s present ills. This feels like a decent “80/20” version of the concern, but importantly, it’s not just companies, but also governments (whose power is often more feared by those on the right) that could end up turning against their citizens once they become useless to them. And indeed, we’ve seen “communist” countries slaughter their own people by the millions. Besides wondering what alternative the critics imagine, I don’t wholeheartedly endorse such critiques because I often feel unsure of what exactly people are criticizing when they critique capitalism in this way. But for people who already have this mental model, where our current social arrangements treat people as somewhat disposable or lacking in fundamental dignity or worth, this can be a useful starting point for discussion.
It’s another word for (or the primary symptom of) the “meta-crisis”. A few people in my circles have told me about this concept from Daniel Schmachtenberger, which I originally encountered on a podcast somewhere. The key claim is that all the crises we observe in the modern world are driven by some shared underlying factors. I view this as basically a more nuanced version of the view above, where “capitalism” is the root of all evil: The meta-crisis is still meant to be the root of all evil, but we don’t fully understand its nature. The way I like to describe the basic problem is that we are not practicing good enough methods of collective decision-making, or collective sense-making. And while I think we have some good ideas for improving on the status quo, we don’t have a proven solution.
It’s a structural consequence of the way in which information technology demands metrics, enables large scale influence campaigns, translates money into political power, and concentrates power via a recursive feedback loop. This one is maybe a bit too much to unpack in this blog post, but basically, society is increasingly “standardized” not only in terms of products, but also in terms of processes (e.g. restrictive customer service scripts or standard operating procedures) that have the benefit of being cheap, scalable, and reliable (often by eliminating “human error”, i.e. limiting human decision-making power and otherwise encouraging compliance). They also increasingly make more and more aspects of life subject to measurement and control via optimization of metrics, which necessarily fail to capture everything that matters. This general issue was a prime concern of mine before I learned about deep learning in 2012, and realized we might get to Real AI quite soon -- notably, this can happen even with stupid AI.[1] In fact, you could argue that gradual disempowerment is already occurring through advertising, corporate media, and money in politics, among other things. This makes it a bit unclear how far back to go.
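As a minimal sketch of that metric-failure dynamic (a toy model with invented numbers, not anything from the paper): an optimizer that sees only a proxy metric can drive the proxy up while the true value, which also depends on something unmeasured, steadily falls.

```python
# Toy Goodhart sketch: the optimizer sees only a proxy metric, but true
# value also depends on an unmeasured component the proxy ignores.
# All numbers here are invented for illustration.

def true_value(measured, unmeasured):
    return measured + unmeasured

def optimize_step(measured, unmeasured):
    # Each step improves the measured component while quietly eroding the
    # unmeasured one (think: cutting corners that no metric tracks).
    return measured + 1.0, unmeasured - 1.5

measured, unmeasured = 0.0, 10.0
for step in range(25):
    measured, unmeasured = optimize_step(measured, unmeasured)
    print(f"step {step:2d}: proxy = {measured:5.1f}, "
          f"true value = {true_value(measured, unmeasured):6.1f}")

# The proxy climbs monotonically while the true value eventually goes
# negative: optimizing the metric "succeeds" even as what matters degrades.
```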
It’s evolution, baby! Maybe gradual disempowerment is best viewed as part of a much larger trend, going quite far back: evolution. People like to say “AI is the next stage in evolution” as if that means it’s okay if humanity goes extinct. But whether it’s OK or not, it may be that “Natural Selection Favors AIs over Humans”. At the end of the day, if AI becomes much better than humans at everything, it does seem a bit strange from a “survival of the fittest” point of view that humans would stick around. In such a situation, those who hand over more power and resources to AI would presumably outcompete those who don’t. So in the limit, AIs would end up with ALL the power and resources.
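Here’s a minimal sketch of that selection dynamic (the growth rates are invented for illustration, not estimates): whoever delegates more to AI compounds faster, so their share of total power tends to one.

```python
# Toy selection dynamics: actors who delegate more to AI grow faster.
# The growth rates are arbitrary illustrative numbers, not predictions.
delegators, holdouts = 1.0, 1.0  # initial "power" of each population

for generation in range(50):
    delegators *= 1.10  # delegating heavily to AI compounds faster...
    holdouts *= 1.02    # ...than keeping humans in the loop

share = delegators / (delegators + holdouts)
print(f"delegators' share of power after 50 generations: {share:.3f}")
# Prints ~0.978, and the share tends to 1.0 as generations increase: in the
# limit, essentially all power sits with whoever delegated most aggressively.
```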
…and there’s no natural limit to outsourcing decision-making to AI, even if you don’t trust it. AIs could be like underlings that are untrustworthy, but so skilled that competitive pressures still compel us to delegate to them. Consider the trope of the cowboy cop who’s “a loose cannon, but DAMMIT he’s the best we have!” Trust is important, and people are loath to use things they don’t trust. But AI seems to be becoming a tool so powerful that you almost HAVE to use it, even though it’s not secure, even though we haven’t solved alignment, even though we see evidence of scheming in tests, even though it seems to drive people crazy, etc… For me, this mostly comes up as a counter-argument to people who claim that market forces actually favor making AI aligned and trustworthy… that’s certainly true if doing so is free, but in fact, it’s impossible right now, and alignment doesn’t solve the problem of negative externalities.[2] I like to analogize AI to a button that gives you $1,000,000 when you push it, but each press also increases the temperature of the earth by a fraction of a degree. Or each press has a 1% chance of destroying the world.
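To put rough numbers on the second version of the button (just unpacking the analogy’s own 1% figure): if each press independently destroys the world with probability 1%, the chance the world survives n presses is 0.99^n, which drops below 50% after about 69 presses.

```python
# Survival probability after n presses of a button with an independent
# 1% chance of destroying the world per press (the analogy's own number).
p_doom_per_press = 0.01

for n in (1, 10, 69, 100, 300):
    survival = (1 - p_doom_per_press) ** n
    print(f"{n:3d} presses: P(world survives) = {survival:.3f}")

# 0.99**69 is ~0.50: past ~69 presses, destruction is more likely than not,
# even though each individual press still looks like easy money.
```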
It’s an incarnation of Moloch. One of the most famous blog posts in the history of AI safety is Meditations on Moloch. It’s often considered a parable about coordination failures, but I think of it as being about the triumph of “instrumental goals” over “terminal goals”, i.e. the pursuit of money (an “instrumental goal”) as a means to happiness having a tendency to become an end (a “terminal goal”) in itself. We might begin handing over power to AI systems because we hope they will help achieve our goals. But to avoid being outcompeted by other AIs, we might need to hand over more and more power, and the AI might need to focus more and more on simply acquiring power. This is also like an even deeper version of the evolution argument -- evolution and Moloch as described in the post both have the property that it’s unclear if they can ever really be “defeated” or are rather just part of the way the world works.
It’s on a (2D) spectrum with Rogue AI x-risk scenarios. Rogue AI scenarios are ones where “the AI suddenly seizes power”; gradual disempowerment is “we gradually hand over power”. There are lots of scenarios in the middle, where the handoff takes place in part due to recklessness or negligence rather than deliberately. One thing I don’t like about this way of talking about it is that I actually think gradual disempowerment is entirely compatible with full-blown rogue AI. In fact, I think one of the most likely outcomes is that competitive pressures simultaneously drive gradual disempowerment and reckless racing towards superintelligence, warning signs are ignored, and at some point in the reckless and chaotic exploration of the AI design space, rogue AI pops out.
Deskilling, aka “the WALL-E problem”. A lot of people these days seem to think of gradual disempowerment as largely about humans losing our own capabilities, e.g. for critical thinking, because we defer to AI so much. Professor Stuart Russell has called this the “WALL-E problem”. To be honest, I still don’t fully understand or buy into this concern, or see how it necessarily leads to total disempowerment, but it seems worth a mention given its place in the discourse.
[1] This might be less bad with smarter AI -- they can use more sophisticated judgments. But that ability also makes it tempting to put them in charge of more stuff.
[2] This point seems important enough that I almost want to make it its own item in the list.