Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy and states, or as a substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.) This post is not about repeating that argument - it might be quite helpful to read the paper first, which has more nuance and more than just the central claim - but mostly me ranting, sharing some parts of the experience of working on this and discussing it.

What fascinates me isn't just the substance of these conversations, but the relatively consistent patterns in how people avoid engaging with the core argument. I don't mean the cases where people confused about AI progress repeat "stochastic parrots" claims about what AIs can't do that were experimentally refuted half a year ago, but the cases where smart, thoughtful people who can engage with other arguments about existential risk from AI display surprisingly consistent barriers when confronting this particular scenario.

I found this frustrating, but over time, I began to see these reactions as interesting data points in themselves. In this post, I'll try to make explicit several patterns I've observed. This isn't meant as criticism. Rather, I hope that by making these patterns visible, we can better understand the epistemics of the space.

Before diving in, I should note that this is a subjective account, based on my personal observations and interpretations. It's not something agreed on or shared with the paper's coauthors, although when we compared notes on this, we sometimes found surprisingly similar patterns. Think of this as one observer's attempt to make legible some consistently recurring dynamics. Let's start with what I call "shell games", after an excellent post by TsviBT.

Shell Games

The core principle of the shell game in alignment is that when people propose alignment strategies, the hard part of aligning superintelligence is always happening in some component of the system other than the one being analyzed. In gradual disempowerment scenarios, the shell game manifests as shifting the burden of maintaining human influence between different societal systems.

When you point out how automation might severely reduce human economic power, people often respond "but the state will handle redistribution." When you explain how states might become less responsive to human needs as they rely less on human labor and taxes, they suggest "but cultural values and democratic institutions will prevent that." When you point out how cultural evolution might drift memeplexes away from human interests once human minds stop being the key substrate, then maybe this has an economic or governance solution.

What makes this particularly seductive is that each individual response is reasonable. Yes, states can regulate economies. Yes, culture can influence states. Yes, economic power can shape culture. The shell game exploits the tendency to think about these systems in isolation, missing how the same underlying dynamic - decreased reliance on humans - affects all of them simultaneously, and how shifting the burden puts more strain on the system which ultimately has to keep humans in power. 

I've found this pattern particularly common among people who work on one of the individual domains. Their framework gives them sophisticated tools for thinking about how one social system works, but the gradual disempowerment dynamic usually undermines some of the assumptions they start from, because multiple systems may fail in correlated ways.
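
To make the correlated-failure point concrete, here is a minimal toy sketch in Python. This is just my illustration, not a model from the paper: the three systems, the weights, and the update rule are placeholder assumptions, chosen only to show the shape of the dynamic.

```python
# Toy sketch (placeholder assumptions): each system's alignment to humans is a
# mix of (a) how useful humans still are to it directly and (b) support from
# the other systems. "Shell game" responses implicitly assume (b) can carry
# the load even as (a) collapses for all three systems at once.

def step(alignment, human_usefulness, direct_weight=0.6, support_weight=0.4):
    """One update: each system tracks its direct reliance on humans plus the
    average alignment of the other systems."""
    new = {}
    for name in alignment:
        others = [v for k, v in alignment.items() if k != name]
        cross_support = sum(others) / len(others)
        new[name] = direct_weight * human_usefulness + support_weight * cross_support
    return new

alignment = {"economy": 1.0, "state": 1.0, "culture": 1.0}
for year in range(10):
    human_usefulness = max(0.0, 1.0 - 0.1 * year)  # humans gradually matter less
    alignment = step(alignment, human_usefulness)
    print(year, {k: round(v, 2) for k, v in alignment.items()})
```

In this toy version, all three curves decay roughly together: the cross-support terms only pass the same shrinking quantity around between systems, they do not replace it. That is the whole point of calling it a shell game.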

The Flinch

Another interesting pattern in how people sometimes encounter the gradual disempowerment argument is a kind of cognitive flinch away from really engaging with it. It's not disagreement exactly; it's more like their attention suddenly slides elsewhere, often to more “comfortable”, familiar forms of AI risk. 

This happens even with (maybe especially with) very smart people who are perfectly capable of understanding the argument. A researcher might nod along as we discuss how AI could reduce human economic relevance, but bounce off the implications for states or cultural evolution. Instead, they may want to focus on technical details of the econ model - how likely it is that machines will outcompete humans in virtually all tasks, including massages, or something like that.

Another flinch is something like rounding it off to some other well-known story - "oh, you are discussing a multipolar scenario" or "so you are retelling Paul's story about influence-seeking patterns." (Because the top comment on the G.D. LessWrong post is a bit like that, it is probably worth noting that while it fits the pattern, it is not the single or strongest piece of evidence.)

Delegating to Future AI

Another response, particularly from alignment researchers, is "This isn't really a top problem we need to worry about now - either future aligned AIs will solve it or we are doomed anyway."

This invites a rather unhelpful reaction of the type "Well, so the suggestion is that we keep humans in control by humans doing exactly what the AIs tell them to do, and this way human power and autonomy are preserved?" But this is a strawman, and there's something deeper here - maybe it really is just another problem, solvable by better cognition.

I think this is where the 'gradual' assumption is important. How did you get to the state of having superhuman intelligence aligned to you? If the current trajectory continues, it's not the case that the AI you have is a faithful representative of you, personally, run in your garage. Rather it seems there is a complex socio-economic process leading to the creation of the AIs, and the smarter they are, the more likely it is they were created by a powerful company or a government.

This process itself shapes what the AIs are "aligned" to. Even if we solve some parts of the technical alignment problem we still face the question of what is the sociotechnical process acting as “principal”. By the time we have superintelligent AI, the institutions creating them will have already been permeated by weaker AIs decreasing human relevance and changing the incentive landscape. 

The idea that the principal is you, personally, implies that a somewhat radical restructuring of society somehow happened before you got such AI and that individuals gained a lot of power currently held by super-human entities like bureaucracies, states or corporations. 

Also yes: it is true that capability jumps can lead to much sharper left turns. I think that risk is real and unacceptably high. I can easily agree that gradual disempowerment is most relevant in worlds where rapid loss of control does not happen first, but note that the gradual problem makes the risk of coups go up. There is actually substantial debate here I'm excited about.

Local Incentives

Let me get a bit more concrete and personal here. If you are a researcher at a frontier AI lab, I think it's not in your institution's self-interest for you to engage too deeply with gradual disempowerment arguments. The institutions were founded based on worries about power and the technical risks of AGI, not worries about AI and the macroeconomy. They have some influence over technical development, and their 'how we win' plans were mostly crafted in a period when it seemed this was sufficient. It is very unclear whether they are helpful or have much leverage in the gradual disempowerment trajectories.

To give a concrete example, in my read of Dario Amodei's "Machines of Loving Grace", one of the more important things to notice is not what is there, like the fairly detailed analysis of progress in biology, but what is not there, or is extremely vague. I appreciate that it is at least gestured at:

> At that point (...a little past the point where we reach "a country of geniuses in a datacenter"...) our current economic setup will no longer make sense, and there will be a need for a broader societal conversation about how the economy should be organized.

So, we will have nice, specific things, like prevention of Alzheimer's, or some safer, more reliable descendant of CRISPR curing most genetic disease in existing people. Also, we will need to have some conversation, because the human economy will be obsolete and the incentives for states to care about people will be obsolete.

I love that it is a positive vision. Also, IDK, it seems like a kind of forced optimism about certain parts of the future. Yes, we can acknowledge specific technical challenges. Yes, we can worry about deceptive alignment or capability jumps. But questioning where the whole enterprise ends, even if everything works as intended? Seems harder to incorporate into institutional narratives and strategies.

Even for those not directly employed by AI labs, there are similar dynamics in the broader AI safety community. Careers, research funding, and professional networks are increasingly built around certain ways of thinking about AI risk. Gradual disempowerment doesn't fit neatly into these frameworks. It suggests we need different kinds of expertise and different approaches than what many have invested years developing. Academic incentives also currently do not point here - there are likely fewer than ten economists taking this seriously, and the trans-disciplinary nature of the problem makes it a hard sell as a grant proposal.

To be clear, this isn't about individual researchers making bad choices. It's about how institutional contexts shape what kinds of problems feel important or tractable, how the funding landscape shapes what people work on, and how memeplexes or 'schools of thought' shape attention. In a way, this itself illustrates some of the points about gradual disempowerment - how systems can shape human behavior and cognition in ways that reinforce their own trajectory.

Conclusion

Actually, I don't know what's really going on here. Mostly, in my life, I've seen a bunch of case studies of epistemic distortion fields - cases where incentives like money or power shape what people have trouble thinking about, or where memeplexes protect themselves from threatening ideas. The flinching moves I've described look somewhat similar to those patterns.

Comments

I think you're wrong to be psychoanalysing why people aren't paying attention to your work. You're overcomplicating it. Most people just think you're wrong upon hearing a short summary, and don't trust you enough to spend time learning the details. Whether your scenario is important or not, from your perspective it'll usually look like people are bouncing off for bad reasons.

For example, I read the executive summary. For several shallow reasons,[1] the scenario seemed unlikely and unimportant. I didn't expect there to be better arguments further on. So I stopped. Other people have different world models and will bounce off for different reasons.

Which isn't to say it's wrong (that's just my current weakly held guess). My point is just that even if you're correct, the way it looks a priori to most worldviews is sufficient to explain why people are bouncing off it and not engaging properly.

Perhaps I'll encounter information in the future that indicates my bouncing off was a mistake, and I'll go back.

  1. ^

    There are a couple of layers of maybes, so the scenario doesn't seem likely. I expect power to be more concentrated. I expect takeoff to be faster. I expect capabilities to have a high cap. I expect alignment to be hard for any goal. Something about maintaining a similar societal structure without various chaotic game-board-flips seems unlikely. The goals-instilled-in-our-replacements are pretty specific (institution-aligned), and pretty obviously misaligned from overall human flourishing. Sure humans are usually myopic, but we do sometimes consider the consequences and act against local incentives. 

    I don't know whether these reasons are correct, or how well you've argued against them. They're weakly held and weakly considered, so I wouldn't have usually written them down. They are just here to make my point more concrete.

I think 'people aren't paying attention to your work' is a somewhat different situation from the one voiced in the original post. I'm discussing specific ways in which people engage with the argument, as opposed to just ignoring it. It is the baseline that most people ignore most arguments most of the time.

Also, it's probably worth noting that these ways of engaging seem somewhat specific to the crowd over-represented here - in different contexts people engage with it in different ways.
 

dr_s:

I think the shell games point is interesting though. It's not psychoanalysing (one can think that people are in denial or have rational beliefs about this, not much point second guessing too far), it's pointing out a specific fallacy: a sort of god of the gaps in which every person with a focus on subsystem X assumes the problem will be solved in subsystem Y, which they understand or care less about because it's not their specialty. If everyone does it, that does indeed lead to completely ignoring serious problems due to a sort of bystander effect.

Sam Marks:

Thanks for writing this reflection, I found it useful.

Just to quickly comment on my own epistemic state here:

  1. I haven't read GD.
  2. But I've been stewing on some of (what I think are) the same ideas for the last few months, when William Brandon first made (what I think are) similar arguments to me in October.
    1. (You can judge from this Twitter discussion whether I seem to get the core ideas)
  3. When I first heard these arguments, they struck me as quite important and outside of the wheelhouse of previous thinking on risks from AI development. I think they raise concerns that I don't currently know how to refute around "even if we solve technical AI alignment, we still might lose control over our future."
  4. That said, I'm currently in a state of "I don't know what to do about GD-type issues, but I have a lot of ideas about what to do about technical alignment." For me at least, I think this creates an impulse to dismiss GD-type concerns, so that I can justify continuing to do something where I have "the work cut out for me" (if not in absolute terms, then at least relative to working on GD-type issues).
  5. In my case in particular I think it actually makes sense to keep working on technical alignment (because I think it's going pretty productively).
  6. But I think that other people who work (or are considering working in) technical alignment or governance should maybe consider trying to make progress on understanding and solving GD-type issues (assuming that's possible).

I think my quick guess is that what's going on is something like:
- People generally have a ton of content to potentially consume and limited time, and are thus really picky. 
- Researchers often have unique models and a lot of specific nuance they care about.
- Most research of this type is really bad. Tons of people on Twitter now seem to have some big-picture theory of what AI will do to civilization.
- Researchers also have the curse of knowledge, and think their work is simpler than it is.

So basically, people aren't flinching because of bizarre and specific epistemic limitations. It's more like,
> "this seems complicated, learning it would take effort, my prior is that this is fairly useless anyway, so I'll be very quick to dismiss this."

My quick impression is that this is a brutal and highly significant limitation of this kind of research. It's just incredibly expensive for others to read and evaluate, so it's very common for it to get ignored. (Learned in part from myself trying to put a lot of similar work out there, then seeing it get ignored)

Related to this -
I'd predict that if you improved the arguments by 50%, it would lead to little extra uptake. But if you got someone really prestigious to highly recommend it, then suddenly a bunch of people would be much more convinced. 

Exactly! I've also noticed there are so many ideas and theories out there, relative to the available resources to evaluate them and find the best to work on.

A lot of good ideas which I feel deserve a ton of further investigation seem to be barely talked about after they're introduced. E.g. off the top of my head,

My opinion is that there aren't enough funds and manpower. I have an idea for increasing that, which ironically also got ignored, yay!

> My quick impression is that this is a brutal and highly significant limitation of this kind of research. It's just incredibly expensive for others to read and evaluate, so it's very common for it to get ignored.

> I'd predict that if you improved the arguments by 50%, it would lead to little extra uptake.

I think this is wrong. The introduction of the GD paper takes no more than 10 minutes to read and no significant cognitive effort to grasp, really. I don't think there is more than 10% potential for making it any clearer or more approachable.

> The introduction of the GD paper takes no more than 10 minutes to read

Even 10 minutes is a lot, for many people. I might see 100 semi-interesting Tweets and Hacker News posts that link to lengthy articles per day, and that's already filtered - I definitely can't spend 10 min each on many of them.
 

> and no significant cognitive effort to grasp, really.

"No significant cognitive effort" to read a nuanced semi-academic article with unique terminology? I tried spending around ~20-30min understanding this paper, and didn't find it trivial. I think it's very easy to make mistakes about what papers like this are really trying to say (In many ways, the above post lists out a bunch of common mistakes, for instance). I know the authors and a lot of related work, and even with that, I didn't find it trivial. I imagine things are severely harder for people much less close to this area. 

I think there's an important crux here.

For people who write ideas/theories, and hope their ideas/theories get traction, the frustration is often directed at critics who reject their idea without taking the time to read it.

Meanwhile, there are many supportive people in the comments, who did take the time to read the idea, and did say "yes, this is a good idea, I never thought of it! Good luck working on it."

The author only sees these two groups of people, and feels that his/her fight is to push people in the former group to actually read their idea, so that they may move to the latter group.

But the author doesn't realize that even if they did read the idea, and did move to the latter, supportive group, the idea would be almost as badly ignored in the end.

It would cure his/her frustration towards "people who never bothered to read," but his/her idea won't take off and succeed either. He/she will finally learn that there is something else to be frustrated about: even if everyone reads your idea and agrees with your idea, nobody has time to do anything about it.

A lot of authors never reach this second stage of frustration, because there is indeed a large group of critics going around criticizing ideas without reading them.

But these critics are rarely the real reason your idea is getting ignored.

I'm one of the people who have strongly supported many ideas/theories, only to never talk about them again, because I don't have the time. I see many others doing this too.

The real problem is still the lack of time.

EDIT: actually I'm a bit confused. Maybe the real problem is that the argument cannot just say why the idea is good or why the theory is plausible; it also has to say why a reader (satisfying some criteria) should drop what she is doing and work on the idea/theory for a while. Maybe it should give a cost and benefit analysis? I'm not sure if this will fix the Idea Ignoring Problem.

I would guess that the range of things people propose for the shell game is tractable to get a good survey of. It'd be interesting to try to plot out the system as a causal graph with recurrence so one can point to, "hey look, this kind of component is present in a lot of places", and see if one can get that causal graph visualization to show enough that it starts to feel clear to people why this is a problem. I doubt I'll get to this, but if I play with this, I might try to visualize it [edit: probably with the help of a skilled human visual artist to make the whole chart into an evocative comic] with arrays of arrows vaguely like,

a -> b -> c_1  ->  c_1
          ...  ->  ...
          c_n  ->  c_n

          |
          v

          d_1 ... d_n
^
|         |     /
          v    v

f    <-   e
              

where c might be, idk, people's bank accounts or something, d might be people's job decisions, e might be an action by some single person, etc. there's a lot of complexity in the world, but it's finite, and not obviously beyond us to display the major interactions. being able to point to the graph and say "I think there are arrows missing here" seems like it might be helpful. it should feel like, when one looks at the part of the causal graph that contains ones' own behavior, "oh yeah, that's pretty much got all the things I interact with in at least an abstract form that seems to capture most of what goes on for me", and that should be generally true for basically anyone with meaningful influence on the world.

ideally then this could be a simulation that can be visualized as a steppable system. I've seen people make sim visualizations for public consumption - https://ncase.me/, https://www.youtube.com/@PrimerBlobs - it doesn't exactly look trivial to do, but it seems like it'd allow people to grok the edges of normality better to see normality generated by a thing that has grounding, and then see that thing in another, intuitively-possible parameter setup. It'd help a lot with people who are used to thinking about only one part of a system.

But of course trying to simulate abstracted versions of a large fraction of what goes on on earth sounds like it's only maybe at the edge of tractability for a team of humans with AI assistance, at best.
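
to make that slightly less hand-wavy, here's a very rough Python sketch of the kind of object I mean - an explicit causal graph over abstracted quantities that can be inspected for missing arrows and stepped as a crude simulation. the node names, weights, and update rule here are purely illustrative placeholders, not a worked-out model:

```python
# Rough sketch (illustrative placeholders only): a causal graph whose nodes are
# abstracted quantities and whose edges are weighted influences. The point is
# (a) being able to ask "which arrows are missing?" and (b) stepping it.

class CausalGraph:
    def __init__(self):
        self.values = {}      # node name -> current value
        self.influences = {}  # target -> list of (source, weight)

    def add_node(self, name, value):
        self.values[name] = value
        self.influences.setdefault(name, [])

    def add_influence(self, source, target, weight):
        self.influences[target].append((source, weight))

    def missing_arrows(self):
        """Unordered node pairs with no direct influence in either direction --
        the places to ask 'is this absence a modeling choice or an oversight?'"""
        names = list(self.values)
        linked = {(s, t) for t, ins in self.influences.items() for s, _ in ins}
        return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                if (a, b) not in linked and (b, a) not in linked]

    def step(self, inertia=0.5):
        """Damped synchronous update: each node moves toward the weighted sum of
        its influences; nodes with no incoming edges are treated as exogenous."""
        new_values = {}
        for name, value in self.values.items():
            incoming = self.influences[name]
            if incoming:
                target = sum(self.values[s] * w for s, w in incoming)
                new_values[name] = inertia * value + (1 - inertia) * target
            else:
                new_values[name] = value
        self.values = new_values


g = CausalGraph()
for node in ["ai_capability", "human_labor_relevance",
             "state_responsiveness_to_humans", "human_cultural_influence"]:
    g.add_node(node, 1.0)
g.add_influence("ai_capability", "human_labor_relevance", -0.5)
g.add_influence("human_labor_relevance", "state_responsiveness_to_humans", 1.0)
g.add_influence("human_labor_relevance", "human_cultural_influence", 1.0)

print(g.missing_arrows())  # e.g. culture <-> state has no arrow yet -- oversight?
for _ in range(5):
    g.step()
    print({k: round(v, 2) for k, v in g.values.items()})
```

obviously the real thing would need far more nodes and much better-grounded weights before it told anyone anything, but even the skeleton makes "I think there are arrows missing here" a concrete, pointable claim.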

I'm sure some of people's ignorance of these threat models comes from the reasons you describe. But my intuition is that most of it comes from "these are vaguer threat models that seem very up in the air, and other ones seem more obviously real and more shovel-ready" (this is similar to your "Flinch", but I think more conscious and endorsed).

Thus, I think the best way to converge on whether these threat models are real/likely/actionable is to work through as-detailed-as-possible example trajectories. Someone objects that the state will handle it? Let's actually think through what the state might look like in 5 years! Someone objects that democracy will prevent it? Let's actually think through the actual consequences of cheap cognitive labor in a democracy!
This is analogous to what pessimists about single-single alignment have gone through. They have some abstract arguments, people don't buy them, so they start working through them in more detail or provide example failures. I buy some parts of them, but not others. And if you did the same for this threat model, I'm uncertain how much I'd buy!

Of course, the paper might have been your way of doing that. I enjoyed it, but still would have preferred more fully detailed examples, on top of the abstract arguments. You do use examples (both past and hypothetical), but they are more like "small, local examples that embody one of the abstract arguments", rather than "an ambitious (if incomplete) and partly arbitrary picture of how these abstract arguments might actually pan out in practice". And I would like to know the messy details of how you envision these abstract arguments coming into contact with reality. This is why I liked TASRA, and indeed I was more looking forward to an expanded, updated and more detailed version of TASRA.

I have been discussing thoughts along these lines. My essay A Path to Human Autonomy argues that we need to slow AI progress and speed up human intelligence progress. My plan for how to accomplish slowing AI progress is to use novel decentralized governance mechanisms aided by narrow AI tools. I am working on fleshing out these governance ideas in a doc. Happy to share.

plex:

> To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy and states, or as a substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.

oh good, I've been thinking this basically word for word for a while and had it in my backlog. Glad this is written up nicely, far better than I would likely have done :)

The one thing I'm not a big fan of: I'd bet "Gradual Disempowerment" sounds like a "this might take many decades or longer" thing to most readers, whereas with capabilities curves this could be a few-months-to-single-digit-years thing.

Do we have a good story about why this hasn't already happened to humans? Systems don't actually care about the individuals they comprise, and certainly don't care about the individuals that are neither taxpayers, selectorate, contributors, nor customers.

Why do modern economies support so many non-participants? Leaving aside the marginal and slightly sub-marginal workers, who don't cost much and may have option value or be useful for keeping money moving in some way, there are a lot who are clearly a drain on resources.

qbolec:

I think the framework from "Dictator's Handbook" can be applied: citizens get as much freedom and as many benefits as is (short-term) optimal for the rulers. For example, if a country needs skilled labor and transportation to create tax revenue, then you can predict the govt will fund schools, roads and maybe even hospitals. OTOH if the country has rich deposits of gold located near the ports, then there's no need for any of that.

Since reading this book I am also very worried by scenarios of human disempowerment. I've tried to ask some questions around it:

I wonder if this is somehow harder to understand for citizens of USA, than for someone from a country which didn't care about its citizens at all. For example, after Lukashenko was "elected" in Belarus, people went to the streets to protest, yet, this didn't make any impression on the rulers. They didn't have any bargaining power, it seems.

importantly, in the dictators handbook case, some humans do actually get the power.

This is why I was stating the scenario in the paper cannot really lead to existential catastrophe, at least without other assumptions here.

Participants are at least somewhat aligned with non-participants. People care about their loved ones even if they are a drain on resources. That said, in human history, we do see lots of cases where “sub-marginal participants” are dealt with via genocide or eugenics (both defined broadly), often even when it isn’t a matter of resource constraints.

When humans fall well below marginal utility compared to AIs, will their priorities matter to a system that has made them essentially obsolete? What happens when humans become the equivalent of advanced Alzheimer’s patients who’ve escaped from their memory care units trying to participate in general society?

> When humans fall well below marginal utility compared to AIs, will their priorities matter to a system that has made them essentially obsolete?

The point behind my question is "we don't know." If we reason analogously to human institutions (which are made of humans, but not really made or controlled BY individual humans), we have examples in both directions. AIs have less biological drive to care about humans than humans do, but also have more training on human writings and thinking than any individual human does.

My suspicion  is that it won't take long (in historical time measure; perhaps only a few decades, but more likely centuries) for a fully-disempowered species to become mostly irrelevant.  Humans will be pets, perhaps, or parasites (allowed to live because it's easier than exterminating them).  Of course, there are plenty of believable paths that are NOT "computational intelligence eclipses biology in all aspects" - it may hit a wall, it may never develop intent/desire, it may find a way to integrate with biologicals rather than remaining separate, etc.  Oh, and it may be fragile enough that it dies out along with humans.

I think you miss the point that gradual disempowerment from AI happens as AI becomes the more economically (and otherwise) performant option that systems can and will select instead of humans. Less reliance on human involvement leads to less bargaining power for humans.

But I mean, we already have examples like molochian corporate structures that have kind of lost the need to value individual humans, since they can afford a high churn rate and there are always other people willing to take a decently paid corporate job even if the conditions are ... suboptimal.

> Even for those not directly employed by AI labs, there are similar dynamics in the broader AI safety community. Careers, research funding, and professional networks are increasingly built around certain ways of thinking about AI risk. Gradual disempowerment doesn't fit neatly into these frameworks. It suggests we need different kinds of expertise and different approaches than what many have invested years developing. Academic incentives also currently do not point here - there are likely fewer than ten economists taking this seriously, and the trans-disciplinary nature of the problem makes it a hard sell as a grant proposal.

I agree this is unfortunate, but it also seems irrelevant? Academic economics (as well as sociology, political science, anthropology, etc.) is approximately completely irrelevant to shaping major governments' AI policies. "Societal preparedness" and "governance" teams at major AI labs and BigTech giants seem to have approximately no influence on the concrete decisions and strategies of their employers.

The last economist who influenced the economic and policy trajectory significantly was Milton Friedman perhaps?

If not research, what can affect the economic and policy trajectory at all in a deliberate way (disqualifying the unsteerable memetic and cultural drift forces), apart from powerful leaders themselves (Xi, Trump, Putin, Musk, etc.)? Perhaps the way we explore the "technology tree" (see https://michaelnotebook.com/optimism/index.html)? Such as the internet, social media, blockchain, form factors of AI models, etc. I don't hold too much hope here, but this looks to me like the only plausible lever.

dr_s:

> So, we will have nice, specific things, like prevention of Alzheimer's, or some safer, more reliable descendant of CRISPR curing most genetic disease in existing people. Also, we will need to have some conversation, because the human economy will be obsolete and the incentives for states to care about people will be obsolete.

I feel like the fundamental problem with this is that while scientific and technological progress can be advanced intentionally, I can't think of an actual example of large-scale social change happening in some kind of planned way. Yes, the thoughts of philosophers and economists have some influence on it, but it almost never takes the shape of whatever they originally envisioned. I don't think Karl Marx would have been super happy with the USSR. And very often the causal arrow goes the other way around - philosophers and economists express and give shape to a sentiment that already exists formless in the zeitgeist, due to various circumstances changing and thus causing a corresponding cultural shift. There is a feedback loop there, but generally speaking, the idea that we can even have intentional "conversations" about these things and somehow steer them very meaningfully seems more wishful thinking than reality to me.

It generally goes that Scientist Invents Thing, unleashes it into the world, and then everything inevitably and chaotically slides towards the natural equilibrium point of the new regime. 

I broadly agree with this, though I'll state 2 things:

  1. Limited steering ability doesn't equal 0 steering ability, and while there's an argument to be made that people overestimate how much you can do with pure social engineering, I do still think there can be multiple equilibrium points, even if a lot of what happens is ultimately controlled by incentives.

  2. AIs probably have a much easier time coordinating on what to do, and importantly can route around a lot of the bottlenecks that exist in human societies solely due to copying, merging and scaling, so assuming alignment is achieved, it's very possible for single humans to do large scale social change by controlling the economy and military, and working your way from there.

It's possible a few concrete stories can illustrate why these shell game objections are wrong.

Unfortunately, it's very hard to use concrete stories when the uncertainty is very high, and each specific case is quite unlikely. Here's one of my attempts. I admit the probability of anything even roughly similar is 0.1%, and it sounds closer to science fiction than reality. Maybe you can write a better one?

I think a big part of the issue is not just the assumptions people use, but also that your scenario doesn't really lead to existential catastrophe in most worlds, if only because a few very augmented humans determine a lot of what the future does hold, at least under single-single alignment scenarios. Also, a lot of AI thought has been directed towards worlds where AI does pose existential risk, and a lot of this is because of the values of the first thinkers on the topic.

More below:

https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from#GChLyapXkhuHaBewq

@the gears to ascension is there a plausible scenario in your mind where the gradual disempowerment leads to the death/very bad fates for all humans?

Because I'm currently struggling to understand the perspective where alignment is solved, but all humans still die/irreversibly lose control due to gradually being disempowered.

A key part of the challenge is that you must construct the scenario in a world where the single-single alignment problem/classic alignment problem as envisioned by LW is basically solved for all intents and purposes.

"all" humans? like, maybe no, I expect a few would survive, but the future wouldn't be human, it'd be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to, (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did given a huge increase in power. crown wearing the king, drug seeking behavior, luxury messing up people's motivation, etc. if you solve "make an ai be entirely obedient to a single person", then that person needs to be wise enough to not screw that up, and I trust exactly no one to even successfully use that situation to do what they want, nevermind what others around them want. For an evocative cariacature of the intuition here, see rick sanchez.

"all" humans?

 
The vast majority of actual humans are already dead.  The overwhelming majority of currently-living humans should expect 95%+ chance they'll die in under a century.  

If immortality is solved, it will only apply to "that distorted thing those humans turn into".   Note that this is something the stereotypical Victorian would understand completely - there may be biological similarities with today's humans, but they're culturally a different species.

I mean, we're not going to the future without getting changed by it, agreed. but how quickly one has to figure out how to make good use of a big power jump seems like it has a big effect on how much risk the power jump carries for your ability to actually implement the preferences you'd have had if you didn't rush yourself.

> very bad fates for all humans

I believe there’s also a disagreement here where the same scenario will be considered fine by some and very bad by others (humans as happy pets comes to mind).

To be clear, I'm expecting scenarios much more clearly bad than that, like "the universe is almost entirely populated by worker drone AIs and there are like 5 humans who are high all the time and not even in a way they would have signed up for, and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants but who had replaced almost all of their preferences with an obsession with growth in order to get to being the one who had command of the first AI, and didn't manage to break out of it using that AI, and then got more weird in rapid jumps thanks to the intense things they asked for help with."

like, the general pattern here being, the crucible of competition tends to beat out of you whatever it was you wanted to compete to get, and suddenly getting a huge windfall of a type you have little experience with that puts you in a new realm of possibility will tend to get massively underused and not end up managing to solve subtle problems.

Nothing like, "oh yeah humanity generally survived and will be kept around indefinitely without significant suffering".

My main crux here is that no strong AI rights will likely be given before near-full alignment to one person is achieved, and maybe not even then. A lot of the failure modes of giving AIs power in the gradual disempowerment scenario fundamentally route through giving AIs very strong rights, but thankfully, this is disincentivized by default, because otherwise AIs would be more expensive.

The main way this changes the scenario is that the 6 humans here remain broadly in control, and aren't just high all the time, and the first one probably doesn't just replace their preferences with pure growth, because at the level of billionaires, status dominates, so they are likely living very rich lives with their own servants.

No guarantees about anyone else surviving though:

  • No strong AI rights before full alignment: There won't be a powerful society that gives extremely productive AIs "human-like rights" (and in particular strong property rights) prior to being relatively confident that AIs are aligned to human values.
    • I think it's plausible that fully AI-run entities are given the same status as companies - but I expect that the surplus they generate will remain owned by some humans throughout the relevant transition period.
    • I also think it's plausible that some weak entities will give AIs these rights, but that this won't matter because most "AI power" will be controlled by humans that care about it remaining the case as long as we don't have full alignment.
       

> If the current trajectory continues, it's not the case that the AI you have is a faithful representative of you, personally, run in your garage. Rather it seems there is a complex socio-economic process leading to the creation of the AIs, and the smarter they are, the more likely it is they were created by a powerful company or a government.

> This process itself shapes what the AIs are "aligned" to. Even if we solve some parts of the technical alignment problem we still face the question of what is the sociotechnical process acting as "principal".

This touches on an idea I'd really like to get more attention: that we should build AI fundamentally tethered to human nature, so that this drift toward whatever arbitrary form "just so happens to sell best" doesn't happen. I call it tetherware - more about it in my post here.

One problem is that arguments against AI development do look a lot like known-bad arguments against prior innovations. It is true that people have had complaints and luddite arguments against all manner of social and technological change in the past, and yet the clear trend of at least 600 years is that things have gotten substantially better in pretty obvious ways.

For example, Maxwell Tabarrok had a thread recently comparing fears of humans becoming economically redundant to the "Lump of Labor" fallacy that has been invoked when past automation put people out of work. "Aha," he thinks. "Here is an easy misconception to refute. To say otherwise would be to say that the future will look nothing like the past. It is practically to deny inductive reasoning itself." And he pats himself on the back.

So we are hamstrung from the jump by needing to convey "yes, similar arguments were wrong repeatedly... but here's why this is different." It's a hard sell, and many smart people will be unwilling to trust us on that first step.

> but note that the gradual problem makes the risk of coups go up.

Just a request for editing the post to clarify: do you mean coups by humans (using AI), coups by autonomous misaligned AI, or both?
