(I’ve only read the parts I’m responding to)
> My high-level view is that the convincing versions of gradual disempowerment either rely on misalignment or result [from] power concentration among humans.
It feels like this statement should be qualified more. Later, it is stated that GD isn’t “similarly plausible to the risks from power-seeking AI or AI-enabled coups”, but this holds GD to a higher bar than needed: the relevant bar would seem to be “is plausible enough to be worth considering”.
“Rely[ing] on misalignment” is also an extremely weak condition: I claim that current systems are not aligned, and gradual disempowerment dynamics are already at play (cf. the AI “arms race”).
The analysis of economic disempowerment seems to take place in a vacuum, ignoring one of the main arguments we make, which is that different forms of disempowerment can reinforce one another. The most concerning version of this, I think, is not just “we don't get UBI”, but rather that the memes saying “it's good to hand over as much power as quickly as possible to AI” win the day.
The analysis of cultural disempowerment goes one step “worse”, arguing that “If humans remain economically empowered (in the sense of having much more money than AI), I think they will likely remain culturally empowered.” I think we agree that a reasonable model here is one where cultural and economic power are tightly coupled, but I don’t see why that means they won’t both go off the rails. You seem to think that they are almost guaranteed to feed back on each other in a way that maintains human power, but I think it can easily go the opposite way.
Regarding political disempowerment, you state: “It’s hard to see how those leading the state and the top AI companies could be disempowered, absent misalignment.” Personally, I find it quite easy to see. Insufficient elite coordination is one mechanism (discussed below). But reality can also just be unfriendly to you and force you to make choices about how you prioritize long-term vs. short-term objectives, leading people to accept deals like: “I'll be rich and powerful for the next hundred years, and then my AI will take over my domain and do as it pleases”. Furthermore, if more people take such deals, this creates pressure for others to do so as well, since you need to gain power in the short term in order to remain “solvent” in the long term, even if you aren’t myopic yourself. I think this is already happening: the AI arms race is burning the commons every day, and I don’t expect it to stop.
Regarding elite coordination, I also looked at the list under the heading “Sceptic: Why don’t the elites realise what’s happening and coordinate to stop it?” Another important reason, not mentioned there, is that cooperation usually produces a bargaining game in which there is no clearly correct way to split the proceeds.
Yeah I roughly agree.
EtA: I might say, e.g., that algorithmic trading and marketing (which are older) are already doing this, but it's a bit subjective and uncertain.
> and to date, this fact has little to do with AI.
This seems incorrect for the last couple of years. It also seems incorrect historically if you broaden from AI to "information processing and person modelling technologies that help turn money into influence".
More generally, GD can be viewed either as a continuation of historical trends or as a break from them. I'm more in the "continuation" camp than, e.g., Duvenaud, who would stress that things change once humans become redundant.
This seems to be missing older work by Stuart Armstrong (?)
Some further half-baked thoughts:
One thing that is still not clear (both in reality, and per this article) is the extent to which we should view a model as having a coherent persona/goal.
This is loosely related to the question of whether models are strictly simulators, or whether some personas / optimization daemons "take on a life of their own" and, e.g.:
1) bias the model towards simulating them and/or
2) influence the behavior of other personas
It seems like these things do in fact happen, and the implication is that the "simulator" viewpoint becomes less accurate over time.
Why?
This was an interesting article. However, taking a cynical/critical lens, it seems like "the void" is just... underspecification causing an inner alignment failure? The post has this to say on the topic of inner alignment:
> And one might notice, too, that the threat model – about inhuman, spontaneously generated, secret AI goals – predates Claude by a long shot. In 2016 there was an odd fad in the SF rationalist community about stuff kind of like this, under the name “optimization demons.” Then that discourse got sort of refurbished, and renamed to “inner alignment” vs. “outer alignment.”
This is in the context of mocking these concerns as delusional self-fulfilling prophecies.
I guess the devil is in the details, and the point of the post is more to dispute the framing and ontology of the safety community, which I found useful. But it does seem weirdly uncharitable in how it does so.
In the big round (without counterarguments), arguments pushed people upward slightly more:
(i.e., more than they pushed people downward -- not more than in previous surveys)
First, RE the role of "solving alignment" in this discussion, I just want to note that:
1) I disagree that alignment solves gradual disempowerment problems.
2) Even if it did, that would not imply that gradual disempowerment problems are unimportant (since we can't assume alignment will be solved).
3) I'm not sure what you mean by "alignment is solved"; I'm taking it to mean "AI systems can be trivially intent aligned". Such a system may still say things like "Well, I can build you a successor that I think has only a 90% chance of being aligned, but will make you win (e.g. survive) if it is aligned. Is that what you want?" and people can respond with "yes" -- this is the sort of thing that probably still happens IMO.
4) Alternatively, you might say we're in the "alignment basin" -- I'm not sure what that means, precisely, but I would operationalize it as something like "the AI system is playing a roughly optimal CIRL game". It's unclear how good the resulting performance can be in practice (e.g. it can't actually be optimal due to compute limitations), but I suspect it still leaves significant room for fuck-ups.
5) I'm more interested in the case where alignment is not "perfectly" "solved", and so there are simply clear and obvious opportunities to trade off safety and performance; I think this is much more realistic to consider.
6) I expect such trade-off opportunities to persist when it comes to assurance (even if alignment is solved), since I expect high-quality assurance to be extremely costly. And it is irresponsible (because it's subjectively risky) to trust a perfectly aligned AI system absent strong assurances. But of course, people who are willing to YOLO it and just say "seems aligned, let's ship" will win. This is also part of the problem...
My main response, at a high level:
Consider a simple model:
I predict that group A survives, but the humans are no longer in power. I think this illustrates the basic dynamic. EtA: Do you understand what I'm getting at? Can you explain what you think is wrong with thinking of it this way?
Responding to some particular points below:
> Sure, but these things don't result in non-human entities obtaining power right?
Yes, they do; they result in bureaucracies and automated decision-making systems obtaining power. People were already having to implement and interact with stupid automated decision-making systems before AI came along.
> Like usually these are somewhat negative sum, but mostly just involve inefficient transfer of power. I don't see why these mechanisms would on net transfer power from human control of resources to some other control of resources in the long run. To consider the most extreme case, why would these mechanisms result in humans or human appointed successors not having control of what compute is doing in the long run?
My main claim was not that these are mechanisms of human disempowerment (although I think they are), but rather that they are indicators of the overall low level of functionality of the world.
I think we disagree about:
1) The level of "functionality" of the current world/institutions.
2) How strong and decisive competitive pressures are and will be in determining outcomes.
I view the world today as highly dysfunctional in many ways: corruption, coordination failures, preference falsification, coercion, inequality, etc. are rampant. This state of affairs causes many bad outcomes, and many aspects of it are self-reinforcing. I don't expect AI to fix these problems; I expect it to exacerbate them.
I do believe it has the potential to fix them. However, I think the use of AI for such pro-social ends is not going to be sufficiently incentivized, especially on short time-scales (e.g. a few years), and we will instead see a race to the bottom that encourages highly reckless, negligent, short-sighted, selfish decisions around AI development, deployment, and use. The current AI arms race is a great example -- companies and nations all view it as more important that they be the ones to develop ASI than to do it carefully or put effort into cooperation/coordination.
Given these views:
1) Asking AI for advice instead of letting it make decisions directly seems unrealistically uncompetitive. When we can plausibly simulate human meetings in seconds, it will be organizational suicide to take hours-to-weeks to let the humans make an informed and thoughtful decision.
2) The idea that decision-makers who "think a goverance structure will yield total human disempowerment" will "do something else" also seems quite implausible. Such decision-makers will likely struggle to retain power. Decision-makers who prioritize their own "power" (and feel empowered even as they hand off increasing decision-making to AI) and their immediate political survival above all else will be empowered.
Another feature of the future that seems likely, and whose beginnings can already be witnessed, is the gradual emergence and ascendance of pro-AI-takeover and pro-arms-race ideologies, which endorse the more competitive moves of rapidly handing off power to AI systems in insufficiently cooperative ways.
Thanks!
> Do you think that, absent AI power-seeking, this dynamic is highly likely to lead to human disempowerment? (If so, then i disagree.)
As a sort-of answer, I would just say that I am concerned that people might knowingly and deliberately build power-seeking AIs and hand over power to them, even if we have the means to build AIs that are not power-seeking.
> I said "absent misalignemnt", and I think your story involves misalignment?
It does not. The point of my story is: "reality can also just be unfriendly to you". There are trade-offs, and so people optimize for selfish, short-term objectives. You could argue people already do that, but cranking up the optimization power without fixing that seems likely to be bad.
My true objection is more that I think we will see extreme safety/performance trade-offs due to technical inadequacies, i.e. (roughly) the alignment tax is large (although I don't like that framing). In that case, you have misalignment despite also having a solution to alignment: competitive pressures prevent people from adopting the solution.