My response to the alignment / AI representatives proposals:
Even if AIs are "baseline aligned" to their creators, this doesn't automatically mean they are aligned with broader human flourishing or capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, "You are messing up, please coordinate with other nations/groups, stop what you are doing" requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI were the correct thing to do, Claude is not going to suggest this to Dario for obvious reasons. As we've seen even with entirely human systems (e.g., the Trump administration and tariffs), possessing information or even offering correct advice doesn't guarantee it will be heeded or lead to effective collective action.
[...] "Politicians...will remain aware...able to change what the system is if it has obviously bad consequences." The climate change analogy is pertinent here. We have extensive scientific consensus, an "oracle IPCC report", detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when long-term risks are "obviously bad." The paper [gradual disempowerment] argues AI could further entrench these issues by providing powerful tools for influencing public opinion or creating economic dependencies that make change harder.
Extract copy-pasted from a longer comment here.
I don't see how these help.
First, it seems to me that interoperability + advisors would be useless for helping people sociopolitically maneuver precisely up until the point at which the AI models are good enough to disempower most of them. Imagine some not-particularly-smart person with no AI expertise and not much mental slack for fighting some abstract battle about the future of humanity. A demographic of those people is then up against the legal teams of major corporations and the departments of major governments navigating the transition. The primary full-time jobs of the people working in the latter groups, at which they'd be very skilled, would be figuring out how to disempower the former demographics. In what scenario do the at-risk demographics stand any chance?
Well, if the AI models are good enough that the skills and attention of the humans deploying them don't matter. Which is to say: past the point at which the at-risk demographics are already disempowered.
Conversely, if the AI capabilities are not yet there, then different people can use AIs more or less effectively depending on how smart and skilled they are, and how many resources and how much spare attention for the fight they have. In which case the extant powers are massively advantaged and win in expectation.
Second, I'm skeptical about feasibility. This requires, basically, some centralized distribution system which (1) faithfully trains AI models to be ultimately loyal to their end advisees, and (2) fully subsidizes the compute costs of serving these models to all the economically useless people (... in the US? ... in the world?). How is that centralized system not subverted/seized by the extant powers? (E.g., the government insisting, in a way that sounds surface-level reasonable, on ultimate loyalty to its laws first, which it can then freely rewrite to gain complete control.)
Like, suppose this system is set up some time before AI models are good enough to make human workers obsolete. As per the first point, the whole timespan prior to the AI models becoming that good would involve the entrenched powers successfully scheming towards precisely the gradual disempowerment we're trying to prevent. How do we expect this UBI-like setup[1] to still be in place, uncorrupted, by the time AI models become good enough, given that it is the largest threat to the extant powers' ability to maintain/increase their power? This would be the thing everyone in power is trying to dismantle. (And, again, unless the AIs are already good enough to disempower the at-risk demographics, those demographics plus their AI representatives would be massively worse at fighting this battle than their opponents plus their AI representatives.)
Third: okay, let's suppose the system is somehow in place, and everyone in the world has a loyal AI representative advising them and advocating for their interests. What are these representatives supposed to do? Imagine some below-average-intelligence menial-labor worker hopelessly outclassed by robots. What moves is their AI representative supposed to make to preserve their power and agency? They don't really have any resources to bargain with; that's the ground truth of their situation.
Will the AIs organize them to march to the military bases/datacenters/politicians' and CEOs' homes and physically seize the crucial resources/take the key players hostage, or what?
There was a thought experiment going around on Twitter a few months back, which went:
Suppose people have to press one of two buttons, Blue and Red. If more than 50% of people press Blue, nobody dies. If more than 50% press Red, everyone who pressed Blue dies. Which button do you press?
Red is guaranteed to make you safe, and in theory, if everyone pressed Red, everyone would be safe. But if we imagine something isomorphic to this happening in reality, we should note that getting 100% of people to make the correct choice is incredibly hard. Some would be distracted, or confused, or they'd slip and fall and smack the wrong button accidentally.
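For concreteness, here's a minimal Monte Carlo sketch of the button game (my own illustration, not part of the original thought experiment; the population size `n`, the slip probability, and the everyone-intends-the-same-button setup are all assumptions):

```python
import random

def death_fraction(n: int, intend_blue: bool, slip: float, trials: int = 2000) -> float:
    """Average fraction of players who die per round of the game."""
    total = 0.0
    for _ in range(trials):
        # Each player presses their intended button, except that with
        # probability `slip` they are distracted/confused and press the other.
        presses_blue = [intend_blue != (random.random() < slip) for _ in range(n)]
        blue_votes = sum(presses_blue)
        if blue_votes * 2 > n:
            deaths = 0            # Blue majority: nobody dies.
        else:
            deaths = blue_votes   # Red majority: every Blue-presser dies.
        total += deaths / n
    return total / trials

for slip in (0.01, 0.05, 0.10):
    print(f"slip={slip:.2f}: "
          f"all intend Red -> deaths ~ {death_fraction(1001, False, slip):.3f}, "
          f"all intend Blue -> deaths ~ {death_fraction(1001, True, slip):.3f}")
```

Under "everyone plays Red", the casualty rate simply equals the slip rate: coordination doesn't protect anyone who errs, which is the fragility being gestured at, since a plan of this shape only protects the people who execute it correctly.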
To me, all variations on "let's give everyone in the world an AGI advisor to prevent disempowerment!" read as "let's all play Red!". The details don't quite match (a 50% survival/non-disempowerment rate is not at all guaranteed), but the vibe and the failure modes are the same.
Which is also supposed to be implemented during Trump's term?
One objection I have to your feasibility section is that you seem to lump "the powers that be" together as a single group.
This would be the thing everyone in power is trying to dismantle
The world is more multipolar than this, and so are the US legal and political systems. Trump and the Silicon Valley accelerationist crowd do hold a lot of power, but just a few years ago they were personae non gratae in many circles.
Even now, when they want to pass bills or change laws, they need to lobby for support from disparate groups both within and outside their party. In a sufficiently multipolar world where even just a few different groups have powerful models assisting their efforts, there will be some who want to change laws and rules in one way, others who want to change them in a different way, and others who don't want to change them at all. And some will be ambivalent.
I'm not saying the end result isn't corruption; I think parasitically middlemanning any redistribution is a basin of attraction for political spending and/or power. But there will be many different parties competing to corrupt it, or shore it up, according to their own beliefs and interests.
So I'd argue that making the world more multipolar, such that a more diverse array of parties has models, may in fact lead to greater stability and less corruption (or at least to more diverse coalitions when coalition-building occurs).
Implicit in my views is that the problem would be mostly resolved if people had aligned AI representatives which helped them wield their (current) power effectively.
Can you make the case for this a bit more? How are AI representatives going to help people prevent themselves from becoming disempowered / economically redundant? (Especially given that you explicitly state you are skeptical of "generally make humans+AI (rather than just AI) more competitive").
Mandatory interoperability for alignment and fine-tuning
Furthermore, I don't really see how fine-tuning access helps create AI representatives. Models are already trained to be helpful, and most people don't have very useful personal data that would make their AI work much better for them (that couldn't simply be put in the context of any model).
The hope here would be to get the reductions in concentration of power that come from open source
The concentration of power from closed source AI comes from (1) the AI companies' profits and (2) the AI companies having access to more advanced AI than the public. Open source solves (1), but fine-tuning access solves neither. (Obviously your "Deploying models more frequently" proposal does help with (2)).
Fine-tuning access could address (1) insofar as fine-tuned-model operators capture profit that would otherwise go to the main AI labs, provided access is broad enough to drive prices down.
Fine-tuning access allows the public to safely access models that might be too dangerous to open-source/open-weight.
Thanks a lot for this post! I appreciate you taking the time to engage, I think your recommendations are good, and I agree with most of what you say. Some comments below.
"the intelligence curse" or "gradual disempowerment"—concerns that most humans would end up disempowered (or even dying) because their labor is no longer valuable.
The intelligence curse and GD are not equivalent. In particular, I expect @Jan_Kulveit & co. would see GD as a broader bucket that also includes various subtle forms of cultural misalignment (which, tbc, I think also matter!), whereas IC is more specifically about things downstream of economic (and hard power, and political power) incentives. (And I would see e.g. @Tom Davidson's AI-enabled coup risk work as a subset of IC, representing the most sudden and dramatic way that IC incentives could play out.)
It's worth noting I doubt that these threats would result in huge casualty counts (due to e.g. starvation) or disempowerment of all humans (though substantial concentration of power among a smaller group of humans seems quite plausible).
[fn:]
That said, I do think that technical misalignment issues are pretty likely to disempower all humans and I think war, terrorism, or accidental release of homicidal bioweapons could kill many. That's why I focus on misalignment risks.
I think if you follow the arguments, disempowerment of all humans is plausible, and disempowerment of the vast majority even more so. I agree that technical misalignment is more likely to lead to high casualty counts if it happens (and I think the technical misalignment --> x-risk pathway is possible and incredibly urgent to make progress on).
I think there's also a difference between working on mitigating very clear sequences of steps that lead to catastrophe (e.g. X --> Y --> everyone drops dead), and working on maintaining the basic premises that keep things working (e.g., for the last 200 years, while things have been getting much better, the incentives of power and of humans have been remarkably correlated, and maybe we should try not to decorrelate them). The first is more obvious, but I think you should also be able to admit theories of change of the second type, at least enough that, for example, you would've decided to resist communism in the 1950s ("freedom good" is vague, and there wasn't yet consensus that market-based economies would provide better living standards in the long run, but it was still correct to bet against the communists if you cared about human welfare! basic liberalism is very powerful!).
Implicit in my views is that the problem would be mostly resolved if people had aligned AI representatives which helped them wield their (current) power effectively.
Yep, this is a big part of the future I'm excited to build towards.
- I'm skeptical of generally diffusing AI into the economy, working on systems for assisting humans, and generally uplifting human capabilities. This might help somewhat with societal awareness, but doesn't seem like a particularly leveraged intervention for this. Things like emulated minds and highly advanced BCIs might help with misalignment, but otherwise seem worse than AI representatives (which aren't backdoored and don't have secret loyalties/biases).
I think there are two basic factors that affect uplift chances:
(More fundamentally, there's also the question of how high you think human/AI complementarity at cognitive skills is—right now it's surprisingly high IMO)
I'm skeptical that local data is important.
I'm curious: what's your take on the basic Hayek point?
- I agree that AI-enabled contracts, AI-enabled coordination, and AIs speeding up key government processes would be good (to preserve some version of the rule of law such that hard power is less important). It seems tricky to advance this now.
I expect a track record of trying out some form of coordination at scale is really helpful for later getting it into government / into use by more "serious" actors. I think it's plausible that it's really hard to get governments to try any new coordination or governance mechanism before it's too late, but if you wanted to increase the odds, I think you should just very clearly be trying them out in practice.
- Understanding agency, civilizational social processes, and how you could do “civilizational alignment” seems relatively hard and single-single aligned AI advisors/representatives could study these areas as needed (coordinating research funding across many people as needed).
I agree these are hard, and they also seem like an area where it's unclear whether cracking R&D automation, to the point where we can hill-climb on ML performance metrics, gets you AI that does non-fake work on these questions. I really want very good AI representatives that are very carefully aligned to individual people if we're going to have AIs work on this.
The current race towards agentic AGI in particular is much more like 50% cultural/path-dependent than 5% cultural/path-dependent and 95% overdetermined. I think the decisions of the major labs are significantly influenced by particular beliefs about AGI & timelines; while these are likely (at least directionally) true beliefs, it's not at all clear to me that the industry would've been this "situationally aware" in alternative timelines.
This is probably cruxy here. I've viewed the race to replace humans with AI as much less path-dependent ever since I registered that the giant scale-up of compute happened, that the bitter lesson played out, and that the scale-up of pure self-supervised learning hit slowdowns; more generally, I subscribe to a view in which research is less path-dependent than people think.
More generally, I'm very skeptical of changing the ultimate paradigm for AGI into something that's safer but less competitive, and I believe your initial proposals relied on changing the AI paradigm to significantly complement humans using local knowledge, rather than straight-up automating them. But I view automation as unlocking >99% of the value, due to the long tail of cases that occur IRL, so this is a big amount of value to give up.
(More fundamentally, there's also the question of how high you think human/AI complementarity at cognitive skills is—right now it's surprisingly high IMO)
I also suspect this is a lesser crux: while I do think complementarities exist, I'd say the human+AI complement is basically always much less valuable than an AI straight-up replacing the human, if replacing the human actually worked.
The intelligence curse and GD are not equivalent.
Yep, though I think solutions are often overlapping. I should have clarified this.
I'm not going to respond to the rest of this (at least right now), sorry.
Do states and corporations also have their aligned representatives? Is the cognitive power of the representatives equal, roughly equal, or wildly unequal? If it is unequal, why are the resulting equilibria pro-human? (I.e., if I imagine individual humans like me represented by e.g. GPT-4 while the government runs tens of thousands of o4s, I would expect my representative to get convinced of whatever the government wants.)
Note: as a general policy I'm not planning on engaging with the comments here, this is just because I don't want to spend a bunch of time on this topic and this could easily eat a bunch of time. Sorry about that.
Aligning AI representatives / advisors to individual humans: If every human had a competitive and aligned AI representative which gave them advice on how to advance their interests as well as just directly pursuing their interests based on their direction (and this happened early before people were disempowered), this would resolve most of these concerns.
My personal prediction is that this would result in vast coordination problems that would likely rapidly lead to war and x-risk. You need a mechanism to produce a consensus or social compact, one that is at least as effective as our existing mechanisms, preferably more so. (While thinking about this challenge, please allow for the fact that 2–4% of humans are sociopathic, so an AI representative representing their viewpoint is likely to be significantly less prosocial.)
Possibly you were concealing some assumptions of pro-social/coordination behavior inside the phrase "aligned AI representative" — I read that as "aligned to them, and them only, to the exclusion of the rest of society — since they had it realigned that way", but possibly that's not how you meant it?
Some of my thoughts on avoiding the intelligence curse or gradual disempowerment and ensuring that humans stay relevant:
Human authentication and real-world activities do indeed seem very important. Deepfakes are a form of disempowerment and can destroy or destabilize states before employment even becomes a concern. AI-generated content can already be nearly, and sometimes strictly, indistinguishable from human-generated content: texts, pictures, videos. We are just at the beginning of the flood. Disinformation is exploding on the internet, and governments are falling into the hands of populist and nationalist parties one after the other. It's also a dramatic concern for justice. Should we go back to analog content?
Good to see your work on this! I'll avoid jumping in on weighing this relative to other problems as it's not the core of your post.
Rudolf and I are proponents of alignment to the user, which seems very similar to your second suggestion. Do you think there's a difference in the approach you outline vs the one we do? I'm considering doing a larger write-up on this approach, so your feedback would be helpful.
Nope, not claiming my proposal here is different from "alignment to the user". I probably should have made this clear. I wasn't trying to claim the interventions were novel approaches (though I think mandatory interoperability is), just that my prioritization/highlighting was novel.
My guess is for the prioritization work in particular, it would be useful to understand the threat model better.
Seems right, I just had some thoughts which seemed maybe useful so I decided to quickly write them up.
(Rudolf encouraged me to post as a top level post, I was initially going to post as a quick take.)
I was surprised to not see much consideration, either here or in the original GD and IC essays, of the brute force approach of "ban development of certain forms of AI," such as Anthony Aguirre proposes. Is that more (a) because it would be too difficult to enforce such a ban or (b) because those forms of AI are considered net positive despite the risk of human disempowerment?
Not commenting at length here, but from my perspective, in very short form:
- bans and pauses have a big problem to overcome: being "incentive compatible" (the issue is mostly not enforcement, since stuff can be enforced by hard power, but why actors would agree in the first place)
- in some sense this is a coordination problem
- my guess is that the most likely way to overcome the coordination problem in a good way involves some AI cognition helping humans coordinate -> this suggests differential technological development
- other viable forms of overcoming the coordination problem seem possible, but are often unappealing for various reasons I don't want to advocate atm
There have recently been various proposals for mitigations to "the intelligence curse" or "gradual disempowerment"—concerns that most humans would end up disempowered (or even dying) because their labor is no longer valuable. I'm currently skeptical that the typically highlighted prioritization and interventions are best, and I have some alternative proposals for relatively targeted/differential interventions which I think would be more leveraged (as in, the payoff is higher relative to the difficulty of achieving them).
It's worth noting I doubt that these threats would result in huge casualty counts (due to e.g. starvation) or disempowerment of all humans (though substantial concentration of power among a smaller group of humans seems quite plausible).[1] I decided to put a bit of time into writing up my thoughts out of general cooperativeness (e.g., I would want someone in a symmetric position to do the same).
(This was a timeboxed effort of ~1.5 hr, so apologies if it is somewhat poorly articulated or otherwise bad. Correspondingly, this post is substantially lower effort than my typical post.)
My top 3 preferred interventions focused on these concerns are:
Some things which help with the above:
Implicit in my views is that the problem would be mostly resolved if people had aligned AI representatives which helped them wield their (current) power effectively.
To be clear, something like these interventions has been highlighted in prior work, but I have a somewhat different emphasis and prioritization and I'm explicitly deprioritizing other interventions.
Deprioritized interventions and why:
(I'm not discussing interventions targeting misalignment risk, biorisk, or power grab risk, as these aren't very specific to this threat model.)
Again, note that I'm not particularly recommending these interventions on my views about the most important risks, just claiming these are the best interventions if you're worried about "intelligence curse" / "gradual disempowerment" risks.
That said, I do think that technical misalignment issues are pretty likely to disempower all humans and I think war, terrorism, or accidental release of homicidal bioweapons could kill many. That's why I focus on misalignment risks.