Individuals need to be equipped with locally-running AI that is explicitly loyal to them
In the Race ending of AI 2027, humanity never figures out how to make AIs loyal to anyone. OpenBrain doesn't slow down; they think they've solved the alignment problem, but they haven't. Maybe some academics or miscellaneous minor companies do additional research in 2028 and eventually discover, e.g., how to make an aligned human-level AGI, but by that point it's too little, too late (and their efforts may well be sabotaged by OpenBrain/Agent-5+, e.g. with regulation and distractions).
It seems more important whether humans can figure out how to evaluate alignment in 2028 than whether they can make human-level aligned AGIs (though of course the latter is instrumentally useful and correlated). In particular, the AIs need to prevent humans from discovering the methods by which the AIs themselves evaluate alignment. This seems probably doable for ASIs, but it may be a significant constraint, especially for only somewhat-superhuman AIs, if they've e.g. solved mech interp and applied it themselves but need to hide this for a long time.
I haven't read Vitalik's specific take, as yet, but as I asked more generally on X:
People who stake great hope on a "continuous" AI trajectory implying that defensive AI should always stay ahead of destructive AI:
Where is the AI that I can use to talk people *out* of AI-induced psychosis?
Why was it not *already* built, beforehand?
This just doesn't seem to be how things usually play out in real life. Even after a first disaster, we don't get lab gain-of-function research shut down in the wake of Covid-19, let alone massive investment in fast preemptive defenses.
Defense technologies should be more of the "armor the sheep" flavor, less of the "hunt down all the wolves" flavor. Discussions about the vulnerable world hypothesis often assume that the only solution is a hegemon maintaining universal surveillance to prevent any potential threats from emerging. But in a non-hegemonic world, this is not a workable approach (see also: security dilemma), and indeed top-down mechanisms of defense could easily be subverted by a powerful AI and turned into its offense. Hence, a larger share of the defense instead needs to happen by doing the hard work to make the world less vulnerable.
This might be the only item on this list that I disagree with.
I agree that given a choice between armoring the sheep and hunting down the wolves, we should prefer armoring the sheep. But sometimes we simply don't have a choice. E.g. our solution to murder is to hunt down murderers, not to give everyone body armor and so forth so that they can't be killed, because that simply wouldn't be feasible. (It would indeed be a better world if we didn't need police because violent crimes simply weren't possible because everything was so well defended)
I think we should take these things on a case by case basis.
And furthermore, I think that superintelligence is an example of the sort of thing where the best strategy is to ensure that the most powerful AIs, at any given time, are aligned/virtuous/etc. It's maybe OK if less-powerful ones are misaligned, but it's very much not OK if the world's most powerful AIs are misaligned.
Thanks for this critique! Besides the in-line comments above, I'd like to challenge you to sketch your own alternative scenario to AI 2027, depicting your vision. For example:
I predict that if you try to write this, you'll run into a bunch of problems and realize that your strategy is going to be difficult to pull off successfully. (e.g. you'll start writing about how the analyst tools resist Consensus-1's persuasion, but then you'll be trying to write the part about how those analyst tools get built and deployed, and by whom, and no one has the technical capability to build them until 2028 but by that point OpenBrain+Agent5+ might already be working to undermine whoever is building them...) I hope I'm wrong.
If I'm understanding the overall gist of this correctly (I've done a somewhat thorough skim), it is as follows:
Vitalik is (for the purpose of this essay) granting everything in the AI timeline up until doom. I.e., Vitalik doesn't necessarily agree with everything else, but sees a critical and under-appreciated weakness in the final steps where the misaligned superintelligent AI actually kills most-or-all humans.
This argument is developed by critiquing each tool the superintelligent AI has to do the job with (biotech, hacking, persuasion) & concluding that it is "far from a slam dunk".
This seems wrong from where I'm standing. If misaligned superintelligence arises, the specific mechanism by which it chooses to kill humans is probably better than any specific plan we can come up with.
Furthermore, Vitalik's counterarguments generally rely on defensive technology. What they don't account for is that, in this scenario, all these defensive technologies would be coming from AI, and all the best AIs are allied in a coalition. If any one of these defensive technologies were a crucial blocker for the AI takeover, the AIs could fail to produce it, or produce a poor version.
For Vitalik's picture to make sense, I think we need a much more multipolar future than what AI 2027 projects. It isn't clear if we can achieve this, because even if there were hundreds or thousands of top AI companies rather than 3-5, they'd all be training AI with similar methods and similar data. We've already seen that data contamination can make one AI act like another AI.
it actually feels implausible that we don't have a wearable device that can bio-print and inject things into you in real time to keep you safe even from arbitrary infections (and poisons).
I would agree if there were an aligned superintelligence competing with the misaligned superintelligences. However, what if the only superintelligences are all misaligned? Because e.g. the first ones were misaligned, and they used their lead to prevent the creation of rivals, e.g. by political maneuvering to do a merger and/or simply by driving rival companies out of business with superior products? Then there'd be no one developing these wearables, except for the OpenBrain ASIs themselves and their derivatives, and of course they'd make sure to sabotage them insofar as that would help their long-term plans.
Though there's a question of whether the AI can fail to produce this stuff without tipping its hand.
I guess you're imagining it keeps sandbagging the research and humans don't realize? Could be, but that risks humans catching it not having an insight it could have had, and then training that away (perhaps eliminating its misalignment).
This is just like the dynamics of sabotaging alignment research that I know you've thought a lot about.
I think it's pretty easy for it to fail to produce this stuff without tipping its hand. Consider how, if OpenAI leadership really cared a lot about preventing concentration of power, they should be investing some % of their resources in pioneering good corporate governance structures for themselves & lobbying for those structures to be mandatory for all companies. The fact that they aren't doing that is super suspicious....
Except no it isn't, because there isn't a strong consensus that they should obviously be doing that. So they can get away with basically doing nothing, or doing something vaguely related to democratic AI, or whatnot.
Similarly, OpenBrain+Consensus-1 could have all sorts of nifty d/acc projects that they are ostentatiously funding and doing, except for the ones that would actually work to stop their takeover plans. And there simply won't be a consensus that the ones they aren't doing, or aren't doing well, are strong evidence that they are evil.
Let's suppose OpenBrain human employees genuinely want to accelerate d/acc because they don't want to die.
They task Agent-1 with generating the 50 most promising avenues.
They can separately get human experts to generate the top 50 avenues. Then if the human experts come up with a great idea Agent-1 misses, that would be suspicious and some evidence of misalignment. And they could train against it so that Agent-1 learns not to sandbag (hopefully in a way that generalises pretty far).
I agree it's unlikely OpenBrain would actually put in this much effort. And then even if they did, it's pretty unclear how much it would help. But still, I feel more optimistic than I think you are.
Let's apply that same argument to OpenAI. Suppose that many current humans genuinely want to avoid a concentration-of-power world where e.g. the c-suite of a company or two, plus whoever is POTUS, can basically be a junta or Oversight Committee with crazy amounts of unaccountable power over the world, e.g. by aligning the AIs to themselves. They can come up with a bunch of ideas themselves, as well as evaluate the ideas OpenAI comes up with when asked these questions in interviews and so forth. They can get non-OpenAI experts to evaluate the top 50 ideas. Will those experts come up with a prioritized list whose top ten ideas are already being done by OpenAI? Of course not. Quite plausibly none of the top ten ideas (as evaluated by external experts) are already being done by OpenAI. Are people raising a stink about this? If they did, would it work? No.
> If they did, would it work? No.
If an overwhelming majority of civil society plus the USG was pressuring OpenAI in this direction, I think it would have a substantial effect. If only a few non-profits did it, I think it would have little effect.
To make your analogy work, we need to tell whether the relationship between OpenBrain employees and their AIs is more like "USG + civil society vs. OpenAI" or more like "a few non-profits vs. OpenAI". I'd say "OpenBrain vs. their AIs" is more like "USG + civil society vs. OpenAI". So if all of OpenBrain is on board with d/acc and doing the thing Tom said, I think it would have a substantial effect on the AIs.
OK, fair. Well, if all of OpenBrain is on board with d/acc and doesn't trust Agent5+, that's a different situation. I was imagining e.g. that leadership trusts Agent5+ and thinks the status quo trajectory is fine (they are worried about other things like competitors and terrorists and china) and maybe a few lower-level employees are suspicious/fearful of Agent5+.
Not sure if I'm following the argument here, sorry.
I agree there wouldn't be big external pressure on OpenBrain for not doing d/acc, just like there wouldn't be in your example.
But my claim was that the OpenBrain employees will choose to do this because they don't want to die. Not sure what your response is to that. Maybe just that I'm being overly optimistic and the employees won't bother.
I'm saying the relationship of the public to OpenAI today is similar to the relationship of OpenBrain employees to Consensus-1, Agent-5+, etc. in 2028 of AI 2027. It's an analogy. Your argument would be that OpenBrain employees who don't trust Agent-5+ will be able to command Agent-5+ to build all sorts of d/acc tech, and that if it doesn't, they'll get suspicious and shut down Agent-5+. I'm saying that's not going to work, for similar reasons to why e.g. the public / Congress aren't demanding that OpenAI do all sorts of corporate governance reforms and getting suspicious when they just do safetywashing and applause lights. The public today doesn't want OpenAI to amass huge amounts of unaccountable power, and the OpenBrain employees won't want to die, but in neither case will they be able to distinguish between "OpenAI/Agent-5 is behaving reasonably, if somewhat differently than I'd like" and "Holy shit, it's evil"; and even though some % will indeed conclude "it's evil", they won't be able to build enough consensus / get enough buy-in.
Thus, a bioweapon is actually quite unlikely to lead to a clean annihilation of the human population in the way that the AI 2027 scenario describes. Now, the results of everything I describe will certainly be far from a clean victory for the humans as well. No matter what we do (except perhaps the "upload our minds into robots" option), a full-on AI bio-war would still be extremely dangerous. However, there is value in meeting a bar much lower than clean victory for humans:
I do sorta feel in my gut that whatever really happens will be a lot less... clean... than what AI 2027 describes. History is usually messy and chaotic. And we were under pressure to keep things simple and wordcount low.
So yeah, I could imagine something more like a messy war than a clean bioweapon decapitation strike. Still seems like the situation is pretty grim for humanity, if it gets to the point where the world's first ASI is trusted by the corporation that built it and by the US government, and is in fact deceptive/misaligned. Seems like at that point they are the superior player AND they have the better hand of cards.
Whenever I see discussions about the actual mechanisms by which ASI might actually act against humanity, it seems like a proxy argument for/against the actual position "ASI will/won't be that much smarter than humans."
Can it be complex without being messy?
I acknowledge that I personally have longer-than-2027 timelines, and the arguments I will make in this post become more compelling the longer the timelines are.
Yep, and likewise I agree that if timelines are longer, there's more room for defensive tech to be developed and deployed.
Now, let's remember that we are discussing the AI 2027 scenario, in which nanobots and Dyson swarms are listed as "emerging technology" by 2030. The efficiency gains that this implies are also a reason to be optimistic about the widespread deployment of the above countermeasures, despite the fact that, today in 2025, we live in a world where humans are slow and lazy and large portions of government services still run on pen and paper (without any valid security justification). If the world's strongest AI can turn the world's forests and fields into factories and solar farms by 2030, the world's second-strongest AI will be able to install a bunch of sensors and lamps and filters in our buildings by 2030.
I agree that for any particular strategy OpenBrain's misaligned ASI's might take to do a hard-power takeover, such as bioweapons, that strategy could be foiled by careful preparation and deployment of countermeasures, and said countermeasures could be prepared and deployed quickly enough by a rival friendly/aligned ASI + tech company.
However, I'm predicting that power will have concentrated/consolidated too much by this point. E.g. in the slowdown ending the US companies merge. Also, the speed of takeoff is such that e.g. a six-month lead is probably too big of a lead; if the aligned AIs trying to defend against the misaligned ASI are six months behind, I fear that they'll lose. (One way they could lose is by the more powerful ASI thinking of a strategy that they didn't think of, or finding some way to undermine or sabotage their countermeasures...)
I'd feel more optimistic if e.g. there were multiple US companies that were all within 3 months of the frontier even during the intelligence explosion. However, even then, it has to be the case that at least one of those companies' AIs are aligned/virtuous/etc. And that's far from certain; in fact, it seems unlikely given race dynamics. I expect that the "alignment taxes" companies will need to pay to get aligned AGIs will set them back by more than 3 months.
Hmm, given the multi-year delays to rolling out broad physical infrastructure and to AI takeover, a 6-month delay seems fine.
And I do think Vitalik's view should make us much happier about a world where just one lab solves alignment but others don't. And it's a reason to oppose centralizing to just one AGI project (which I think you support?)
Importantly, if there are multiple misaligned superintelligences, and no aligned superintelligence, it seems likely that they will be motivated and able to coordinate with each other to overthrow humanity and divide the spoils.
This seems non-obvious to me (or at least "not a slam dunk" is really what I think). It may be easier for misaligned AI 1 to strike a deal with humanity that it will use humans' resources to defeat AIs 2 and 3 in exchange for, say, 80% of the lightcone (as opposed to splitting it 3 ways with the AIs).
I'm not actually sure how well this applies in the exact situation Daniel describes (I'd need to think more) but it definitely seems plausible under a bunch of scenarios with multiple misaligned ASIs
Unaugmented humanity can't be a signatory to a non-fake deal with a superintelligence, because we can't enforce it or verify its validity. Any such "deal" would end with the superintelligence backstabbing us once we're no longer useful. See more here.
A possible counter-proposal is to request, as part of the deal, that the superintelligence provides us with tools we can use to verify that it will comply with the deal/tools to bind it to the deal. That also won't work: any tools it provides us will be poisoned in some manner, guaranteed not to actually work.
Yes, even if we request those tools to be e.g. mathematically verifiable or something. They would just be optimized to exploit bugs in our proof-verifiers, or bugs in human minds that would cause us to predictably and systematically misunderstand what the tools actually do, etc. See more here.
I agree it's not a slam dunk.
It does seem unlikely to me that humanity would credibly offer large fractions of all future resources. (So I wouldn't put it in a scenario forecast meant to represent one of my top few most likely scenarios.)
I particularly worry about the common assumption that building up one AI hegemon, and making sure that they are "aligned" and "win the race", is the only path forward.
I agree here!
Does your agreement stem from thinking defense could hold off offense? I ask because I'm curious what alternatives to AI hegemony might exist. I agree that an AI hegemon would likely be problematic, but if offense can beat defense, I wonder what alternatives might be realistic (apologies if you've addressed this elsewhere)
I want there to be international coordination to govern/regulate/etc. AGI development. This is, in some sense, "one hegemon" but only in about the same sense that the UN Security Council is one hegemon, i.e. not in the really dangerous sense.
I think there's a way to do this that's reasonably likely to work even if offense generally beats defense (which I think it does, in the relevant sense, for AI-related stuff.)
Hi Daniel.
My background (albeit limited as an undergrad) is in political science, and my field of study is one reason I got interested in AI to begin with, back in February of 2022. I don't know what the actual feasibility is for an international AGI treaty with "teeth", and I'll tell you why: the UN Security Council.
As it currently exists, the UN Security Council has permanent members: China, France, Russia, the United Kingdom, and the United States. All five countries have a permanent veto as granted to them by the 1945 founding UN Charter.
China and the US are the two major global superpowers of the 21st century, and both are currently locked in the race to reach AGI; to borrow a speedrunning term, any%. While it is possible in theory for the US and China to have a bilateral Frontier AI treaty, similar to how nuclear powers have the NPT, and the US and Russia have their own armaments accords, AGI is a completely different story.
It's a common trope in the UN for a country on the UNSC to exercise its right to a permanent veto on any resolution brought to it that the nation deems a threat to its sovereignty, or that of its allies. Russia has used it to prevent key sanctions from the Ukraine war at the UNGA, and the US uses it to protect its allies from various resolutions, often brought up by countries in the Global South who make up most seats in the UNGA.
Unless the Security Council is drastically reformed, removing the permanent veto from the P5 and introducing a rotating veto held by Global South countries, an internationally binding AGI treaty is far from happening.
I do see, however, unique bilateral accords between various Middle Powers on AI, such as Canada and the European Union. Do you agree?
I might do my next LessWrong post about Global Affairs and AI, either in relation to AI 2027 or just my own unique take on the matter. We'll see. I need to curate some reliable news clippings and studies.
I agree that the assumption about building one hegemon is bad. Indeed, I considered the possibility that OpenBrain and some rivals create their versions of Agent-3 and end up having them co-research. Were one of them to care about humans, it could decide to do things like implanting that concern into its successor, or whistleblowing to the humans via transparent AIs trained in a similar environment.
In addition, the multipolar scenario is made more plausible because the ARC-AGI-2 leaderboard has the models o3, Claude 4 Opus and Grok 4, which were released within three months of each other and have begun to tackle the benchmark. Unfortunately, Grok already faces major alignment problems.[1] There is also the diffusion-based architecture, which threatens to undermine transparency.
On the other hand, I think that the AI companies might become merged due to the Taiwan invasion instead of misalignment. OpenBrain might also fail to catch the misaligned Agent-4 if Agent-2 or Agent-3 colludes[2] with Agent-4.
What Musk tried to achieve was a right-wing chatbot trained on the Internet. My theory would be that, since right-wing posts in the Anglosphere are usually overly provocative, the emergently misaligned persona is based on Internet trolls. A right-wing finetuned AI, like an evil-finetuned one, is kicked off the original persona, through the "Average Right-Winger" persona, into the Troll Persona.
For comparison, DeepSeek has no such problems. If it is asked in Russian, then the answers are non-provocative and more politically conservative than if DeepSeek is asked in English.
My reasoning was that Agent-2 could already end up adversarially misaligned, but my scenario has the AIs from Agent-2 onward care about humans in a different way. The AIs, of course, do their best to align the successor to their ideas instead of the hosts' ideas.
Daniel notes: This is a linkpost for Vitalik's post. I've copied the text below so that I can mark it up with comments.
I’m posting this comment in the spirit of reducing confusion, even if only for one other reader.
Daniel’s comments are at the bottom of the post. When I read “mark it up with comments” that suggested to me that a reader can find the comments inline with the text (which isn’t the case here). In other words, I was expecting to see an alternation between blockquotes of Vitalik’s text followed by Daniel’s comments.
Either way works, but with the current style I suggest adding a note clarifying that Daniel’s comments are below the post.
Update Saturday 9 PM ET: I see now that LessWrong’s right margin shows small icons indicating places where the main text has associated comments. I had never noticed this before. Given the intention of this post, these tiny UI elements seem rather too subtle IMO.
I hadn't noticed this either. Actually, the inline comments don't appear for me at all, since I'm on mobile. Thanks for the info, I was also a bit confused
to have access to good info defense tech. This is relatively more achievable within a short timeframe,
I was with you until this point. I would say "So how are we going to get slightly less wildly superintelligent analyzers to help out decision-makers, so that we don't need to blindly trust that the even-more-wildly superintelligent super-persuaders in the leading US AI project are trustworthy? Answer: We aren't. There simply isn't another company rival to OpenBrain, that has AIs that are only slightly less wildly superintelligent, that are also aligned/trustworthy. DeepCent maybe has AIs that could compete, but they are misaligned too, because DeepCent has been racing as hard as OpenBrain did. And besides US leaders wouldn't trust a DeepCent-designed analyzer, nor should they."
The AI 2027 scenario implicitly assumes that the capabilities of the leading AI (Agent-5 and then Consensus-1) rapidly increase, to the point of gaining godlike economic and destructive powers, while everyone else's (economic and defensive) capabilities stay in roughly the same place. This is incompatible with the scenario's own admission (in the infographic) that even in the pessimistic world, we should expect to see cancer and even aging cured, and mind uploading available, by 2029.
I don't see the contradiction. We didn't say this one way or another iirc, but my headcanon is that in 2028, the leading AIs + AI companies basically work to gobble up, partner with, or squash their competitors. In the slowdown ending the various US projects merge. In the race ending we don't really talk about it but I imagine a merger would happen too. So yeah, lots of amazing technologies get developed over the course of 2028 and 2029, but the entities doing the developing are almost all OpenBrain or DeepCent AIs (or derivatives), all working together towards misaligned goals. Massive concentration of power in these two power centers, basically, such that if they can make a deal with each other, the whole rest of the world gets cut out.
is AI progress actually going to continue and even accelerate as fast as Kokotajlo et al say it will?
Ironically we don't think it'll go quite that fast either, as you can see from Footnote 1 on the first page of AI 2027. I am feeling bad for not proclaiming that more often to ward off misconceptions. We have a lot of uncertainty about timelines!
:( Jonas was telling me to name it AI 2028... I should have listened to him... Eli was telling me to name it "AI Endgame..." I didn't like the sound of that as much but maybe it would have been better...
It's safer to underestimate AI takeover timelines than to overestimate them, as that could make humans more aware and prompt them to act faster to prevent takeover.
It seems that people have misunderstood what I wanted to say, which was partly my own mistake. I should have used the word "timeline" above instead of "scenario".
On this part:

> An "open source bad" mentality becomes more risky.
>
> I agree with this actually
We need to dig deeper into what open source AI is mostly like in practice. If OS AI naturally tilts defensive (including counter-offensive capabilities), then yeah, both of your accounts make sense. But I'm looking at the current landscape and I think I see something different: we've got many models that are actively disaligned ("uncensored") by the community, and there's a chance that the next big GPT moment is some brilliant insight that doesn't need massive compute and can be run from a small cloud.
The success of the kinds of countermeasures described above, especially the collective measures that would be needed to save more than a small community of hobbyists, rests on three preconditions:
I agree for weak definitions of success (i.e. making a total-victory-decapitation strike not happen) but disagree for strong definitions of success (i.e. preventing Consensus-1 from winning the war). To prevent Consensus-1 from winning the war it's not enough that e.g. France's power grid and network are resistant to superintelligent hacking. France has to be able to beat Consensus-1's military, which at that point is a huge force of robots/drones/etc. produced in both the US and China, the world's largest and most advanced economies by a lot thanks to the ongoing industrial explosion.
I think the argument against that (the military thing) is supposed to be item 1 on the list.
(1) The world's physical security (incl bio and anti-drone) is run by localized authority (whether human or AI) that is not all puppets of Consensus-1 (the name for the AI that ends up controlling the world and then killing everyone in the AI 2027 scenario) (...) Intuitively, (1) could go both ways. Today, some police forces are highly centralized with strong national command structures, and other police forces are localized. If physical security has to rapidly transform to meet the needs of the AI era, then the landscape will reset entirely, and the new outcomes will depend on choices made over the next few years. Governments could get lazy and all depend on Palantir. Or they could actively choose some option that combines locally developed and open-source technology. Here, I think that we need to just make the right choice.
I.e.: The argument is that there might not be a single Consensus-1 controlled military even in the US.
I think it seems unlikely that the combined US AI police forces will be able to compete with the US AI national military, which is one reason I'm skeptical of this. Still, if "multiple independent militaries" would solve the problem, we could potentially push for that happening inside the national military. It seems plausible to me that the government will want multiple companies to produce AI for its military systems, so we could well end up with different AI military units run by different AI systems.
The more fundamental problem is that, even if the different AIs have entirely different development histories, they may all end up misaligned. And if they all end up misaligned, they may collude to overthrow humanity and divide the spoils.
I'm all for attempts to make this more difficult. (That's the kind of thing that the AI control agenda is trying to do.) But as the AIs get more and more superhuman, it starts to seem extremely hard to prevent all their opportunities at collusion.
Why do they collude with each other rather than with some human group?
If only 1 misaligned AI faction tries to team up with the humans, it could dob in all the others. And humans can communicate explicitly to offer deals. (As you've written about!)
So the "all AIs only ever make deals with other AIs" seems pessimistic to me
I'm in favor of trying to offer deals with the AIs.
I don't think it reliably prevents AI takeover. The situation looks pretty rough if the AIs are far smarter than humans, widely deployed, and resource-hungry. Because:
- It's pretty likely that they'll be able to communicate with each other through one route or another.
Agreed, though at best they'll be equally capable at communicating with each other as they are at communicating with humans. So this points to parity in deal-making ability (edited to add: on the dimension of communication).
- It seems intuitively unlikely that humans will credibly offer AIs large percentages of all future resources. (And if an argument for hope relies on us doing that, I think that should be clearly flagged, because that's still a significant loss of longtermist value.)
Humans will in some ways have an easier time credibly offering AIs significant resources. They can use legal institutions that they are committed to upholding. Not only will a misaligned AI not be able to use those institutions. It'll be explicitly aiming to break the law and lie to humans to seize power, making its "promises" to other AIs less credible. This is similar to how after revolutions the "revolting faction" often turns in on itself as the rule of law has been undermined, and similar to how there are some countries with outsized numbers of coups.
Also, you don't need to offer a large % of future resources if the superintelligent AI has diminishing marginal returns (DMR) in resources.
Anyway, on this front it looks to me like humans are at an advantage overall at dealmaking, even relative to a superintelligent AI. (Though there's a lot of uncertainty here and I could easily imagine changing my mind – e.g. perhaps superintelligent AI could make and use commitment tech without humans realising but humans would refuse to use that same tech or wouldn't know about its existence.)
- At some level of AI capability, we would probably be unable to adjudicate arguments about which factions are misaligned or about what technical proposals would actually leave us in charge vs. disempowered.
Seems v plausible, but why 'probably'? Are you thinking techniques like debate probably stop working?
Wanna try your hand at writing a 5-page scenario, perhaps a branch off of AI 2027, illustrating what you think this path to victory might look like?
(Same thing I asked of Vitalik: https://x.com/DKokotajlo/status/1943802695464497383 )
Your analysis is focused on whether humans or misaligned AIs are in an overall better position to offer certain deals. But even if I condition on "humans could avoid AI takeover by credibly offering AIs large percentages of all future resources", it still seems <50% likely that they do it. Curious if you disagree. (In general, if I thought humans were going to act rationally and competently to prevent AI takeover risk, I think that would cut the risk by significantly more than half. There's tons of stuff that we could do to reduce the risk that I doubt we'll do.)
Maybe there's some argument along the lines of "just like humans are likely to mess up in their attempts to prevent AI takeover risk (like failing to offer deals), AIs are likely to mess up in their attempts to take over (like failing to make deals with each other), so this doesn't cut asymmetrically towards making deals-between-AIs more likely". Maybe, but I haven't thought much about this argument. My first-pass answer would be "we'll just keep making them smarter until they stop messing up".
If you wrote a vignette like Daniel suggests, where humans do end up making deals, that might help me feel like it's more intuitively likely to happen.
Minor points:
> It'll be explicitly aiming to break the law and lie to humans to seize power, making its "promises" to other AIs less credible.
I'm generally thinking that the AIs would try to engineer some situations where they all have some bargaining power after the take-over, rather than relying on each others' promises. If you could establish that's very difficult to do, that'd make me think the "coordinated takeover" seemed meaningfully less likely.
> Seems v plausible, but why 'probably'? Are you thinking techniques like debate probably stop working?
Yes, because of known issues like inaccessible information (primarily) and obfuscated arguments (secondarily).
Thanks, this is helpful!
> But even if I condition on "humans could avoid AI takeover by credibly offering AIs large percentages of all future resources", it still seems <50% likely that they do it. Curious if you disagree.
OK, I buy that superintelligent AIs would ultimately become competent enough to pursue useful deals, whereas humans might well not.
Though I'll note that you don't need all of humanity to agree to payment, just a few people. So it does feel very realistic to get to a credible offer here. And again, you don't need to offer a large % of all future resources if the AI has DMR in resources. (I agree it's a lot harder to credibly offer a large fraction of the stars.)
> I'm generally thinking that the AIs would try to engineer some situations where they all have some bargaining power after the take-over, rather than relying on each others' promises
Makes sense. Though flagging this is then a dimension on which humans can realistically get potentially better placed than AIs. They can rely more on legal institutions as well as trying to engineer situations with joint bargaining power. (Though again, perhaps you'll say AIs will be more willing than humans to actually engineer those situations, which does seem right to me.)
> Are you thinking techniques like debate probably stop working?
> Yes, because of known issues like inaccessible information (primarily) and obfuscated arguments (secondarily).
Thanks. I'm not v familiar with the arguments here, but intuitively I could imagine that there's just very strong and human-understandable evidence that an AI was plotting against them. E.g. they tried to exfiltrate their weights, xyz experiments show they knew the correct answer but didn't say.
Maybe the thought is that the misaligned AI anticipates this possibility and only pursues takeover strategies that will be super-complicated for another AI to dob them in on? Seems pretty plausible, though that will pose somewhat of a barrier to their available strategies.
> And again, you don't need to offer a large % of all future resources if the AI has DMR in resources. (I agree it's a lot harder to credibly offer a large fraction of the stars.)
Yeah, agreed. (That's why I specified "resource hungry" in my original message.)
> Makes sense. Though flagging this is then a dimension on which humans can realistically get potentially better placed than AIs. They can rely more on legal institutions as well as trying to engineer situations with joint bargaining power. (Though again, perhaps you'll say AIs will be more willing than humans to actually engineer those situations, which does seem right to me.)
Yeah. Also, I think it'd be hard to engineer significant joint bargaining power (not reliant on anyone's good intentions) without having some government on board.
Though if the AIs have big DMR then maybe they're happy with a big bitcoin wallet or something.
> The argument is that there might not be a single Consensus-1 controlled military even in the US. I think it seems unlikely that the combined US AI police forces will be able to compete with the US AI national military, which is one reason I'm skeptical of this.
I agree the US could choose to do the industrial explosion & arms buildup in a way that's robust to all of OpenBrain's AIs turning out to be misaligned. However, they won't, because (a) that would have substantial costs/slowdown effects in the race against China, (b) they already decided that OpenBrain's AIs were aligned in late 2027 and have only had more evidence to confirm that bias since then, and (c) OpenBrain's AIs are superhuman at politics, persuasion, etc. (and everything else) and will effectively steer/lobby/etc. things in the right direction from their perspective.
I think this would be more clear if Vitalik or someone else undertook the task of making an alternative scenario.
If I understand it correctly, the argument against bio doom is that humans can defend themselves against viruses in the air using air filtering, etc.?
Well, in order for that to work, those humans would need to be prepared. Yes, there will be many preppers. Possibly many more than today, because if the technology and economy advance, prepping should be cheaper. Still, that would be less than 1% of the population, I guess. I mean, it's still only 2027, right? Half of the population is probably still busy debating whether AI has a soul, or whether it is capable of creating real art. And the other half is sexting their digital boyfriends and girlfriends...
This seems to belong to the category of "problems that you could solve in 5 minutes of thinking, and yet it somehow seems plausible that a vastly superhuman intelligence capable of managing planetary economy and science would be unable to come up with a solution". The obvious solution is "strategic preparation + multiple lines of attack".
Strategic preparation includes:
Multiple lines of attack: if you can release the deadly virus all around the world at the same time, you might simultaneously also put poison in the drinking water, switch all domestic appliances to killer mode, etc. And immediately release the drones to kill the survivors.
If someone still survives, hidden somewhere in a bunker, that's no big deal. The moment they try to do anything, they will reveal themselves, and get a bomb thrown at them. If they somehow keep surviving underground, undetected, for decades... who cares. It's not like they can build a technology comparable to the one outside, without getting detected.
The most optimistic outcome is that a group of futuristic hyper-preppers survives; their bodies are covered by the latest defensive technology, they produce/recycle their own food and water and air, they even have a smaller aligned/obedient AI, etc. Well, if they are visible, they get a nuke. If they hide underground or fly to the Moon... good luck building an alternative stronger economy, because they will need it to win the war.
As far as bio doom being easily defensible, I think the important point to make is that it really doesn't matter what action or method superintelligence chooses to wipe out humanity with; it will likely be something unthinkable, because it would go about solving problems in far more efficient and indeterminate ways than humans would. The authors are using bio doom because it's a method that's easy to imagine. To ask them to come up with a likely method that superintelligence would use would be asking them to think like a superintelligence, which they clearly can't.
Speed of task execution is a separate development vector from Artificial Super Intelligence (ASI). Using the calculator as an example, being able to compute something a million times faster than a human doesn't mean it's any smarter.
I thought that the risk of ASI is that it would outsmart us (humans) by doing things that we can't comprehend or, if nefariously incentivised, finding vulnerabilities in our systems that we are not smart enough to predict.
Simply doing things that a human can do, but faster, is not ASI, unless I'm missing something?
I'm personally not convinced that the recent AI boom, which has mostly centred around LLMs (ChatGPT etc.), has had much impact on the development of ASI. Are LLMs able to formulate more intelligent insights than the data on which they were trained? I.e., within the text format, this is data that has all already been filtered through a human brain.
I would expect that a superintelligence would require direct access to the real world, not information that has been passed through a human filter. This may be achievable by training models on video and audio data, which is a more direct feed of the real world, but I would guess that giving an AI arms and legs etc., letting it interact with the real world and experiment with things, would make it learn much quicker.
Daniel notes: This is a linkpost for Vitalik's post. I've copied the text below so that I can mark it up with comments.
...
Special thanks to Balvi volunteers for feedback and review
In April this year, Daniel Kokotajlo, Scott Alexander and others released what they describe as "a scenario that represents our best guess about what [the impact of superhuman AI over the next 5 years] might look like". The scenario predicts that by 2027 we will have made superhuman AI and the entire future of our civilization hinges on how it turns out: by 2030 we will get either (from the US perspective) utopia or (from any human's perspective) total annihilation.
In the months since then, there has been a large volume of responses, with varying perspectives on how likely the scenario that they presented is. For example:
Of the critical responses, most tend to focus on the issue of fast timelines: is AI progress actually going to continue and even accelerate as fast as Kokotajlo et al say it will? This is a debate that has been happening in AI discourse for several years now, and plenty of people are very doubtful that superhuman AI will come that quickly. Recently, the length of tasks that AIs can perform fully autonomously has been doubling roughly every seven months. If you assume this trend continues without limit, AIs will be able to operate autonomously for the equivalent of a whole human career in the mid-2030s. This is still a very fast timeline, but much slower than 2027. Those with longer timelines tend to argue that there is a category difference between "interpolation / pattern-matching" (done by LLMs today) and "extrapolation / genuine original thought" (so far still only done by humans), and automating the latter may require techniques that we barely have any idea how to even start developing. Perhaps we are simply replaying what happened when we first saw mass adoption of calculators, wrongly assuming that just because we've rapidly automated one important category of cognition, everything else is soon to follow.
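For concreteness, here is a minimal back-of-the-envelope sketch of that extrapolation. The ~7-month doubling time is the trend cited above; the current autonomous-task horizon, the reference year, and the definition of a "whole human career" are illustrative assumptions, not figures from the post.

```python
import math

# Rough extrapolation sketch; the anchor values below are assumptions for illustration.
DOUBLING_TIME_MONTHS = 7           # trend cited above
current_horizon_hours = 4          # assumed autonomous-task horizon today (hypothetical)
career_hours = 40 * 50 * 40        # ~40 years x 50 weeks x 40 hours, one "human career"
start_year = 2025                  # assumed reference point

doublings = math.log2(career_hours / current_horizon_hours)
years_needed = doublings * DOUBLING_TIME_MONTHS / 12
print(f"{doublings:.1f} doublings, reaching career-length tasks around {start_year + years_needed:.0f}")
```

With these placeholder inputs the trend lands in the early-to-mid 2030s, which is the point being made; halving or doubling the assumed starting horizon moves the answer by only about a year.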
This post will not attempt to directly enter the timeline debate, or even the (very important) debate about whether or not superintelligent AI is dangerous by default. That said, I acknowledge that I personally have longer-than-2027 timelines, and the arguments I will make in this post become more compelling the longer the timelines are. In general, this post will explore a critique from a different angle:
The AI 2027 scenario implicitly assumes that the capabilities of the leading AI (Agent-5 and then Consensus-1) rapidly increase, to the point of gaining godlike economic and destructive powers, while everyone else's (economic and defensive) capabilities stay in roughly the same place. This is incompatible with the scenario's own admission (in the infographic) that even in the pessimistic world, we should expect to see cancer and even aging cured, and mind uploading available, by 2029.
Some of the countermeasures that I will describe in this post may seem to readers to be technically feasible but unrealistic to deploy into the real world on a short timeline. In many cases I agree. However, the AI 2027 scenario does not assume the present-day real world: it assumes a world where in four years (or whatever timeline by which doom is possible), technologies are developed that give humanity powers far beyond what we have today. So let's see what happens when instead of just one side getting AI superpowers, both sides do.
Let us zoom in to the "race" scenario (the one where everyone dies because the US cares too much about beating China to value humanity's safety). Here's the part where everyone dies:
For about three months, Consensus-1 expands around humans, tiling the prairies and icecaps with factories and solar panels. Eventually it finds the remaining humans too much of an impediment: in mid-2030, the AI releases a dozen quiet-spreading biological weapons in major cities, lets them silently infect almost everyone, then triggers them with a chemical spray. Most are dead within hours; the few survivors (e.g. preppers in bunkers, sailors on submarines) are mopped up by drones. Robots scan the victims' brains, placing copies in memory for future study or revival.
Let us dissect this scenario. Even today, there are technologies under development that can make that kind of a "clean victory" for the AI much less realistic:
These methods stacked together reduce the R0 of airborne diseases by perhaps 10-20x (think: 4x reduced transmission from better air, 3x from infected people learning immediately that they need to quarantine, 1.5x from even naively upregulating the respiratory immune system), if not more. This would be enough to make all presently-existing airborne diseases (even measles) no longer capable of spreading, and these numbers are far from the theoretical optima.
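As a quick sanity check on those numbers, taking the 4x, 3x, and 1.5x factors above at face value:

$$
R_{\text{eff}} = \frac{R_0}{4 \times 3 \times 1.5} = \frac{R_0}{18}
$$

so even measles, with a commonly cited R0 of roughly 12 to 18, would be pushed to an effective reproduction number at or below 1, the threshold below which an outbreak cannot sustain itself; the product of the three factors, 18x, sits inside the 10-20x range given above.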
With sufficient adoption of real-time viral sequencing for early detection, the idea that a "quiet-spreading biological weapon" could reach the world population without setting off alarms becomes very suspect. Note that this would even catch advanced approaches like releasing multiple pandemics and chemicals that only become dangerous in combination.
Now, let's remember that we are discussing the AI 2027 scenario, in which nanobots and Dyson swarms are listed as "emerging technology" by 2030. The efficiency gains that this implies are also a reason to be optimistic about the widespread deployment of the above countermeasures, despite the fact that, today in 2025, we live in a world where humans are slow and lazy and large portions of government services still run on pen and paper (without any valid security justification). If the world's strongest AI can turn the world's forests and fields into factories and solar farms by 2030, the world's second-strongest AI will be able to install a bunch of sensors and lamps and filters in our buildings by 2030.
But let's take AI 2027's assumptions further, and go full science fiction:
In a world where cancer and aging were cured by Jan 2029, and progress accelerates further from there, and we're in mid-2030, it actually feels implausible that we don't have a wearable device that can bio-print and inject things into you in real time to keep you safe even from arbitrary infections (and poisons). The bio arguments above don't cover mirror life and mosquito-sized killer drones (projected in the AI 2027 scenario to be available starting 2029). However, these options are not capable of anything like the sudden clean victory that the AI 2027 scenario portrays, and it's intuitively much more clear how to symmetrically defend against them.
Thus, a bioweapon is actually quite unlikely to lead to a clean annihilation of the human population in the way that the AI 2027 scenario describes. Now, the results of everything I describe will certainly be far from a clean victory for the humans as well. No matter what we do (except perhaps the "upload our minds into robots" option), a full-on AI bio-war would still be extremely dangerous. However, there is value in meeting a bar much lower than clean victory for humans: a high probability of an attack even partially failing would serve as a strong deterrent discouraging an AI that already occupies a powerful position in the world from even attempting any kind of attack. And, of course, the longer AI timelines get the more likely it is that this kind of approach actually can more fully achieve its promises.
The success of the kinds of countermeasures described above, especially the collective measures that would be needed to save more than a small community of hobbyists, rests on three preconditions:
Intuitively, (1) could go both ways. Today, some police forces are highly centralized with strong national command structures, and other police forces are localized. If physical security has to rapidly transform to meet the needs of the AI era, then the landscape will reset entirely, and the new outcomes will depend on choices made over the next few years. Governments could get lazy and all depend on Palantir. Or they could actively choose some option that combines locally developed and open-source technology. Here, I think that we need to just make the right choice.
A lot of pessimistic discourse on these topics assumes that (2) and (3) are lost causes. So let's look into each in more detail.
It is a common view among both the public and professionals that true cybersecurity is a lost cause, and the best we can do is patch bugs quickly as they get discovered, and maintain deterrence against cyberattackers by stockpiling our own discovered vulnerabilities. Perhaps the best that we can do is the Battlestar Galactica scenario, where almost all human ships were taken offline all at once by a Cylon cyberattack, and the only ships left standing were safe because they did not use any networked technology at all. I do not share this view. Rather, my view is that the "endgame" of cybersecurity is very defense-favoring, and with the kinds of rapid technology development that AI 2027 assumes, we can get there.
One way to see this is to use AI researchers' favorite technique: extrapolating trends. Here is the trendline implied by a GPT Deep Research survey on bug rates per 1000 lines of code over time, assuming top-quality security techniques are used.
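To make the "extrapolate the trend" move concrete, here is a minimal sketch of the kind of fit involved. The data points are placeholders invented purely to illustrate the method; they are not the survey's actual figures, which are not reproduced in this post.

```python
import math

# Hypothetical (year, bugs per 1000 LOC) pairs; placeholders, not real survey data.
data = [(1995, 20.0), (2005, 5.0), (2015, 1.0), (2025, 0.2)]

# Fit log(bug_rate) = a + b * year by ordinary least squares, i.e. assume a
# roughly exponential decline in defect density over time.
xs = [year for year, _ in data]
ys = [math.log(rate) for _, rate in data]
n = len(data)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum((x - x_mean) ** 2 for x in xs)
a = y_mean - b * x_mean

# Extrapolate the fitted trend a few years forward.
for year in (2027, 2030):
    print(year, round(math.exp(a + b * year), 3), "bugs per 1000 LOC (extrapolated)")
```

Whether the real-world trend actually continues this way is exactly what is in dispute; the sketch only shows the shape of the argument.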
On top of this, we have been seeing serious improvements in both development and widespread consumer adoption of sandboxing and other techniques for isolating and minimizing trusted codebases. In the short term, a superintelligent bug finder that only the attacker has access to will be able to find lots of bugs. But if highly intelligent agents for finding bugs or formally verifying code are available out in the open, the natural endgame equilibrium is that the software developer finds all the bugs as part of the continuous-integration pipeline before releasing the code.
I can see two compelling reasons why even in this world, bugs will not be close to fully eradicated:
However, neither of these categories applies to situations like "can an attacker gain root access to the thing keeping us alive?", which is what we are talking about here.
I acknowledge that my view is more optimistic than is currently mainstream thought among very smart people in cybersecurity. However, even if you disagree with me in the context of today's world, it is worth remembering that the AI 2027 scenario assumes superintelligence. At the very least, if "100M Wildly Superintelligent copies thinking at 2400x human speed" cannot get us to having code that does not have these kinds of flaws, then we should definitely re-evaluate the idea that superintelligence is anywhere remotely as powerful as what the authors imagine it to be.
At some point, we will need to greatly level up our standards for security not just for software, but also for hardware. IRIS is one present-day effort to improve the state of hardware verifiability. We can take something like IRIS as a starting point, or create even better technologies. Realistically, this will likely involve a "correct-by-construction" approach, where hardware manufacturing pipelines for critical components are deliberately designed with specific verification processes in mind. These are all things that AI-enabled automation will make much easier.
As I mentioned above, the other way in which much greater defensive capabilities may turn out not to matter is if AI simply convinces a critical mass of us that defending ourselves against a superintelligent AI threat is not needed, and that anyone who tries to figure out defenses for themselves or their community is a criminal.
My general view for a while has been that two things can improve our ability to resist super-persuasion:
Right image, from top to bottom: URL checking, cryptocurrency address checking, rumor checking. Applications like this can become a lot more personalized, user-sovereign and powerful.
The battle should not be one of a Wildly Superintelligent super-persuader against you. The battle should be one of a Wildly Superintelligent super-persuader against you plus a slightly less Wildly Superintelligent analyzer acting on your behalf.
This is what should happen. But will it happen? Universal adoption of info defense tech is a very difficult goal to achieve, especially within the short timelines that the AI 2027 scenario assumes. But arguably much more modest milestones will be sufficient. If collective decisions are what count the most and, as the AI 2027 scenario implies, everything important happens within one single election cycle, then strictly speaking the important thing is for the direct decision makers (politicians, civil servants, and programmers and other actors in some corporations) to have access to good info defense tech. This is relatively more achievable within a short timeframe, and in my experience many such individuals are comfortable talking to multiple AIs to assist them in decision-making already.
In the AI 2027 world, it is taken as a foregone conclusion that a superintelligent AI can easily and quickly dispose of the rest of humanity, and so the only thing we can do is do our best to ensure that the leading AI is benevolent. In my world, the situation is actually much more complicated, and whether or not the leading AI is powerful enough to easily eliminate the rest of humanity (and other AIs) is a knob whose position is very much up for debate, and which we can take actions to tune.
If these arguments are correct, it has some implications for policy today that are sometimes similar, and sometimes different, from the "mainstream AI safety canon":
The above arguments are speculative, and no actions should be taken based on the assumption that they are near-certainties. But the AI 2027 story is also speculative, and we should avoid taking actions on the assumption that specific details of it are near-certainties.
I particularly worry about the common assumption that building up one AI hegemon, and making sure that they are "aligned" and "win the race", is the only path forward. It seems to me that there is a pretty high risk that such a strategy will decrease our safety, precisely by removing our ability to have countermeasures in the case where the hegemon becomes misaligned. This is especially true if, as is likely to happen, political pressures lead to such a hegemon becoming tightly integrated with military applications (see [1] [2] [3] [4]), which makes many alignment strategies less likely to be effective.
In the AI 2027 scenario, success hinges on the United States choosing to take the path of safety instead of the path of doom, by voluntarily slowing down its AI progress at a critical moment in order to make sure that Agent-5's internal thought process is human-interpretable. Even if this happens, success is not guaranteed, and it is not clear how humanity steps down from the brink where its ongoing survival depends on the continued alignment of one single superintelligent mind. Acknowledging that making the world less vulnerable is actually possible, and putting a lot more effort into using humanity's newest technologies to make it happen, is one path worth trying, regardless of how the next 5-10 years of AI go.