Thanks for writing this.
I agree with your logic that shutdown is a simpler ask than controlled takeoff, but I think controlled takeoff is much more viable in practice.
I see three major blockers to full shutdown:
Given all of that, I'd expect covert or overt government projects to continue even if we get a treaty banning private research. (Which I do find quite possible; I expect scary demos to happen naturally).
There's one set of experts saying alignment is almost impossible, and another group of experts saying it's probably doable as long as we aren't dumb about it. That's not a rational reason for a shutdown if you're not longtermist (edit: and if you're older, like most decision-makers, so a shutdown probably means you personally die).
Getting a full shutdown, and surviving without one, seem to share a critical-path component: conceptual clarification of, and communication about, alignment difficulty on the current path seems necessary for either route. That's what it would take to shift expert opinion enough to get a full shutdown, and it would improve the odds of solving alignment in time if we don't get one. So I'd love to see more effort in that direction (like we got with your "The Title is Reasonable" post!)
I really hope I'm missing something! I'll continue advocating for shutdown as a means to any viable slowdown, but I'd love to have some genuine hope for it!
The only way I can see to prevent some programs from continuing is if experts were to unify behind "alignment is highly unlikely to succeed using current methods". If everyone with a clue said "we'll probably die if we try it any time soon", that might be enough to dissuade selfish decision-makers. This would require some amazing communication and clarification. If we take Paul Christiano as representative of expert optimists, he put p(doom) from misalignment at around 20% (and another ~20% on other catastrophes following soon after... but those seem separable). Those are decent odds if you only care about yourself and your loved ones.
Lots of people would jump at the chance to gamble the entire future against their own immortality on those odds. If that's representative of people in the labs, I don't see how we prevent those in power from gambling the future.
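To spell out the arithmetic behind that gamble, here is a toy expected-utility sketch. The utility numbers (and treating the choice as a clean two-way bet) are purely illustrative assumptions about a hypothetical purely selfish decision-maker, not anyone's stated values:

$$
\begin{aligned}
EU(\text{race}) &= (1 - p_{\text{doom}})\, U(\text{immortality}) + p_{\text{doom}}\, U(\text{death}) \\
&\approx 0.8 \times 100 + 0.2 \times 0 = 80,\\
EU(\text{wait}) &\approx U(\text{normal remaining lifespan}) \approx 1.
\end{aligned}
$$

On these made-up numbers the race looks roughly eighty times better to a purely selfish actor, and the conclusion barely moves even at much higher p(doom); the calculation only flips once you put substantial weight on everyone else and on the long-term future.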
And people like Christiano and Shah clearly do understand the problem. So shifting their odds dramatically seems like it would take some breakthrough in conceptual clarification of the problem, or communication, or more likely, both.
Improvements in understanding and communicating the difficulty of the alignment problem seem critical-path for a global shutdown or even a slowdown, and also for attempting alignment in worlds where those don't happen. That's why my efforts are going there.
Given the logic that shutdown is highly unlikely, my best-case hope is that an international treaty allows a very few teams, probably one US and one Chinese, to proceed, while all others are banned (enforcement would happen like any other treaty). Such projects would ideally be public, and would communicate with each other on alignment risks and achievements. The idea would be that everyone agrees to roughly split the enormously growing pie, and to share generously with the rest of the world.
Given that logic, it seems inevitable to me that at least one or two projects push ahead fairly quickly, even in a best-case scenario. That's why my efforts focus on that scenario, where we try alignment on roughly the current path.
Having raised this question, let me state clearly: I think we should shut it all down. I say this publicly and will continue to do so. I think alignment is probably hard, and the odds of achieving it under race conditions are not good. If we get a substantial slowdown, that probably means I'll personally die, and I'd take that trade to improve humanity's odds. But I'm probably more utilitarian and longtermist than 99% of humanity.
So: Am I missing something?
(edited for clarity just after first posting)
Thank you for your high-quality engagement on this and for including the clear statement!
I think my most substantial disagreement with you on the difficulty of a shutdown is related to longtermism. Most normal people would not take a 5% risk of destroying the world in order to greatly improve their lives and the lives of their children. That isn't because they are longtermist, but primarily because they are simply horrified by the concept of destroying the world.
It is in fact almost entirely utilitarians who are in favor of taking that risk, because they are able to justify it to themselves after doing some simplified calculation. Ordinary people, rational or irrational, who just want good things for themselves and their kids usually don't want to risk their own lives, certainly don't want to risk their kids' lives, and it wouldn't cross their mind to risk other people's kids' lives, when put in stark terms.
"Human civilization should not be made to collapse in the next few decades" and "humanity should survive for a good long while" are longtermist positions, but they are also what >90% of people in every nation on earth already believe.
Most normal people would not take a 5% risk of destroying the world in order to greatly improve their lives and the lives of their children.
Polls suggest that most normal people expect AGI to be bad for them and they don't want it. I'm more speculating here, but I think the typical expectation is something like "AGI will put me out of a job; billionaires will get even richer and I'll get nothing."
This isn't terribly decision-relevant except for deciding what type of alignment work to do. But that does seem nontrivial. My bottom line: push for a pause/slowdown, but don't get overoptimistic. Simultaneously work toward alignment on the current path, as fast as possible, because that might well be our only chance.
To your point:
I take your point on the standard reasoning. I agree that most adults would turn down even 95-5 odds, 19 to 1, of improving their lives against a small chance of destroying the world.
But I'm afraid those with decision-making power would take far worse odds in private, where it matters. That's because for them, it's not just a better life; it's immortality vs. dying soon. And that tends to change decision-making.
I added and marked an edit to make this part of the logic explicit.
Most humans with decision-making power, e.g. in government, are 50+ years old, and mostly older, since power tends to accumulate at least until sharp cognitive decline sets in. There's a pretty good chance they will die of natural causes if ASI isn't created to do groundbreaking medical research within their lifetimes.
That's on top of any actual nationalistic tendencies, or fears of being killed, enslaved, tortured, or worse, mocked, by losing the race to one's political enemies covertly pursuing ASI.
And that's on top of worrying that sociopaths (or similarly cold/selfish decision-makers) are over-represented in the halls of power. Those arguments seem pretty strong to me, too.
How this would unfold is highly unclear to me. I think it's important to develop gears-level models of how these processes might happen, as Raemon suggests in this post.
My guess is that covert programs are between likely and inevitable. Public pressure will be for caution; private and powerful opinions will be much harder to predict.
As for the public statements, it works just fine to say, or more likely to convince yourself, that you think alignment is solvable with little chance of failure, and that patriotism and horror over (insert enemy ideology here) controlling the future are your motivations.
Lots of people would jump at the chance to gamble the entire future against their own immortality on those odds.
This would assume that those people are also convinced that something like radical life extension is possible in principle, and that more advanced AI would be required for delivering it.
I have no idea how many people that is true for. Many people dismiss suggestions of radical life extension with the same reflexive "that's sci-fi" response that AI x-risk scenarios get. Even if they became convinced of AI, life extension might stay in that category.
And if they did become convinced of its possibility, the most likely path I can see is advances in narrower AI having already delivered proofs of concept. You could imagine it being delivered by something like extensive biological modeling tools that were more developed than what we have today, but did not yet cross the threshold to transformative AI.
It seems to me that believing ASI can kill you and believing ASI can save you are both pretty directly downstream of believing in ASI at all. Since the premise is that everyone believes pretty strongly in the possibility of doom, it seems they'd mostly get there by believing in ASI and would mostly also believe in the upside potentials too.
Yes. But because we're discussing a scenario in which the world is ready to slow down or shut down AGI research, I'm assuming those steps have been crossed.
The biggest step, IMO, "alignment is hard," doesn't intervene between taking ASI seriously and thinking it could prevent you from dying of natural causes.
Given all of that, I'd expect covert or overt government projects to continue even if we get a treaty banning private research. (Which I do find quite possible; I expect scary demos to happen naturally).
Why is this special to Shutdown, vs Controlled Takeoff? (Here, I'm specifically comparing two plans that both route through "first, do the pretty difficult political action of getting countries to agree to centralize GPUs".) If you just expect people to defect from that, what's the point?
The scenario I'm thinking of mostly is the overt version, which is controlled takeoff. A few governments pursue controlled takeoff projects. This is hopefully done in the open and relatively collaboratively across US and Chinese teams. I'd assume they'd recruit from existing teams, effectively consolidating labs under government control. I'd hope that the Western and Chinese teams would agree to share their results on capabilities as well as alignment, although of course they'd worry about defection on that, too.
If they did it covertly that would be defecting. I haven't thought about that scenario as much.
This scenario doesn't seem like it would require consolidating GPUs, just monitoring their usage to some degree. It seems like it would be a lot easier to not make that part of the treaty.
The post is specifically about "globally controlled takeoff", in which multiple governments have agreed to locate their GPUs in locations that are easy for each other to inspect.
There's a spectrum between "Literally all countries agree to consolidate and monitor compute", "US/UK/China do it", "US/UK/Europe agree to do it among themselves", "US does it just for itself" and "individual orgs are just being idk a bit careful and a bit cooperative in an ad-hoc fashion."
I call the latter end of the spectrum "ad hoc semi-controlled semi-slowed takeoff" at the beginning of the post. If we get something somewhere in the middle, that seems like probably an improvement.
I thought I was addressing the premise of your post: the world is ready to impose serious restrictions on AI research; do they do shutdown or controlled takeoff?
I guess maybe I'm missing what's important about the physical consolidation vs other methods of inspection and enforcement.
I think my scenario conforms to all of the gears you mention. It could be seen as adding another gear: the incentives/psychologies of government decision-makers.
That's not a rational reason for a shutdown if you're not longtermist (edit: and if you're older, like most decision-makers, so a shutdown probably means you personally die).
This reads as if 'longtermism' and 'not caring at all about future generations or people who would outlive you' are the only possibilities.
Those are decent odds if you only care about yourself and your loved ones.
This assumes none of your loved ones are younger than you.
If someone believes a pause would meaningfully reduce extinction risk but also reduce their chance of personal immortality, they don't have to be a 'longtermist' (or utilitarian, altruist, scope-insensitive, etc) to prefer to pause, just care enough about some posterity.
(This isn't a claim about whether decision-makers do or don't have the preferences you're ascribing. I'm saying the dichotomy between those preferences and 'longtermism' is false, and also (like Haiku's sibling comment) I don't think they describe most humans even though 'longtermism' doesn't either, and this is important.)
Good points; I agree with all of them.
It's hard to know how to weigh them.
My mental model does include those gradients even though I expressed them as categories.
I currently think it's all too likely that decision-makers would accept a 20% or more chance of extinction in exchange for the benefits.
One route is to make better guesses about what happens by default. The other is to try to create better decisions by spreading the relevant logic.
Those who want to gamble will certainly push the "it should be fine and we need to do it!" logic. The two sets of beliefs will probably develop symbiotically. It's hard to separate emotional from rational reasons for beliefs.
It looks to me like people automatically convince themselves that what they want to do emotionally is also the logical thing to do. See Motivated reasoning/confirmation bias as the most important cognitive bias for a brief discussion.
Based on that logic, I actually think human cognitive biases and cognitive limitations are the biggest challenge to surviving ASI. We're silly creatures with a spark of reason.
I think there are just very few people for whom this is a compelling argument. I don't think governments are coming anywhere close to explicitly making this calculation. I think some people in labs are maybe making this decision, but they aren't actually the target audience for this.
I agree that governments aren't coming anywhere close to making this calculation at this point. They very well might once they've actually thought about the issue, though. I think it will depend a lot on their collective distribution of p(doom). I'd expect them to push ahead if they could convince themselves it was lower than maybe 20% or thereabouts. I'd love to be convinced they would be more cautious.
Of course I think it very much depends on who is in the relevant governments at that time. I think that the issue could play a large role in elections, and that might help a lot.
I don't (automatically) see how the dots connect here, i.e. how the linked section translates into something I'd call "US-led entente" (I think I maybe get it based on vague recollections of Dario writings or something, but would appreciate you spelling it out more).
See footnote 11. One-sentence version: US and allies enforce control on hardware, domestically and abroad, and there's carrots for cooperating and large sticks for not cooperating. Beyond that, not worth getting into / it would take me a long time to articulate something helpful. But happy to chat live, e.g. call me tomorrow.
(I think the thing you're calling US-led entente is something towards the latter end of the "Globally Controlled Takeoff --> Ad hoc semi-controlled Takeoff" spectrum in this comment, does that sound right?)
IMO, a big appeal of controlled takeoff is that, if successful, it slows down all of takeoff.
Whereas a global shut down, that might have happened at a time before we had great automated alignment research, and that might incidentally ban a lot of safety research as well… might just end some number of years later, whereupon we might quickly go through the remainder of takeoff, and incur similarly much risk as without the shutdown.
(Things that can cause a shutdown to end: elections or deaths swap out who rules countries; geopolitical power shifts; verification becomes harder as it becomes more plausible that people could invest a lot to develop and hide compute and data centers where they can't be seen; and maybe AI software efficiency advances via smaller-scale experiments that were hard to ban.)
Successful controlled takeoff definitely seems more likely to me than "shutdown so long that intelligence-augmented humans have time to grow up", and also more likely than "shutdown so long that we can solve superintelligence alignment up front without having very smart models to help us or to experiment with".
Short shutdown to do some prep before controlled takeoff seems reasonable.
Edit: I guess technically, some very mildly intelligence augmented humans (via embryo selection) are already being born, and they have a decent chance to grow up before superintelligence even without shutdown. I was thinking about intelligence augmentation that was good enough to significantly reduce x-risk. (Though I'm not sure how long people expect that to take.)
For those who believe that a global shutdown of AGI R&D is next to impossible, or much more difficult to succeed at than a different plan:
Something I think is very important, and which I would be very grateful for, is if you can consistently signal that you would prefer a shutdown if you thought it was feasible.
There are many people who are making an argument against a shutdown, who would genuinely prefer AGI/ASI to be created in the near future and who want to prevent it from being shut down. If that does not describe you, please make it clear in your various communications, especially external communications, that shutting it all down would be best, and you are merely pessimistic that that can be done.
The situation is not symmetrical. A citizen or policymaker hearing "shut it down" who would otherwise want AI to proceed with caution moves in the direction of more caution. A citizen or policymaker hearing "proceed with caution" who would otherwise want AI to be shut down moves in the direction of less caution. Nonetheless, many people who advocate for a shutdown do say that they would like for superintelligent AI to eventually be created, and they simply see other plans as woefully unlikely to succeed.
I fully agree and have been pleased to see this logic clarified in the recent discussions of IABIED. We must choose where to put our primary efforts, but those like me who think alignment might be achievable on the fast path should still say we'd prefer shutdown if at all possible. I will not only continue to say that, but try to share this logic broadly. Pushing hard for caution of any sort will on average improve our odds. I don't think we can get a shutdown (see other comment) but I'll still state clearly that we should shut it all down.
It only takes a moment to say "well, of course I think we'd shut it all down if we were wise, but assuming we don't, here are my plans and hopes..."
As an anecdote, a few months ago I met a former MIRI researcher while handing out flyers for PauseAI. We had a great conversation, and they were very concerned about x-risk.
When I asked if they would be willing to sign PauseAI US's petition, they declined, stating that they don't think a shutdown is feasible. I was very confused by this, because for those who are concerned about x-risk, I do not see a strong relationship between the feasibility of a shutdown and whether it is a good idea to advocate for one.
To expand on my point about asymmetry: If the shutdown plan fails, then we are undertaking one of the other plans. If the other plans fail, we die (with unacceptably high probability). Accidentally getting a shutdown when you meant to proceed with caution is a win condition, at least temporarily, and it allows for many other plans to be improved and enacted. Accidentally proceeding with caution when you meant to get a shutdown is walking on thin ice. The distance from shutdown to death is larger than the distance from proceeding with caution to death.
To address another viewpoint: If you are concerned about x-risk and you believe that all effort going toward advocating for a shutdown is wasted, and that the world would be better off if no one talked about a shutdown, I think you're simply confused about normal social and political dynamics.
But I think there are various flavors of Sufficiently Scary Demos that will make the threat much more salient without needing to route through abstract arguments.
I suspect that the most plausible SSD would be a rogue AI replicating in the wild, as proposed by Alvin Anestrand. In that AI-2027 fanwork, open-sourced AIs become able to replicate because someone releases a capable model. The resulting wild population is then infested by Agent-2, then Agent-4. Agent-4 continues the research in order to create Agent-5 and succeeds, obtaining a part of the lightcone.
My worst-case modification is that early open-sourced AIs will somehow have enough agency to work on an analogue of Agent-5 or Consensus-1. This worst-case scenario could prompt the AI companies to race instead of slowing down, leading to a fiasco.
I think Sufficiently Scary Demos need to do something that
a) directly, clearly is capable of threatening specific world leaders from multiple nations at once, in a way that is viscerally salient to them specifically
b) but, you don't end up going to jail for it (i.e. something like the difference between "a really good prank" and "actually hurting someone")
c) ideally, relies as little as possible on general intelligence, as opposed to extremely powerful narrow capabilities (with just enough generally intelligent agency there to demonstrate that this is scary because it can be self-directed, as opposed to just being a weapon you want to make sure you control)
Two somewhat different plans for buying time and improving AI outcomes are: "Global Shutdown" and "Global Controlled Takeoff."
(Some other plans some people believe in include "ad hoc semi-controlled semi-slowed takeoff", "race, then burn the lead on either superalignment or scary demos", and "decentralized differential defensive tech world". I mostly don't expect those to work, but am mostly not talking about them in this post.)
"Global Shutdown" and "Global Controlled Takeoff" both include an early step of "consolidate all GPUs and similar chips into locations that can be easily monitored."
The Shut Down plan then says things like "you cannot do any frontier development with the consolidated GPUs" (maybe you can use GPUs to run existing models that seem pretty safe, depending on implementation details). Also, maybe, any research into new algorithms needs to be approved by an international org, and frontier algorithm development is illegal. (This is maybe hard to enforce, but it might dramatically reduce the amount of R&D that goes into it, since you can't be a billion-dollar company that straightforwardly pours tons of resources into it without going to jail.)
Controlled Takeoff says instead (as I currently understand advocates to advocate) something like "Frontier research continues, slowly, carefully, leveraging frontier controlled AI to do a ton of alignment research."
I'm generally pro "Shut It Down", but I also think Global Controlled Takeoff is much better than the status quo (both because it seems better in isolation, and because achieving it makes Shut Down easier), and I see some of the appeal depending on your exact beliefs.
But, some notes on strategy here.
A lot of AI safety arguments boil down to "what seems least impossible?". Is it more impossible to get a Global Shutdown, or to solve safe superintelligence with anything remotely like our current understanding or the understanding we're likely to get over the next 5-10 years?
I've heard a number of people say flatly "you're not going to get a global shut down", with a tone of finality that sounds like they think this is basically impossible.
I'm not entirely sure I've correctly tracked which people are saying which things and whether I'm accidentally conflating statements from different people. But I think I've heard at least some people say "you're not getting a shut down" with that tone, who nonetheless advocate for controlled takeoff.
I certainly agree getting a global shutdown is very hard. But it's not obvious to me that getting a global controlled takeoff is much easier.
Two gears I want to make sure people are tracking:
Gear 1: "Consolidate and monitor the GPUs" is a huge political lift, regardless.
By the time you've gotten various world powers and corporations to do this extremely major, expensive action, I think something has significantly changed about the political landscape. I don't see how you'd get it without world leaders taking AI more "fundamentally seriously", in a way that would make other expensive plans a lot more tractable.
Gear 2: "You need to compare the tractability of Global Shut Down vs Global Controlled Takeoff That Actually Works, as opposed to Something That Looks Close To But Not Actually A Controlled Takeoff."
Along with Gear 3: "Shut it down" is much simpler than "Controlled Takeoff."
A Global Controlled Takeoff That Works has a lot of moving parts.
You need the international agreement to be capable of making any kind of sensible distinctions between safe and unsafe training runs, or even "marginally safer" vs "marginally less safe" training runs.
You need the international agreement to not turn into molochian regulatory-captured horror that perversely reverses the intent of the agreement and creates a class of bureaucrats who don't know anything about AI and use the agreement to dole out favors.
These problems still exist in some versions of Shut It Down too, to be clear (if you're trying to also ban algorithmic research – a lot of versions of that seem like they leave room to argue about whether agent foundations or interpretability count). But, they at least get coupled with "no large training runs, period."
I think "guys, everyone just stop" is a way easier schelling point to coordinate around, than "everyone, we're going to slow down and try to figure out alignment as best we can using current techniques."
So, I am not currently convinced that Global Controlled Takeoff That Actually Works is any more politically tractable than Global Shut Down.
(Caveat: Insofar as your plan is "well, we will totally get a molochian moral maze horror, but, it'll generally move slower and that buys time", eh, okay, seems reasonable. But, at least be clear to yourself about what you're aiming for)
Gear 4: Removing pressure to accelerate is valuable for the epistemics of the people doing the AI-assisted alignment (if you're trying that).
One reason I think the Anthropic plan is actively bad, instead of "at-least-okay-ish," is that (given how hard they seem to actively oppose any kind of serious regulation that would slow them down), they seem intent on remaining in a world where, while they are supposedly working on aligning the next generation of AI, they have constant economic pressure to ship the next thing soon.
I believe, maybe, you can leverage AI to help you align AI.
I am pretty confident that at least some of the tools you need to navigate aligning unbounded superintelligence (or confidently avoiding creating unbounded superintelligence) involve "precise conceptual reasoning" of a kind Anthropic et al. seem actively allergic to. (See also behaviorism vs. cognitivism. Anthropic culture seems to actively pride itself on empirics and be actively suspicious of attempts to reason ahead without empirics.)
I'm not confident that you need that much precise conceptual reasoning / reasoning ahead. (MIRI has an inside view that says this is... not impossibly hard, but hard in a fairly deep way that nobody is showing respect for. I don't have a clear inside view about "how hard is it," but I do have an inside view that it's harder than Anthropic's revealed actions suggest they think it is.)
I think thinking through this and figuring out whether you need conceptual tools that you aren't currently good at in order to succeed, is very hard, and people are extremely biased about it.
I think the difficulty is exacerbated further if your competitor is shipping the next generation of product, and you know in your heart that you're reaching ASL danger levels that should at least give you some pause to think about it, but the evidence isn't clear, and it would be extremely convenient for you and your org if your current level of control/alignment were sufficient to run the next training run.
So a lot of what I care most about with Shutdown/Controlled-Takeoff is making it no longer true that there is an economic incentive to rush ahead. (I think either a Shutdown-y or a Controlled-Takeoff-y regime can potentially work for this, if there's actually a trusted third party who makes the calls about whether the next training run is allowed, and who has the guns and compute.)
Gear 5: Political tractability will change as demos get scarier.
I'm not super thrilled with the "race to the edge, then burn the lead on scary demos" plan (specifically the "racing" part). But, I do think we will get much scarier demos as we approach AGI.
Politicians maybe don't understand abstract arguments (although I think responses to If Anyone Builds It suggest they at least sometimes do). But I think there are various flavors of Sufficiently Scary Demos that will make the threat much more salient without needing to route through abstract arguments.
I think one of the most important things to be preparing for is leveraging Sufficiently Scary Demos when they arrive. I think this includes beginning to argue seriously now for global treaty shaped things and have them on hand so people go "oh, okay, I guess we do need that thing Those Guys Were Talking About After All" instead of just being bewildered.
I won't claim that any of the above should be decisive in anyone's decision-making. I'm still processing the update that I can't really simulate the entire political disagreement yet, and I'm not sure what other gears I'm missing from other people's perspectives.
But, these are all individual gears that seem pretty important to me, which I think should be part of other people's overall strategizing.
I have a similar point comparing the feasibility of "Global Shut Down" vs "Decentralized Differentially Defensive Tech world that Actually Works", but, that's a fairly complex and different argument.
To be clear, this bias also applies in the MIRI-esque direction. But they're not the ones rushing ahead inventing AGI.