Thanks for writing this.
I agree with your logic that shutdown is a simpler ask than controlled takeoff, but I think controlled takeoff is much more viable in practice.
I see three major blockers to full shutdown:
Given all of that, I'd expect covert or overt government projects to continue even if we get a treaty banning private research. (Which I do find quite possible; I expect scary demos to happen naturally).
There's one set of experts saying alignment is almost impossible, and another group of experts saying it's probably doable as long as we aren't dumb about it. That's not a rational reason for a shutdown if you're not longtermist (edit: and if you're older, like most decision-makers, so a shutdown probably means you personally die).
Getting a full shutdown, and surviving without one, seem to share a critical-path component: conceptual clarification of, and communication about, alignment difficulty on the current path seems necessary for either route. That's what it would take to shift expert opinion enough to get a full shutdown, and it would improve the odds of solving alignment in time if we don't get one. So I'd love to see more effort in that direction (like we got with your "The Title is Reasonable" post!)
I really hope I'm missing something! I'll continue advocating for shutdown as a means to any viable slowdown, but I'd love to have some genuine hope for it!
The only way I can see to prevent some programs from continuing is if experts were to unify behind "alignment is highly unlikely to succeed using current methods". If everyone with a clue said "we'll probably die if we try it any time soon", that might be enough to dissuade selfish decision-makers. This would require some amazing communication and clarification. If we take Paul Christiano as representative of expert optimists, he put p(doom) from misalignment at around 20% (and another ~20% on other catastrophes following soon after... but those seem separable). Those are decent odds if you only care about yourself and your loved ones.
Lots of people would jump at the chance to gamble the entire future against their own immortality on those odds. If that's representative of people in the labs, I don't see how we prevent those in power from gambling the future.
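To spell out the arithmetic behind that gamble, here is a toy expected-utility sketch. The utility numbers (and treating the choice as a clean two-way bet) are purely illustrative assumptions about a hypothetical purely selfish decision-maker, not anyone's stated values:

$$
\begin{aligned}
EU(\text{race}) &= (1 - p_{\text{doom}})\, U(\text{immortality}) + p_{\text{doom}}\, U(\text{death}) \\
&\approx 0.8 \times 100 + 0.2 \times 0 = 80,\\
EU(\text{wait}) &\approx U(\text{normal remaining lifespan}) \approx 1.
\end{aligned}
$$

On these made-up numbers the race looks roughly eighty times better to a purely selfish actor, and the conclusion barely moves even at much higher p(doom); the calculation only flips once you put substantial weight on everyone else and on the long-term future.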
And people like Christiano and Shah clearly do understand the problem. So shifting their odds dramatically seems like it would take some breakthrough in conceptual clarification of the problem, or communication, or more likely, both.
Improvements in understanding and communicating the difficulty of the alignment problem seem critical-path for a global shutdown or even a slowdown, and also for attempting alignment in worlds where those don't happen. That's why my efforts are going there.
Given the logic that shutdown is highly unlikely, my best-case hope is that an international treaty allows a very few teams, probably one US and one Chinese, to proceed, while all others are banned (enforcement would happen like any other treaty). Such projects would ideally be public, and would communicate with each other on alignment risks and achievements. The idea would be that everyone agrees to roughly split the enormously growing pie, and to share generously with the rest of the world.
Given that logic, it seems inevitable to me that at least one or two projects push ahead fairly quickly, even in a best-case scenario. That's why my efforts focus on that scenario, where we try alignment on roughly the current path.
Having raised this question, let me state clearly: I think we should shut it all down. I say this publicly and will continue to do so. I think alignment is probably hard, and the odds of achieving it under race conditions are not good. If we get a substantial slowdown, that probably means I'll personally die, and I'd take that trade to improve humanity's odds. But I'm probably more utilitarian and longtermist than 99% of humanity.
So: Am I missing something?
(edited for clarity just after first posting)
Thank you for your high-quality engagement on this and for including the clear statement!
I think my most substantial disagreement with you on the difficulty of a shutdown is related to longtermism. Most normal people would not take a 5% risk of destroying the world in order to greatly improve their lives and the lives of their children. That isn't because they are longtermist, but primarily because they are simply horrified by the concept of destroying the world.
It is in fact almost entirely utilitarians who are in favor of taking that risk, because they are able to justify it to themselves after doing some simplified calculation. Ordinary people, rational or irrational, who just want good things for themselves and their kids usually don't want to risk their own lives, certainly don't want to risk their kids' lives, and it wouldn't cross their mind to risk other people's kids' lives, when put in stark terms.
"Human civilization should not be made to collapse in the next few decades" and "humanity should survive for a good long while" are longtermist positions, but they are also what >90% of people in every nation on earth already believe.
Most normal people would not take a 5% risk of destroying the world in order to greatly improve their lives and the lives of their children.
Polls suggest that most normal people expect AGI to be bad for them and they don't want it. I'm more speculating here, but I think the typical expectation is something like "AGI will put me out of a job; billionaires will get even richer and I'll get nothing."
This isn't terribly decision-relevant except for deciding what type of alignment work to do. But that does seem nontrivial. My bottom line: push for a pause/slowdown, but don't get overoptimistic. Simultaneously work toward alignment on the current path, as fast as possible, because that might well be our only chance.
To your point:
I take your point on the standard reasoning. I agree that most adults would turn down even 95-5 odds, 19 to 1, of improving their lives against a small chance of destroying the world.
But I'm afraid those with decision-making power would take far worse odds in private, where it matters. That's because for them, it's not just a better life; it's immortality vs. dying soon. And that tends to change decision-making.
I added and marked an edit to make this part of the logic explicit.
Most humans with decision-making power, e.g. in government, are 50+ years old, and mostly older, since power tends to accumulate at least until sharp cognitive decline sets in. There's a pretty good chance they will die of natural causes if ASI isn't created to do groundbreaking medical research within their lifetimes.
That's on top of any actual nationalistic tendencies, or fears of being killed, enslaved, tortured, or worse, mocked, by losing the race to one's political enemies covertly pursuing ASI.
And that's on top of worrying that sociopaths (or similarly cold/selfish decision-makers) are over-represented in the halls of power. Those arguments seem pretty strong to me, too.
How this would unfold is highly unclear to me. I think it's important to develop gears-level models of how these processes might happen, as Raemon suggests in this post.
My guess is that covert programs are between likely and inevitable. Public pressure will be for caution; private and powerful opinions will be much harder to predict.
As for the public statements, it works just fine to say, or more likely to convince yourself, that you think alignment is solvable with little chance of failure, and that patriotism and horror over (insert enemy ideology here) controlling the future are your motivations.
Lots of people would jump at the chance to gamble the entire future against their own immortality on those odds.
This would assume that those people are also convinced that something like radical life extension is possible in principle, and that more advanced AI would be required for delivering it.
I have no idea how many people that is true for. Many people dismiss suggestions of radical life extension with the same reflexive "that's sci-fi" response that AI x-risk scenarios get. Even if they became convinced of AI, life extension might stay in that category.
And if they did become convinced of its possibility, the most likely path I can see is advances in narrower AI having already delivered proofs of concept. You could imagine it being delivered by something like extensive biological modeling tools that were more developed than what we have today, but did not yet cross the threshold to transformative AI.
It seems to me that believing ASI can kill you and believing ASI can save you are both pretty directly downstream of believing in ASI at all. Since the premise is that everyone believes pretty strongly in the possibility of doom, it seems they'd mostly get there by believing in ASI and would mostly also believe in the upside potentials too.
Yes. But because we're discussing a scenario in which the world is ready to slow down or shut down AGI research, I'm assuming those steps have been crossed.
The biggest step, IMO, "alignment is hard," doesn't intervene between taking ASI seriously and thinking it could prevent you from dying of natural causes.
Given all of that, I'd expect covert or overt government projects to continue even if we get a treaty banning private research. (Which I do find quite possible; I expect scary demos to happen naturally).
Why is this special to Shutdown, vs Controlled Takeoff? (Here, I'm specifically comparing two plans that both route through "first, do the pretty difficult political action of getting countries to agree to centralize GPUs".) If you just expect people to defect from that, what's the point?
The scenario I'm thinking of mostly is the overt version, which is controlled takeoff. A few governments pursue controlled takeoff projects. This is hopefully done in the open and relatively collaboratively across US and Chinese teams. I'd assume they'd recruit from existing teams, effectively consolidating labs under government control. I'd hope that the Western and Chinese teams would agree to share their results on capabilities as well as alignment, although of course they'd worry about defection on that, too.
If they did it covertly that would be defecting. I haven't thought about that scenario as much.
This scenario doesn't seem like it would require consolidating GPUs, just monitoring their usage to some degree. It seems like it would be a lot easier to not make that part of the treaty.
The post is specifically about "globally controlled takeoff", in which multiple governments have agreed to locate their GPUs in locations that are easy for each other to inspect.
There's a spectrum between "Literally all countries agree to consolidate and monitor compute", "US/UK/China do it", "US/UK/Europe agree to do it among themselves", "US does it just for itself" and "individual orgs are just being idk a bit careful and a bit cooperative in an ad-hoc fashion."
I call the latter end of the spectrum "ad hoc semi-controlled semi-slowed takeoff" at the beginning of the post. If we get something somewhere in the middle, that seems like probably an improvement.
I thought I was addressing the premise of your post: the world is ready to impose serious restrictions on AI research; do they do shutdown or controlled takeoff?
I guess maybe I'm missing what's important about the physical consolidation vs other methods of inspection and enforcement.
I think my scenario conforms to all of the gears you mention. It could be seen as adding another gear: the incentives/psychologies of government decision-makers.
That's not a rational reason for a shutdown if you're not longtermist (edit: and if you're older, like most decision-makers, so a shutdown probably means you personally die).
This reads as if 'longtermism' and 'not caring at all about future generations or people who would outlive you' are the only possibilities.
Those are decent odds if you only care about yourself and your loved ones.
This assumes none of your loved ones are younger than you.
If someone believes a pause would meaningfully reduce extinction risk but also reduce their chance of personal immortality, they don't have to be a 'longtermist' (or utilitarian, altruist, scope-insensitive, etc) to prefer to pause, just care enough about some posterity.
(This isn't a claim about whether decision-makers do or don't have the preferences you're ascribing. I'm saying the dichotomy between those preferences and 'longtermism' is false, and also (like Haiku's sibling comment) I don't think they describe most humans even though 'longtermism' doesn't either, and this is important.)
Good points; I agree with all of them.
It's hard to know how to weigh them.
My mental model does include those gradients even though I expressed them as categories.
I currently think it's all too likely that decision-makers would accept a 20% or more chance of extinction in exchange for the benefits.
One route is to make better guesses about what happens by default. The other is to try to create better decisions by spreading the relevant logic.
Those who want to gamble will certainly push the "it should be fine and we need to do it!" logic. The two sets of beliefs will probably develop symbiotically. It's hard to separate emotional from rational reasons for beliefs.
It looks to me like people automatically convince themselves that what they want to do emotionally is also the logical thing to do. See Motivated reasoning/confirmation bias as the most important cognitive bias for a brief discussion.
Based on that logic, I actually think human cognitive biases and cognitive limitations are the biggest challenge to surviving ASI. We're silly creatures with a spark of reason.
I think there are just very few people for whom this is a compelling argument. I don't think governments are coming anywhere close to explicitly making this calculation. I think some people in labs are maybe making this decision, but they aren't actually the target audience for this.
I agree that governments aren't coming anywhere close to making this calculation at this point. They very well might once they've actually thought about the issue, though. I think it will depend a lot on their collective distribution of p(doom). I'd expect them to push ahead if they could convince themselves it was lower than maybe 20% or thereabouts. I'd love to be convinced they would be more cautious.
Of course I think it very much depends on who is in the relevant governments at that time. I think that the issue could play a large role in elections, and that might help a lot.
I don't (automatically) see how the dots connect here, i.e. how the linked section translates into something I'd call "US-led entente" (I think I maybe get it based on vague recollections of Dario writings or something, but would appreciate you spelling it out more).
See footnote 11. One-sentence version: US and allies enforce control on hardware, domestically and abroad, and there's carrots for cooperating and large sticks for not cooperating. Beyond that, not worth getting into / it would take me a long time to articulate something helpful. But happy to chat live, e.g. call me tomorrow.
(I think the thing you're calling US-led entente is something towards the latter end of the "Globally Controlled Takeoff --> Ad hoc semi-controlled Takeoff" spectrum in this comment, does that sound right?)
IMO, a big appeal of controlled takeoff is that, if successful, it slows down all of takeoff.
Whereas a global shut down, that might have happened at a time before we had great automated alignment research, and that might incidentally ban a lot of safety research as well… might just end some number of years later, whereupon we might quickly go through the remainder of takeoff, and incur similarly much risk as without the shutdown.
(Things that can cause a shutdown to end: elections or deaths swap out who rules countries; geopolitical power shifts; verification becomes harder as it becomes more plausible that people could invest a lot to develop and hide compute and data centers where they can't be seen; and maybe AI software efficiency advances via smaller-scale experiments that were hard to ban.)
Successful controlled takeoff definitely seems more likely to me than "shutdown so long that intelligence-augmented humans have time to grow up", and also more likely than "shutdown so long that we can solve superintelligence alignment up front without having very smart models to help us or to experiment with".
Short shutdown to do some prep before controlled takeoff seems reasonable.
Edit: I guess technically, some very mildly intelligence augmented humans (via embryo selection) are already being born, and they have a decent chance to grow up before superintelligence even without shutdown. I was thinking about intelligence augmentation that was good enough to significantly reduce x-risk. (Though I'm not sure how long people expect that to take.)
For those who believe that a global shutdown of AGI R&D is next to impossible, or much more difficult to succeed at than a different plan:
Something I think is very important, and which I would be very grateful for, is if you can consistently signal that you would prefer a shutdown if you thought it was feasible.
There are many people who are making an argument against a shutdown, who would genuinely prefer AGI/ASI to be created in the near future and who want to prevent it from being shut down. If that does not describe you, please make it clear in your various communications, especially external communications, that shutting it all down would be best, and you are merely pessimistic that that can be done.
The situation is not symmetrical. A citizen or policymaker hearing "shut it down" who would otherwise want AI to proceed with caution moves in the direction of more caution. A citizen or policymaker hearing "proceed with caution" who would otherwise want AI to be shut down moves in the direction of less caution. Nonetheless, many people who advocate for a shutdown do say that they would like for superintelligent AI to eventually be created, and they simply see other plans as woefully unlikely to succeed.
I fully agree and have been pleased to see this logic clarified in the recent discussions of IABIED. We must choose where to put our primary efforts, but those like me who think alignment might be achievable on the fast path should still say we'd prefer shutdown if at all possible. I will not only continue to say that, but try to share this logic broadly. Pushing hard for caution of any sort will on average improve our odds. I don't think we can get a shutdown (see other comment) but I'll still state clearly that we should shut it all down.
It only takes a moment to say "well, of course I think we'd shut it all down if we were wise, but assuming we don't, here are my plans and hopes..."
As an anecdote, a few months ago I met a former MIRI researcher while handing out flyers for PauseAI. We had a great conversation, and they were very concerned about x-risk.
When I asked if they would be willing to sign PauseAI US's petition, they declined, stating that they don't think a shutdown is feasible. I was very confused by this, because for those who are concerned about x-risk, I do not see a strong relationship between the feasibility of a shutdown and whether it is a good idea to advocate for one.
To expand on my point about asymmetry: If the shutdown plan fails, then we are undertaking one of the other plans. If the other plans fail, we die (with unacceptably high probability). Accidentally getting a shutdown when you meant to proceed with caution is a win condition, at least temporarily, and it allows for many other plans to be improved and enacted. Accidentally proceeding with caution when you meant to get a shutdown is walking on thin ice. The distance from shutdown to death is larger than the distance from proceeding with caution to death.
To address another viewpoint: If you are concerned about x-risk and you believe that all effort going toward advocating for a shutdown is wasted, and that the world would be better off if no one talked about a shutdown, I think you're simply confused about normal social and political dynamics.
But I think there are various flavors of Sufficiently Scary Demos that will make the threat much more salient without needing to route through abstract arguments.
I suspect that the most plausible SSD would be a rogue AI replicating in the wild, as proposed by Alvin Anestrand. In that AI-2027 fanwork, open-sourced AIs become able to replicate because someone releases a capable model. The resulting wild population is then infested by Agent-2, then Agent-4. Agent-4 continues the research in order to create Agent-5 and succeeds, obtaining a part of the lightcone.
My worst-case modification is that early open-sourced AIs will somehow have enough agency to work on an analogue of Agent-5 or Consensus-1. This worst-case scenario could prompt the AI companies to race instead of slowing down, leading to a fiasco.
I think Sufficiently Scary Demos need to do something that
a) directly, clearly is capable of threatening specific world leaders from multiple nations at once, in a way that is viscerally salient to them specifically
b) but, you don't end up going to jail for it (i.e. something like the difference between "a really good prank" and "actually hurting someone")
c) ideally, relies as little as possible on general intelligence, as opposed to extremely powerful narrow capabilities (with just enough generally intelligent agency there to demonstrate that this is scary because it can be self-directed, as opposed to just being a weapon you want to make sure you control)
Two somewhat different plans for buying time and improving AI outcomes are: "Global Shutdown" and "Global Controlled Takeoff."
(Some other plans some people believe in include "ad hoc semi-controlled semi-slowed takeoff", "race, then burn the lead on either superalignment or scary demos", and "decentralized differential defensive tech world". I mostly don't expect those to work, but am mostly not talking about them in this post.)
"Global Shutdown" and "Global Controlled Takeoff" both include an early step of "consolidate all GPUs and similar chips into locations that can be easily monitored."
The Shut Down plan then says things like "you cannot do any frontier development with the consolidated GPUs" (maybe you can use GPUs to run existing models that seem pretty safe, depending on implementation details). Also, maybe, any research into new algorithms needs to be approved by an international org, and frontier algorithm development is illegal. (This is maybe hard to enforce, but it might dramatically reduce the amount of R&D that goes into it, since you can't be a billion-dollar company that straightforwardly pours tons of resources into it without going to jail.)
Controlled Takeoff says instead (as I currently understand advocates to advocate) something like "Frontier research continues, slowly, carefully, leveraging frontier controlled AI to do a ton of alignment research."
I'm generally pro "Shut It Down", but I also think Global Controlled Takeoff is much better than the status quo (both because it seems better in isolation, and because achieving it makes Shut Down easier), and I see some of the appeal depending on your exact beliefs.
But, some notes on strategy here.
A lot of AI safety arguments boil down to "what seems least impossible?". Is it more impossible to get a Global Shutdown, or to solve safe superintelligence with anything remotely like our current understanding or the understanding we're likely to get over the next 5-10 years?
I've heard a number of people say flatly "you're not going to get a global shut down", with a tone of finality that sounds like they think this is basically impossible.
I'm not entirely sure I've correctly tracked which people are saying which things and whether I'm accidentally conflating statements from different people. But I think I've heard at least some people say "you're not getting a shut down" with that tone, who nonetheless advocate for controlled takeoff.
I certainly agree getting a global shutdown is very hard. But it's not obvious to me that getting a global controlled takeoff is much easier.
Two gears I want to make sure people are tracking:
Gear 1: "Consolidate and monitor the GPUs" is a huge political lift, regardless.
By the time you've gotten various world powers and corporations to do this extremely major, expensive action, I think something has significantly changed about the political landscape. I don't see how you'd get it without world leaders taking AI more "fundamentally seriously", in a way that would make other expensive plans a lot more tractable.
Gear 2: "You need to compare the tractability of Global Shut Down vs Global Controlled Takeoff That Actually Works, as opposed to Something That Looks Close To But Not Actually A Controlled Takeoff."
Along with Gear 3: "Shut it down" is much simpler than "Controlled Takeoff."
A Global Controlled Takeoff That Works has a lot of moving parts.
You need the international agreement to be capable of making any kind of sensible distinctions between safe and unsafe training runs, or even "marginally safer" vs "marginally less safe" training runs.
You need the international agreement to not turn into molochian regulatory-captured horror that perversely reverses the intent of the agreement and creates a class of bureaucrats who don't know anything about AI and use the agreement to dole out favors.
These problems still exist in some versions of Shut It Down too, to be clear (if you're trying to also ban algorithmic research – a lot of versions of that seem like they leave room to argue about whether agent foundations or interpretability count). But, they at least get coupled with "no large training runs, period."
I think "guys, everyone just stop" is a way easier schelling point to coordinate around, than "everyone, we're going to slow down and try to figure out alignment as best we can using current techniques."
So, I am not currently convinced that Global Controlled Takeoff That Actually Works is any more politically tractable than Global Shut Down.
(Caveat: Insofar as your plan is "well, we will totally get a molochian moral maze horror, but, it'll generally move slower and that buys time", eh, okay, seems reasonable. But, at least be clear to yourself about what you're aiming for)
Gear 4: Removing pressure to accelerate is valuable for the epistemics of the people doing the AI-assisted alignment (if you're trying that).
One reason I think the Anthropic plan is actively bad, instead of "at-least-okay-ish," is that (given how hard they seem to actively oppose any kind of serious regulation that would slow them down), they seem intent on remaining in a world where, while they are supposedly working on aligning the next generation of AI, they have constant economic pressure to ship the next thing soon.
I believe, maybe, you can leverage AI to help you align AI.
I am pretty confident that at least some of the tools you need to navigate aligning unbounded superintelligence (or confidently avoiding creating unbounded superintelligence) involve "precise conceptual reasoning" of a kind Anthropic et al. seem actively allergic to. (See also behaviorism vs. cognitivism. Anthropic culture seems to actively pride itself on empirics and be actively suspicious of attempts to reason ahead without empirics.)
I'm not confident that you need that much precise conceptual reasoning / reasoning ahead. (MIRI has an inside view that says this is... not impossibly hard, but hard in a fairly deep way that nobody is showing respect for. I don't have a clear inside view about "how hard is it," but I do have an inside view that it's harder than Anthropic's revealed actions suggest they think it is.)
I think thinking through this and figuring out whether you need conceptual tools that you aren't currently good at in order to succeed, is very hard, and people are extremely biased about it.
I think the difficulty is exacerbated further if your competitor is shipping the next generation of product, and you know in your heart that you're reaching ASL danger levels that should at least give you some pause to think about it, but the evidence isn't clear, and it would be extremely convenient for you and your org if your current level of control/alignment were sufficient to run the next training run.
So a lot of what I care most about with Shutdown/Controlled-Takeoff is making it no longer true that there is an economic incentive to rush ahead. (I think either a Shutdown-y or a Controlled-Takeoff-y regime can potentially work for this, if there's actually a trusted third party who makes the calls about whether the next training run is allowed, and who has the guns and compute.)
Gear 5: Political tractability will change as demos get scarier.
I'm not super thrilled with the "race to the edge, then burn the lead on scary demos" plan (specifically the "racing" part). But, I do think we will get much scarier demos as we approach AGI.
Politicians maybe don't understand abstract arguments (although I think responses to If Anyone Builds It suggest they at least sometimes do). But I think there are various flavors of Sufficiently Scary Demos that will make the threat much more salient without needing to route through abstract arguments.
I think one of the most important things to be preparing for is leveraging Sufficiently Scary Demos when they arrive. I think this includes beginning to argue seriously now for global treaty shaped things and have them on hand so people go "oh, okay, I guess we do need that thing Those Guys Were Talking About After All" instead of just being bewildered.
I won't claim that any of the above should be decisive in anyone's decision-making. I'm still processing the update that I can't really simulate the entire political disagreement yet, and I'm not sure what other gears I'm missing from other people's perspectives.
But, these are all individual gears that seem pretty important to me, which I think should be part of other people's overall strategizing.
I have a similar point comparing the feasibility of "Global Shut Down" vs "Decentralized Differentially Defensive Tech world that Actually Works", but, that's a fairly complex and different argument.
To be clear, this bias also applies in the MIRI-esque direction. But they're not the ones rushing ahead inventing AGI.