In the late 19th century, two researchers meet to discuss their differing views on the existential risk posed by future Uncontrollable Super-Powerful Explosives.

  • Catastrophist: I predict that one day, not too far in the future, we will find a way to unlock a qualitatively new kind of explosive power. This explosive will represent a fundamental break with what has come before. It will be so much more powerful than any other explosive that whoever gets to this technology first might be in a position to gain a DSA over any opposition. Also, the governance and military strategies that we were using to prevent wars or win them will be fundamentally unable to control this new technology, so we'll have to reinvent everything on the fly or die in an extinction-level war. There's no way we'd be competent enough to handle something of that power without killing ourselves almost immediately.
  • Gradualist: I’m also concerned about the prospect of explosives one day becoming far more destructive than they are now, with possibly catastrophic consequences if we aren't prepared. I'm not so sure that we’d instantly go extinct if you were right, though I agree that if anything like what you're describing is real, we’re in a great deal of danger. But we'll leave questions of governance for another time. In the meantime, I want to push back against the idea that this will all happen so suddenly. What does that word 'DSA' mean?
  • C: Decisive strategic advantage. Anyone who has that technology would be able to render any military opposition irrelevant. Probably, they'd be able to wipe out entire cities with one bomb and force their opponents to surrender almost immediately.
  • G: That seems like a weirdly specific prediction to make. Why assume something so unlikely? Have you got any evidence such a thing is even possible?
  • C: I have my reasons, but first let me deal with what you just said, because I can't let that slip. Zero-to-one discontinuities are actually pretty common in the history of technology. Someone had to invent guns or steam engines for the first time. Why wouldn't there be a zero-to-one transition for explosive power someday?
  • G: Because zero-to-one discontinuities happen when you do something for the first time.
  • C: Yes, that's what I'm suggesting.
  • G: No, you aren't: we've already had our zero-to-one discontinuity! We've invented black powder, then dynamite and fuses. From now on there'll be incremental changes and inventions that increase explosive power. We might see step changes when some new kind of chemical is discovered, but what you're talking about isn't possible. Or at least it's highly unlikely.
  • C: Why? You’ve just admitted step changes happen all the time.
  • G: Because for anyone to get that much of a lead over their competitors in explosives technology, everyone else would have to ignore a hugely promising road to technological improvement, probably consisting of many steps, for ages.
  • C: And who says that explosive power actually works like that?
  • G: Because that's our default expectation with a technology like explosives, where there are lots of paths to improvement and lots of effort exerted on every part of the problem. Unless you have evidence that this isn't how it works?
  • C: Yes, I was just getting to that. Your priors don't mean anything if we have already seen an existence proof for qualitatively new energy sources.
  • G: So you have a design for this super-explosive?
  • C: No, but that’s not necessary for my point.
  • G: So you've at least found a new principle of physics that implies it is possible?
  • C: I'm talking about the Sun. The energy the Sun outputs is overwhelming, enough to warm the entire Earth. One day, we'll discover how to release those energies ourselves, and that will give us qualitatively better explosives. I can't say how to do it, of course, other than maybe giving you some vague hints about replicating the conditions inside the Sun, but to be honest I don't really expect our super-explosive to look much like the Sun, any more than trains look like horses. All I know is that it will use the same underlying principle that the Sun uses to release its incredible power.
  • G: How does that get you to assuming there'll be a discontinuity?
  • C: Because not getting a discontinuity when we discover the power of the Sun would require an extreme coincidence.
  • G: How can getting that incredibly specific outcome of a massive jump in capability be the default?
  • C: The Sun is an existence proof for the new kind of explosive energy. And we know that under the right circumstances this energy can exceed our best regular explosives by a vast amount, so it seems foolhardy to assume our first super-explosive will just happen to be as powerful as our best normal explosive technologies are whenever we make the discovery. Why would that be the case?
  • G: Oh, I see. Well the flaw in your argument is clear then. You're assuming the Sun is something qualitatively new.
  • C: You think it isn't? I'd like to see you pile together enough TNT to heat the Earth.
  • G: The idea that there'll be a 'first person to discover the power of the Sun' is a mirage. I can offer a micro-foundational explanation of why the Sun seems like a qualitatively new energy source when really it isn't.
  • C: Let's hear it, then.
  • G: I think that the Sun is nothing but a giant ball of gas heated by gravitational potential energy. One day, in some crazy distant future, we might be able to pile up enough gas that gravity makes it collapse and heat up, but that'll require us to be able to literally build stars. It's not going to happen suddenly. We'll pile up a small amount of gas, then a larger amount, and so on, long after we've given up on assembling bigger and bigger piles of explosives.
  • C: Why assume that's how the Sun works, and that it's not something new? Do you have any evidence for your view?
  • G: As opposed to the view that the Sun is powered by something entirely new and unknown?
  • C: New, maybe, but it's not unknown. Just look out of a window.
  • G: I don't have any specific evidence that it's powered by gravitational collapse, since my model was made to retrodict observations we’ve already made, but your theory explains the same phenomena more vaguely while making a bunch of extra assumptions that we don't need. I can already provide a somewhat satisfactory explanation, even if a few details don't make complete sense yet, like the age of the Earth. So you might be right, in principle. I've not checked my maths on this model that thoroughly.
  • C: I think you're overly confident in your gravitational collapse theory because it fits your priors about 'smooth progress', even though it’s only a vague match with observation. My theory that the Sun is powered by something different is a much more natural explanation.
  • G: I'm still not seeing any good evidence against my view, though. I'll even grant that you don't have literally zero evidence in favour of your view, from the existence of the Sun (though I note you’re also retrodicting things we've both already observed). Something like the Sun existing is somewhat more natural if you assume there's some incomprehensible physics secret powering it instead of gravitational collapse, but beyond that I can't seriously credit your world-view much.
  • C: I'm working on a proof that the Sun can't just be heated by gravitational collapse.
  • G: I’d like to see that. In the meantime, let’s see if we can find out if our conflicting theories about the Sun predict anything different that we can test right now. That’s probably the best way to move this forward.


Okay, but the US didn’t take over the world in 1945.

In a very real sense, we did. The US and its allies dictated the terms of the post-WWII world order, did so again financially when they left the Bretton Woods system and moved the world to fiat currencies, and did so again geopolitically when they dictated terms to post-Soviet Russia in the 1990s. Sure, there was a period when American dominance was uncertain, once the Soviets also got the atomic bomb, and things were less clear while the USSR was still in place, but by the 1980s it was inevitable that the Soviets had lost, and in 1991 the Soviet Union fell. It's been another 20 years, and during a large part of that time the leading hypothesis was that history had ended, with the US and the liberal order as the victor.

I think the most relevant takeaway is that we did end up with an arsenal of weapons that has now put us, at all times, hours away from nuclear winter by a very reasonable metric of counterfactual possibility.

And while, from what I hear, nuclear winter probably wouldn’t quite be an extinction-level event in practice, it was still counterfactually very close: a nuke’s surprisingly runaway chain reaction could have been just a little more runaway.

Participants in this kind of dialogue should come in with a healthy respect for the likelihood that a big extinction risk will become salient when research figures out how to harness a new kind of power.

At the risk of ruining the fun by just pointing out the message: the major point of the story was that you weren't supposed to condition on knowing that nuclear weapons are real, and should instead ask whether the Gradualist's or the Catastrophist's arguments actually make sense given what they knew.

That's the situation I think we're in with Fast AI Takeoff. We're trying to interpret what the existence of general intelligences like humans (the Sun) implies for future progress on ML algorithms (normal explosives), without either a clear underlying theory for what the Sun's power really is, or any direct evidence that there'll be a jump.

That remark about the 'micro-foundational explanation for why the sun looks qualitatively new but really isn't' refers to Richard Ngo's explanation of why humans are so much better than chimps: https://www.lesswrong.com/s/n945eovrA3oDueqtq/p/gf9hhmSvpZfyfS34B#13_1__Alignment_difficulty_debate__Richard_Ngo_s_case

Richard Ngo: You don’t have a specific argument about utility functions and their relationship to AGIs in a precise, technical way. Instead, it’s more like utility functions are like a pointer towards the type of later theory that will give us a much more precise understanding of how to think about intelligence and agency and AGIs pursuing goals and so on. And to Eliezer, it seems like we’ve got a bunch of different handles on what the shape of this larger scale theory might look like, but he can’t really explain it in precise terms. It’s maybe in the same way that for any other scientific theory, before you latch onto it, you can only gesture towards a bunch of different intuitions that you have and be like, “Hey guys, there are these links between them that I can’t make precise or rigorous or formal at this point.”

In my opinion the relevant detail is that we were not able to prevent the Soviets from getting the bomb. It took them all of about 3 years. It'll take China, Russia, open source hackers et al about 18 months max to replicate AGI once it arrives. So much for your decisive strategic advantage.

Nuclear technology is hours away from reducing the value of all human civilization by 10%, and for all we knew that figure could have been 100%. That’s the nuclear threat. I wouldn’t even classify that as a “geopolitical” threat. The fact that Soviet nuclear technology pretty quickly became comparable to US nuclear technology isn’t the most salient fact in the story. The story is that research got really close, and is still really close, to releasing hell, and the door to hell looks generally pretty easy to open.

18 months is more than enough to get a DSA if AGI turns out to be anything like what we fear (that is, something really powerful and difficult to control, probably reaching that state quickly through an intelligence explosion).

In fact, I'd even argue 18 days might be enough. AI is already beginning to solve protein folding (AlphaFold). If it progresses from there and builds a nanosystem, that's more than enough to get a DSA, i.e. to take over the world. We currently see AIs like MuZero learning in hours what would take a human a lifetime to learn, so it wouldn't surprise me if an advanced AI solved advanced nanotech in a few days.

Whether the first AGI will be aligned or not is way more concerning. Not because who gets there first isn't also extremely important, but only because getting there first is the "easy" part.

I don't really think advanced AI can be compared to atomic bombs. The former is a way more explosive technology, pun intended.

Also, the Sun has incredibly low power density. This would not let you infer that you could release enormous bursts of energy all at once.

As this essay is actually being written in a world where nuclear weapons are a thing, it becomes easy to cherry-pick the example of nuclear weapons. I can think of a number of things for which the catastrophist could have made a similar argument in the 19th century and just been wrong, like expecting everyone to get personal jetpacks, or to be able to routinely travel to Mars.

The debate continues on whether anti-matter bombs are possible or pose additional worrying dynamics.

The only good reference I know on that is the Gsponer Fourth Generation book, covering "subcritical fission-burn, magnetic compression, superheavy elements, antimatter, nuclear isomers, metallic hydrogen and superlasers"; the antimatter section discusses uses for things like H-bomb triggers & subcritical micro-nukes. Is there something more recent or better on anti-matter bombs?

Would the appropriate analogy to agents be that humans are a qualitatively different type of agent compared to animals and basic RL agents, and thus we should expect that there will be a fundamental discontinuity between what we have so far, and conscious agents?