Let’s think about slowing down AI

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Averting doom by not building the doom machine

If you fear that someone will build a machine that will seize control of the world and annihilate humanity, then one kind of response is to try to build further machines that will seize control of the world even earlier without destroying it, forestalling the ruinous machine’s conquest. An alternative or complementary kind of response is to try to avert such machines being built at all, at least while the degree of their apocalyptic tendencies is ambiguous. 

The latter approach seems to me like the kind of basic and obvious thing worthy of at least consideration, and, also in its favor, it fits nicely in the genre ‘stuff that it isn’t that hard to imagine happening in the real world’. Yet my impression is that for people worried about extinction risk from artificial intelligence, strategies under the heading ‘actively slow down AI progress’ have historically been dismissed and ignored (though ‘don’t actively speed up AI progress’ is popular).

The conversation near me over the years has felt a bit like this: 

Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.

Others: wow that sounds extremely ambitious

Some people: yeah but it’s very important and also we are extremely smart so idk it could work

[Work on it for a decade and a half]

Some people: ok that’s pretty hard, we give up

Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI? 

Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional

This seems like an error to me. (And lately, to a bunch of other people.) 

I don’t have a strong view on whether anything in the space of ‘try to slow down some AI research’ should be done. But I think a) the naive first-pass guess should be a strong ‘probably’, and b) a decent amount of thinking should happen before writing off everything in this large space of interventions. Whereas customarily the tentative answer seems to be, ‘of course not’ and then the topic seems to be avoided for further thinking. (At least in my experience—the AI safety community is large, and for most things I say here, different experiences are probably had in different bits of it.)

Maybe my strongest view is that one shouldn’t apply such different standards of ambition to these different classes of intervention. Like: yes, there appear to be substantial difficulties in slowing down AI progress to good effect. But in technical alignment, mountainous challenges are met with enthusiasm for mountainous efforts. And it is very non-obvious that the scale of difficulty here is much larger than that involved in designing acceptably safe versions of machines capable of taking over the world before anyone else in the world designs dangerous versions. 

I’ve been talking about this with people over the past many months, and have accumulated an abundance of reasons for not trying to slow down AI, most of which I’d like to argue about at least a bit. My impression is that arguing in real life has coincided with people moving toward my views.

Quick clarifications

First, to fend off misunderstanding—

  1. I take ‘slowing down dangerous AI’ to include any of: 
    1. reducing the speed at which AI progress is made in general, e.g. as would occur if general funding for AI declined.
    2. shifting AI efforts from work leading more directly to risky outcomes to other work, e.g. as might occur if there was broadscale concern about very large AI models, and people and funding moved to other projects.
    3. halting categories of work until strong confidence in their safety is possible, e.g. as would occur if AI researchers agreed that certain systems posed catastrophic risks and should not be developed until they did not. (This might mean a permanent end to some systems, if they were intrinsically unsafe.)
    (So in particular, I’m including both actions whose direct aim is slowness in general, and actions whose aim is requiring safety before specific developments, which implies slower progress.)
  2. I do think there is serious attention on some versions of these things, generally under other names. I see people thinking about ‘differential progress’ (the second item in the list above), and strategizing about coordination to slow down AI at some point in the future (e.g. at ‘deployment’). And I think a lot of consideration is given to avoiding actively speeding up AI progress. What I’m saying is missing are, a) consideration of actively working to slow down AI now, and b) shooting straightforwardly to ‘slow down AI’, rather than wincing from that and only considering examples of it that show up under another conceptualization (perhaps this is an unfair diagnosis).
  3. AI Safety is a big community, and I’ve only ever been seeing a one-person window into it, so maybe things are different e.g. in DC, or in different conversations in Berkeley. I’m just saying that for my corner of the world, the level of disinterest in this has been notable, and in my view misjudged.

Why not slow down AI? Why not consider it?

Ok, so if we tentatively suppose that this topic is worth even thinking about, what do we think? Is slowing down AI a good idea at all? Are there great reasons for dismissing it?

Scott Alexander wrote a post a little while back raising reasons to dislike the idea, roughly:

  1. Do you want to lose an arms race? If the AI safety community tries to slow things down, it will disproportionately slow down progress in the US, and then people elsewhere will go fast and get to be the ones whose competence determines whether the world is destroyed, and whose values determine the future if there is one. Similarly, if AI safety people criticize those contributing to AI progress, it will mostly discourage the most friendly and careful AI capabilities companies, and the reckless ones will get there first.
  2. One might contemplate ‘coordination’ to avoid such morbid races. But coordinating anything with the whole world seems wildly tricky. For instance, some countries are large, scary, and hard to talk to.
  3. Agitating for slower AI progress is ‘defecting’ against the AI capabilities folks, who are good friends of the AI safety community, and their friendship is strategically valuable for ensuring that safety is taken seriously in AI labs (as well as being non-instrumentally lovely! Hi AI capabilities friends!). 

Other opinions I’ve heard, some of which I’ll address:

  1. Slowing AI progress is futile: for all your efforts you’ll probably just die a few years later
  2. Coordination based on convincing people that AI risk is a problem is absurdly ambitious. It’s practically impossible to convince AI professors of this, let alone any real fraction of humanity, and you’d need to convince a massive number of people.
  3. What are we going to do, build powerful AI never and die when the Earth is eaten by the sun?
  4. It’s actually better for safety if AI progress moves fast. This might be because the faster AI capabilities work happens, the smoother AI progress will be, and this is more important than the duration of the period. Or speeding up progress now might force future progress to be correspondingly slower. Or because safety work is probably better when done just before building the relevantly risky AI, in which case the best strategy might be to get as close to dangerous AI as possible and then stop and do safety work. Or if safety work is very useless ahead of time, maybe delay is fine, but there is little to gain by it. 
  5. Specific routes to slowing down AI are not worth it. For instance, avoiding working on AI capabilities research is bad because it’s so helpful for learning on the path to working on alignment. And AI safety people working in AI capabilities can be a force for making safer choices at those companies.
  6. Advanced AI will help enough with other existential risks as to represent a net lowering of existential risk overall.1
  7. Regulators are ignorant about the nature of advanced AI (partly because it doesn’t exist, so everyone is ignorant about it). Consequently they won’t be able to regulate it effectively, and bring about desired outcomes.

My impression is that there are also less endorsable or less altruistic or more silly motives floating around for this attention allocation. Some things that have come up at least once in talking to people about this, or that seem to be going on:

  • Advanced AI might bring manifold wonders, e.g. long lives of unabated thriving. Getting there a bit later is fine for posterity, but for our own generation it could mean dying as our ancestors did while on the cusp of a utopian eternity. Which would be pretty disappointing. For a person who really believes in this future, it can be tempting to shoot for the best scenario—humanity builds strong, safe AI in time to save this generation—rather than the scenario where our own lives are inevitably lost.
  • Sometimes people who have a heartfelt appreciation for the flourishing that technology has afforded so far can find it painful to be superficially on the side of Luddism here.
  • Figuring out how minds work well enough to create new ones out of math is an incredibly deep and interesting intellectual project, which feels right to take part in. It can be hard to intuitively feel like one shouldn’t do it.

    (Illustration from a co-founder of modern computational reinforcement learning:)

It will be the greatest intellectual achievement of all time.

An achievement of science, of engineering, and of the humanities,
whose significance is beyond humanity,
beyond life,
beyond good and bad.

— Richard Sutton (@RichardSSutton) September 29, 2022
  • It is uncomfortable to contemplate projects that would put you in conflict with other people. Advocating for slower AI feels like trying to impede someone else’s project, which feels adversarial and can feel like it has a higher burden of proof than just working on your own thing.
  • ‘Slow-down-AGI’ sends people’s minds to e.g. industrial sabotage or terrorism, rather than more boring courses, such as, ‘lobby for labs developing shared norms for when to pause deployment of models’. This understandably encourages dropping the thought as soon as possible.
  • My weak guess is that there’s a kind of bias at play in AI risk thinking in general, where any force that isn’t zero is taken to be arbitrarily intense. Like, if there is pressure for agents to exist, there will arbitrarily quickly be arbitrarily agentic things. If there is a feedback loop, it will be arbitrarily strong. Here, if stalling AI can’t be forever, then it’s essentially zero time. If a regulation won’t obstruct every dangerous project, then it is worthless. Any finite economic disincentive for dangerous AI is nothing in the face of the omnipotent economic incentives for AI. I think this is a bad mental habit: things in the real world often come down to actual finite quantities. This is very possibly an unfair diagnosis. (I’m not going to discuss this later; this is pretty much what I have to say.)
  • I sense an assumption that slowing progress on a technology would be a radical and unheard-of move.
  • I agree with lc that there seems to have been a quasi-taboo on the topic, which perhaps explains a lot of the non-discussion, though still calls for its own explanation. I think it suggests that concerns about uncooperativeness play a part, and the same for thinking of slowing down AI as centrally involving antisocial strategies.

I’m not sure if any of this fully resolves why AI safety people haven’t thought about slowing down AI more, or whether people should try to do it. But my sense is that many of the above reasons are at least somewhat wrong, and motives somewhat misguided, so I want to argue about a lot of them in turn, including both arguments and vague motivational themes.

The mundanity of the proposal

Restraint is not radical

There seems to be a common thought that technology is a kind of inevitable path along which the world must tread, and that trying to slow down or avoid any part of it would be both futile and extreme.2 

But empirically, the world doesn’t pursue every technology—it barely pursues any technologies.

Sucky technologies

For a start, there are many machines that there is no pressure to make, because they have no value. Consider a machine that sprays shit in your eyes. We can technologically do that, but probably nobody has ever built that machine. 

This might seem like a stupid example, because no serious ‘technology is inevitable’ conjecture is going to claim that totally pointless technologies are inevitable. But if you are sufficiently pessimistic about AI, I think this is the right comparison: if there are kinds of AI that would cause huge net costs to their creators if created, according to our best understanding, then they are at least as useless to make as the ‘spray shit in your eyes’ machine. We might accidentally make them due to error, but there is not some deep economic force pulling us to make them. If unaligned superintelligence destroys the world with high probability when you ask it to do a thing, then this is the category it is in, and it is not strange for its designs to just rot in the scrap-heap, with the machine that sprays shit in your eyes and the machine that spreads caviar on roads.

Ok, but maybe the relevant actors are very committed to being wrong about whether unaligned superintelligence would be a great thing to deploy. Or maybe you think the situation is less immediately dire and building existentially risky AI really would be good for the people making decisions (e.g. because the costs won’t arrive for a while, and the people care a lot about a shot at scientific success relative to a chunk of the future). If the apparent economic incentives are large, are technologies unavoidable?

Extremely valuable technologies

It doesn’t look like it to me. Here are a few technologies which I’d guess have substantial economic value, where research progress or uptake appears to be drastically slower than it could be, for reasons of concern about safety or ethics3:

  1. Huge amounts of medical research, including really important medical research, e.g. the FDA banned human trials of strep A vaccines from the 70s to the 2000s, in spite of 500,000 global deaths every year. A lot of people also died while covid vaccines went through all the proper trials.
  2. Nuclear energy
  3. Fracking
  4. Various genetics things: genetic modification of foods, gene drives, and recombinant DNA research (early researchers famously organized a moratorium and then ongoing research guidelines, including prohibition of certain experiments; see the Asilomar Conference)
  5. Nuclear, biological, and maybe chemical weapons (or maybe these just aren’t useful)
  6. Various human reproductive innovations: cloning of humans, genetic manipulation of humans (a notable example of an economically valuable technology that is to my knowledge barely pursued across different countries, without explicit coordination between those countries, even though it would make those countries more competitive. Someone used CRISPR on babies in China, but was imprisoned for it.)
  7. Recreational drug development
  8. Geoengineering
  9. Much of science about humans? I recently ran this survey, and was reminded how encumbering ethical rules are for even incredibly innocuous research. As far as I could tell the EU now makes it illegal to collect data in the EU unless you promise to delete the data from anywhere that it might have gotten to if the person who gave you the data wishes for that at some point. In all, dealing with this and IRB-related things added maybe more than half of the effort of the project. Plausibly I misunderstand the rules, but I doubt other researchers are radically better at figuring them out than I am.
  10. There are probably examples from fields considered distasteful or embarrassing to associate with, but it’s hard as an outsider to tell which fields are genuinely hopeless versus erroneously considered so. If there are economically valuable health interventions among those considered wooish, I imagine they would be much slower to be identified and pursued by scientists with good reputations than a similarly promising technology not marred in that way. Scientific research into intelligence is more clearly slowed by stigma, but it is less clear to me what the economically valuable upshot would be.
  11. (I think there are many other things that could be in this list, but I don’t have time to review them at the moment. This page might collect more of them in future.)

It seems to me that intentionally slowing down progress in technologies to give time for even probably-excessive caution is commonplace. (And this is just looking at things slowed down over caution or ethics specifically—probably there are also other reasons things get slowed down.)

Furthermore, among valuable technologies that nobody is especially trying to slow down, it seems common enough for progress to be massively slowed by relatively minor obstacles, which is further evidence for a lack of overpowering strength of the economic forces at play. For instance, Fleming first took notice of mold’s effect on bacteria in 1928, but nobody took a serious, high-effort shot at developing it as a drug until 1939.4 Furthermore, in the thousands of years preceding these events, various people noticed numerous times that mold, other fungi or plants inhibited bacterial growth, but didn’t exploit this observation even enough for it not to be considered a new discovery in the 1920s. Meanwhile, people dying of infection was quite a thing. In 1930 about 300,000 Americans died of bacterial illnesses per year (around 250/100k).

My guess is that people make real choices about technology, and they do so in the face of economic forces that are feebler than commonly thought. 

Restraint is not terrorism, usually

I think people have historically imagined weird things when they think of ‘slowing down AI’. I posit that their central image is sometimes terrorism (which understandably they don’t want to think about for very long), and sometimes some sort of implausibly utopian global agreement.

Here are some other things that ‘slow down AI capabilities’ could look like (where the best positioned person to carry out each one differs, but if you are not that person, you could e.g. talk to someone who is):

  1. Don’t actively forward AI progress, e.g. by devoting your life or millions of dollars to it (this one is often considered already)
  2. Try to convince researchers, funders, hardware manufacturers, institutions etc that they too should stop actively forwarding AI progress
  3. Try to get any of those people to stop actively forwarding AI progress even if they don’t agree with you: through negotiation, payments, public reproof, or other activistic means.
  4. Try to get the message to the world that AI is heading toward being seriously endangering. If AI progress is broadly condemned, this will trickle into myriad decisions: job choices, lab policies, national laws. To do this, for instance produce compelling demos of risk, agitate for stigmatization of risky actions, write science fiction illustrating the problems broadly and evocatively (I think this has actually been helpful repeatedly in the past), go on TV, write opinion pieces, help organize and empower the people who are already concerned, etc.
  5. Help organize the researchers who think their work is potentially omnicidal into coordinated action on not doing it.
  6. Move AI resources from dangerous research to other research. Move investments from projects that lead to large but poorly understood capabilities, to projects that lead to understanding these things e.g. theory before scaling (see differential technological development in general5).
  7. Formulate specific precautions for AI researchers and labs to take in different well-defined future situations, Asilomar Conference style. These could include more intense vetting by particular parties or methods, modifying experiments, or pausing lines of inquiry entirely. Organize labs to coordinate on these.
  8. Reduce available compute for AI, e.g. via regulation of production and trade, seller choices, purchasing compute, trade strategy.
  9. At labs, choose policies that slow down other labs, e.g. reduce public helpful research outputs
  10. Alter the publishing system and incentives to reduce research dissemination. E.g. a journal verifies research results and releases the fact of their publication without any details, maintains records of research priority for later release, and distributes funding for participation. (This is how Szilárd and co. arranged the mitigation of 1940s nuclear research helping Germany, except I’m not sure if the compensatory funding idea was used.6)
  11. The above actions would be taken through choices made by scientists, or funders, or legislators, or labs, or public observers, etc. Communicate with those parties, or help them act.

Coordination is not miraculous world government, usually

The common image of coordination seems to be explicit, centralized, involving every party in the world, and something like cooperating on a prisoners’ dilemma: incentives push every rational party toward defection at all times, yet maybe through deontological virtues or sophisticated decision theories or strong international treaties, everyone manages to not defect for enough teetering moments to find another solution.

That is a possible way coordination could be. (And I think one that shouldn’t be seen as so hopeless—the world has actually coordinated on some impressive things, e.g. nuclear non-proliferation.) But if what you want is for lots of people to coincide in doing one thing when they might have done another, then there are quite a few ways of achieving that. 

Consider some other case studies of coordinated behavior:

  • Not eating sand. The whole world coordinates to barely eat any sand at all. How do they manage it? It is actually in almost no one’s interest to eat sand, so the mere maintenance of sufficient epistemological health to have this widely recognized does the job.
  • Eschewing bestiality: probably some people think bestiality is moral, but enough don’t that engaging in it would risk huge stigma. Thus the world coordinates fairly well on doing very little of it.
  • Not wearing Victorian attire on the streets: this is similar but with no moral blame involved. Historic dress is arguably often more aesthetic than modern dress, but even people who strongly agree find it unthinkable to wear it in general, and assiduously avoid it except for when they have ‘excuses’ such as a special party. This is a very strong coordination against what appears to otherwise be a ubiquitous incentive (to be nicer to look at). As far as I can tell, it’s powered substantially by the fact that it is ‘not done’ and would now be weird to do otherwise. (Which is a very general-purpose mechanism.)
  • Political correctness: public discourse has strong norms about what it is okay to say, which do not appear to derive from a vast majority of people agreeing about this (as with bestiality say). New ideas about what constitutes being politically correct sometimes spread widely. This coordinated behavior seems to be roughly due to decentralized application of social punishment, from both a core of proponents, and from people who fear punishment for not punishing others. Then maybe also from people who are concerned by non-adherence to what now appears to be the norm given the actions of the others. This differs from the above examples, because it seems like it could persist even with a very small set of people agreeing with the object-level reasons for a norm. If failing to advocate for the norm gets you publicly shamed by advocates, then you might tend to advocate for it, making the pressure stronger for everyone else. 

These are all cases of very broadscale coordination of behavior, none of which involve prisoners’ dilemma type situations, or people making explicit agreements which they then have an incentive to break. They do not involve centralized organization of huge multilateral agreements. Coordinated behavior can come from everyone individually wanting to make a certain choice for correlated reasons, or from people wanting to do things that those around them are doing, or from distributed behavioral dynamics such as punishment of violations, or from collaboration in thinking about a topic.

You might think they are weird examples that aren’t very related to AI. I think, a) it’s important to remember the plethora of weird dynamics that actually arise in human group behavior and not get carried away theorizing about AI in a world drained of everything but prisoners’ dilemmas and binding commitments, and b) the above are actually all potentially relevant dynamics here.

If AI in fact poses a large existential risk within our lifetimes, such that it is net bad for any particular individual, then the situation in theory looks a lot like that in the ‘avoiding eating sand’ case. It’s an option that a rational person wouldn’t want to take if they were just alone and not facing any kind of multi-agent situation. If AI is that dangerous, then not taking this inferior option could largely come from a coordination mechanism as simple as distribution of good information. (You still need to deal with irrational people and people with unusual values.)

But even failing coordinated caution from ubiquitous insight into the situation, other models might work. For instance, if there came to be somewhat widespread concern that AI research is bad, that might substantially lessen participation in it, beyond the set of people who are concerned, via mechanisms similar to those described above. Or it might give rise to a wide crop of local regulation, enforcing whatever behavior is deemed acceptable. Such regulation need not be centrally organized across the world to serve the purpose of coordinating the world, as long as it grew up in different places similarly. Which might happen because different locales have similar interests (all rational governments should be similarly concerned about losing power to automated power-seeking systems with unverifiable goals), or because—as with individuals—there are social dynamics which support norms arising in a non-centralized way.

The arms race model and its alternatives

Ok, maybe in principle you might hope to coordinate to not do self-destructive things, but realistically, if the US tries to slow down, won’t China or Facebook or someone less cautious take over the world? 

Let’s be more careful about the game we are playing, game-theoretically speaking.

The arms race

What is an arms race, game theoretically? It’s an iterated prisoners’ dilemma, seems to me. Each round looks something like this:

Player 1 chooses a row, Player 2 chooses a column, and the resulting payoffs are listed in each cell, for {Player 1, Player 2}

In this example, building weapons costs one unit. If anyone ends the round with more weapons than the other player, they take all of the other player’s stuff (ten units).

In a single round of the game it’s always better to build weapons than not (assuming your actions are devoid of implications about your opponent’s actions). And it’s always better to get the hell out of this game.
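The payoffs for one round can be worked out from the two rules just stated. Here is a minimal sketch under my reading of them: building costs 1, and a player who builds while the other passes takes the other’s 10 units. The function name `arms_race_payoffs` and the constants are illustrative and are not taken from the original table.

```python
from itertools import product

BUILD_COST = 1   # stated cost of building weapons
STAKE = 10       # stated value of each player's "stuff"


def arms_race_payoffs(p1_builds: bool, p2_builds: bool) -> tuple[int, int]:
    """Return (Player 1, Player 2) payoffs for one round of the arms race."""
    p1 = -BUILD_COST if p1_builds else 0
    p2 = -BUILD_COST if p2_builds else 0
    if p1_builds and not p2_builds:      # Player 1 ends with more weapons
        p1, p2 = p1 + STAKE, p2 - STAKE
    elif p2_builds and not p1_builds:    # Player 2 ends with more weapons
        p1, p2 = p1 - STAKE, p2 + STAKE
    return p1, p2


# Print the full 2x2 table: rows are Player 1's choice, columns Player 2's.
for a, b in product([True, False], repeat=2):
    label = lambda builds: "build" if builds else "pass"
    print(label(a), label(b), arms_race_payoffs(a, b))

# Building beats passing for Player 1 whatever Player 2 does.
build_dominates = all(
    arms_race_payoffs(True, other)[0] > arms_race_payoffs(False, other)[0]
    for other in (True, False)
)
```

Under these assumptions, mutual building gives each player -1 while mutual passing gives each 0, even though building is individually better in every case, which is the prisoners’ dilemma structure.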

This is not much like what the current AI situation looks like, if you think AI poses a substantial risk of destroying the world.

The suicide race

A closer model: as above except if anyone chooses to build, everything is destroyed (everyone loses all their stuff—ten units of value—as well as one unit if they built).

This is importantly different from the classic ‘arms race’ in that pressing the ‘everyone loses now’ button isn’t an equilibrium strategy.

That is: for anyone who thinks powerful misaligned AI represents near-certain death, the existence of other possible AI builders is not any reason to ‘race’. 

But few people are that pessimistic. How about a milder version where there’s a good chance that the players ‘align the AI’?

The safety-or-suicide race 

Ok, let’s do a game like the last but where if anyone builds, everything is only maybe destroyed (minus ten to all), and in the case of survival, everyone returns to the original arms race fun of redistributing stuff based on who built more than whom (+10 to a builder and -10 to a non-builder if there is one of each). So if you build AI alone, and get lucky on the probabilistic apocalypse, you can still win big.

Let’s take 50% as the chance of doom if any building happens. Then we have a game whose expected payoffs are half way between those in the last two games:

(These are expected payoffs—the minus one unit return to building alone comes from the one unit cost of building, plus half a chance of losing ten in an extinction event and half a chance of taking ten from your opponent in a world takeover event.)

Now you want to do whatever the other player is doing: build if they’ll build, pass if they’ll pass. 

If the odds of destroying the world were very low, this would become the original arms race, and you’d always want to build. If very high, it would become the suicide race, and you’d never want to build. What the probabilities have to be in the real world to get you into something like these different phases is going to be different, because all these parameters are made up (the downside of human extinction is not 10x the research costs of building powerful AI, for instance).
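The three games can be treated as one game indexed by the chance of doom. The sketch below is my own framing, not the author’s spreadsheet: `expected_payoff` and `best_response` are hypothetical names, and the cost of 1 and stake of 10 are the same made-up parameters used above.

```python
BUILD_COST = 1.0
STAKE = 10.0


def expected_payoff(i_build: bool, they_build: bool, p_doom: float) -> float:
    """Expected payoff to 'me' when doom strikes with probability p_doom
    if anyone builds. p_doom=0 is the arms race, p_doom=1 the suicide race."""
    if not (i_build or they_build):
        return 0.0                       # nobody builds: nothing happens
    cost = BUILD_COST if i_build else 0.0
    if i_build and not they_build:
        survive = STAKE                  # I take the other player's stuff
    elif they_build and not i_build:
        survive = -STAKE                 # they take mine
    else:
        survive = 0.0                    # equal weapons: no redistribution
    return -cost + p_doom * (-STAKE) + (1 - p_doom) * survive


def best_response(they_build: bool, p_doom: float) -> str:
    """Which action gives me the higher expected payoff, given their action."""
    build = expected_payoff(True, they_build, p_doom)
    wait = expected_payoff(False, they_build, p_doom)
    return "build" if build > wait else "pass"


for p in (0.0, 0.5, 1.0):
    print(p, "vs build:", best_response(True, p), "| vs pass:", best_response(False, p))
```

With these made-up numbers, building alone is worth 9 - 20p, so it beats passing against a passer only when p < 0.45; building against a builder is worth -1 - 10p versus -10 for passing, so it is better only when p < 0.9. That gives an arms race below 45% doom, a do-what-the-other-does game between 45% and 90%, and never-build above 90%. The thresholds would move with different parameters.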

But my point stands: even in terms of simplish models, it’s very non-obvious that we are in or near an arms race. And therefore, very non-obvious that racing to build advanced AI faster is even promising at a first pass.

In less game-theoretic terms: if you don’t seem anywhere near solving alignment, then racing as hard as you can to be the one who it falls upon to have solved alignment—especially if that means having less time to do so, though I haven’t discussed that here—is probably unstrategic. Having more ideologically pro-safety AI designers win an ‘arms race’ against less concerned teams is futile if you don’t have a way for such people to implement enough safety to actually not die, which seems like a very live possibility. (Robby Bensinger and maybe Andrew Critch somewhere make similar points.)

Conversations with my friends on this kind of topic can go like this:

Me: there’s no real incentive to race if the prize is mutual death

Them: sure, but it isn’t—if there’s a sliver of hope of surviving unaligned AI, and if your side taking control in that case is a bit better in expectation, and if they are going to build powerful AI anyway, then it’s worth racing. The whole future is on the line!

Me: Wouldn’t you still be better off directing your own efforts to safety, since your safety efforts will also help everyone end up with a safe AI? 

Them: It will probably only help them somewhat—you don’t know if the other side will use your safety research. But also, it’s not just that they have less safety research. Their values are probably worse, by your lights. 

Me: If they succeed at alignment, are foreign values really worse than local ones? Probably any humans with vast intelligence at hand have a similar shot at creating a glorious human-ish utopia, no?

Them: No, even if you’re right that being similarly human gets you to similar values in the end, the other parties might be more foolish than our side, and lock-in7 some poorly thought-through version of their values that they want at the moment, or even if all projects would be so foolish, our side might have better poorly thought-through values to lock in, as well as being more likely to use safety ideas at all. Even if racing is very likely to lead to death, and survival is very likely to lead to squandering most of the value, in that sliver of happy worlds so much is at stake in whether it is us or someone else doing the squandering!

Me: Hmm, seems complicated, I’m going to need paper for this.

The complicated race/anti-race

Here is a spreadsheet of models you can make a copy of and play with.

The first model is like this:

  1. Each player divides their effort between safety and capabilities
  2. One player ‘wins’, i.e. builds ‘AGI’ (artificial general intelligence) first. 
  3. P(Alice wins) is a logistic function of Alice’s capabilities investment relative to Bob’s
  4. Each player’s total safety is their own safety investment plus a fraction of the other’s safety investment.
  5. For each player there is some distribution of outcomes if they achieve safety, and a set of outcomes if they do not, which takes into account e.g. their proclivities for enacting stupid near-term lock-ins.
  6. The outcome is a distribution over winners and states of alignment, each of which is a distribution of worlds (e.g. utopia, near-term good lock-in..)
  7. That all gives us a number of utils (Delicious utils!)
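One way to write those steps down, as a rough sketch rather than the spreadsheet’s actual formulas. The symbols here are my own labels for illustration: s_i is a player’s safety share, c_i their capabilities share, k an assumed steepness for the logistic curve, φ the assumed fraction of the other player’s safety work that carries over, and the U-bar terms are each player’s average utility over their outcome distributions with and without safety. Treating total safety as the probability of ending up safe, capping it at 1, and taking ‘relative to Bob’s’ to mean a difference rather than a ratio are all assumptions.

```latex
\begin{align*}
c_i &= 1 - s_i, \qquad s_i \in [0, 1], \quad i \in \{A, B\} \\
P(A \text{ wins}) &= \frac{1}{1 + e^{-k\,(c_A - c_B)}}, \qquad P(B \text{ wins}) = 1 - P(A \text{ wins}) \\
S_A &= \min(1,\; s_A + \phi\, s_B), \qquad S_B = \min(1,\; s_B + \phi\, s_A) \\
\mathbb{E}[U] &= \sum_{w \in \{A, B\}} P(w \text{ wins})
  \left[ S_w\, \bar{U}^{\text{safe}}_w + (1 - S_w)\, \bar{U}^{\text{unsafe}}_w \right]
\end{align*}
```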

The second model is the same except that instead of dividing effort between safety and capabilities, you choose a speed, and the amount of alignment being done by each party is an exogenous parameter. 

These models probably aren’t very good, but so far support a key claim I want to make here: it’s pretty non-obvious whether one should go faster or slower in this kind of scenario—it’s sensitive to a lot of different parameters in plausible ranges. 

Furthermore, I don’t think the results of quantitative analysis match people’s intuitions here.

For example, here’s a situation which I think sounds intuitively like a you-should-race world, but where in the first model above, you should actually go as slowly as possible (this should be the one plugged into the spreadsheet now):

  • AI is pretty safe: unaligned AGI has a mere 7% chance of causing doom, plus a further 7% chance of causing short term lock-in of something mediocre
  • Your opponent risks bad lock-in: If there’s a ‘lock-in’ of something mediocre, your opponent has a 5% chance of locking in something actively terrible, whereas you’ll always pick the good mediocre lock-in world (and mediocre lock-ins are either 5% as good as utopia or -5% as good)
  • Your opponent risks messing up utopia: In the event of aligned AGI, you will reliably achieve the best outcome, whereas your opponent has a 5% chance of ending up in a ‘mediocre bad’ scenario then too.
  • Safety investment obliterates your chance of getting to AGI first: moving from no safety at all to full safety means you go from a 50% chance of being first to a 0% chance
  • Your opponent is racing: Your opponent is investing everything in capabilities and nothing in safety
  • Safety work helps others at a steep discount: your safety work contributes 50% to the other player’s safety 

Your best bet here (on this model) is still to maximize safety investment. Why? Because by aggressively pursuing safety, you can get the other side halfway to full safety, which is worth a lot more than the lost chance of winning. Especially since if you ‘win’, you do so without much safety, and your victory without safety is worse than your opponent’s victory with safety, even if that too is far from perfect.
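To make that reasoning concrete, here is a minimal Python sketch of a model with the same shape, using the numbers from the list above. It is not the spreadsheet, and several things are assumptions of mine: utilities are on a scale where utopia is 1 and doom is 0; total safety is treated as the probability of getting aligned AGI; in the 86% of unaligned cases with neither doom nor lock-in, the winner ends up where they would have with aligned AGI; the logistic steepness is 10 and takes the difference in capabilities, so that full safety against a racing opponent leaves you a roughly 0% chance of being first. All function and constant names are hypothetical.

```python
import math

# Utilities on an assumed scale: utopia = 1, doom = 0.
UTOPIA, DOOM = 1.0, 0.0
MEDIOCRE_GOOD, MEDIOCRE_BAD = 0.05, -0.05

P_DOOM_UNALIGNED = 0.07    # chance unaligned AGI causes doom
P_LOCKIN_UNALIGNED = 0.07  # further chance of a mediocre near-term lock-in
OPPONENT_BAD_RATE = 0.05   # opponent's chance of the 'bad' variant of an outcome
SPILLOVER = 0.5            # share of one side's safety work that helps the other
STEEPNESS = 10.0           # assumed logistic steepness, not from the post


def win_probability(my_capabilities, their_capabilities):
    """Logistic in the capabilities gap; 0.5 when both invest equally."""
    gap = my_capabilities - their_capabilities
    return 1.0 / (1.0 + math.exp(-STEEPNESS * gap))


def value_if_wins(total_safety, bad_rate):
    """Expected utility given that a player with this bad_rate gets AGI first."""
    good_end = (1 - bad_rate) * UTOPIA + bad_rate * MEDIOCRE_BAD
    lock_in = (1 - bad_rate) * MEDIOCRE_GOOD + bad_rate * MEDIOCRE_BAD
    p_fine = 1 - P_DOOM_UNALIGNED - P_LOCKIN_UNALIGNED
    unaligned = (P_DOOM_UNALIGNED * DOOM
                 + P_LOCKIN_UNALIGNED * lock_in
                 + p_fine * good_end)  # assumption: same end state as aligned
    # Assumption: total safety is the probability that the AGI is aligned.
    return total_safety * good_end + (1 - total_safety) * unaligned


def expected_utility(my_safety, opponent_safety=0.0):
    """Your expected utility; by default the opponent is racing flat out."""
    my_total = min(1.0, my_safety + SPILLOVER * opponent_safety)
    their_total = min(1.0, opponent_safety + SPILLOVER * my_safety)
    p_win = win_probability(1 - my_safety, 1 - opponent_safety)
    return (p_win * value_if_wins(my_total, 0.0)
            + (1 - p_win) * value_if_wins(their_total, OPPONENT_BAD_RATE))


grid = [i / 10 for i in range(11)]
best = max(grid, key=expected_utility)
for s in grid:
    print(f"safety share {s:.1f}: expected utility {expected_utility(s):.4f}")
print("best safety share on this grid:", best)
```

Under these assumptions, putting everything into safety beats racing, for the reason given above: you give up a coin flip at a poorly secured win in exchange for a half-safe opponent. A small safety investment does slightly worse than pure racing in this sketch, and different guesses for the assumed pieces can move the answer, which is the sensitivity I am pointing at.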

So if you are in a situation in this space, and the other party is racing, it’s not obvious if it is even in your narrow interests within the game to go faster at the expense of safety, though it may be.

These models are flawed in many ways, but I think they are better than the intuitive models that support arms-racing. My guess is that the next, better models would remain nuanced.

Other equilibria and other games

Even if it would be in your interests to race if the other person were racing, ‘(do nothing, do nothing)’ is often an equilibrium too in these games. At least for various settings of the parameters. It doesn’t necessarily make sense to do nothing in the hope of getting to that equilibrium if you know your opponent to be mistaken about that and racing anyway, but in conjunction with communicating with your ‘opponent’, it seems like a theoretically good strategy.
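A toy illustration of how both can be equilibria at once. The payoff numbers below are hypothetical and not taken from the spreadsheet; they are chosen only so that racing is the best response to racing while doing nothing is the best response to doing nothing. The helper name `is_pure_nash` is likewise my own.

```python
from itertools import product

ACTIONS = ("do nothing", "race")

# PAYOFFS[(a, b)] = (payoff to player 1, payoff to player 2).
# Hypothetical numbers, chosen for illustration only.
PAYOFFS = {
    ("do nothing", "do nothing"): (3, 3),
    ("do nothing", "race"): (0, 2),
    ("race", "do nothing"): (2, 0),
    ("race", "race"): (1, 1),
}


def is_pure_nash(a, b):
    """True if neither player gains by unilaterally switching action."""
    u1, u2 = PAYOFFS[(a, b)]
    player1_ok = all(PAYOFFS[(alt, b)][0] <= u1 for alt in ACTIONS)
    player2_ok = all(PAYOFFS[(a, alt)][1] <= u2 for alt in ACTIONS)
    return player1_ok and player2_ok


equilibria = [pair for pair in product(ACTIONS, ACTIONS) if is_pure_nash(*pair)]
print(equilibria)
```

With these payoffs the check finds two pure-strategy equilibria, (do nothing, do nothing) and (race, race), so which one you land in depends on what each side expects of the other. That is where communication comes in.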

This has all been assuming the structure of the game. I think the traditional response to an arms race situation is to remember that you are in a more elaborate world with all kinds of unmodeled affordances, and try to get out of the arms race. 

Being friends with risk-takers

Caution is cooperative

Another big concern is that pushing for slower AI progress is ‘defecting’ against AI researchers who are friends of the AI safety community. 

For instance Steven Byrnes:

“I think that trying to slow down research towards AGI through regulation would fail, because everyone (politicians, voters, lobbyists, business, etc.) likes scientific research and technological development, it creates jobs, it cures diseases, etc. etc., and you’re saying we should have less of that. So I think the effort would fail, and also be massively counterproductive by making the community of AI researchers see the community of AGI safety / alignment people as their enemies, morons, weirdos, Luddites, whatever.”

(Also a good example of the view criticized earlier, that regulation of things that create jobs and cure diseases just doesn’t happen.)

Or Eliezer Yudkowsky, on worry that spreading fear about AI would alienate top AI labs:

This is the primary reason I didn't, and told others not to, earlier connect the point about human extinction from AGI with AI labs. Kerry has correctly characterized the position he is arguing against, IMO. I myself estimate the public will be toothless vs AGI lab heads.

— Eliezer Yudkowsky (@ESYudkowsky) August 4, 2022

I don’t think this is a natural or reasonable way to see things, because:

  1. The researchers themselves probably don’t want to destroy the world. Many of them also actually agree that AI is a serious existential risk. So in two natural ways, pushing for caution is cooperative with many if not most AI researchers.
  2. AI researchers do not have a moral right to endanger the world, that someone would be stepping on by requiring that they move more cautiously. Like, why does ‘cooperation’ look like the safety people bowing to what the more reckless capabilities people want, to the point of fearing to represent their actual interests, while the capabilities people uphold their side of the ‘cooperation’ by going ahead and building dangerous AI? This situation might make sense as a natural consequence of different people’s power in the situation. But then don’t call it a ‘cooperation’, from which safety-oriented parties would be dishonorably ‘defecting’ were they to consider exercising any power they did have. 

It could be that people in control of AI capabilities would respond negatively to AI safety people pushing for slower progress. But that should be called ‘we might get punished’ not ‘we shouldn’t defect’. ‘Defection’ has moral connotations that are not due. Calling one side pushing for their preferred outcome ‘defection’ unfairly disempowers them by wrongly setting commonsense morality against them.

At least if it is the safety side. If any of the available actions are ‘defection’ that the world in general should condemn, I claim that it is probably ‘building machines that will plausibly destroy the world, or standing by while it happens’. 

(This would be more complicated if the people involved were confident that they wouldn’t destroy the world and I merely disagreed with them. But about half of surveyed researchers are actually more pessimistic than me. And in a situation where the median AI researcher thinks the field has a 5-10% chance of causing human extinction, how confident can any responsible person be in their own judgment that it is safe?)  

On top of all that, I worry that highlighting the narrative that wanting more cautious progress is defection is further destructive, because it makes it more likely that AI capabilities people see AI safety people as thinking of themselves as betraying AI researchers, if anyone engages in any such efforts. Which makes the efforts more aggressive. Like, if every time you see friends, you refer to it as ‘cheating on my partner’, your partner may reasonably feel hurt by your continual desire to see friends, even though the activity itself is innocuous.

‘We’ are not the US, ‘we’ are not the AI safety community

“If ‘we’ try to slow down AI, then the other side might win.” “If ‘we’ ask for regulation, then it might harm ‘our’ relationships with AI capabilities companies.” Who are these ‘we’s? Why are people strategizing for those groups in particular? 

Even if slowing AI were uncooperative, and it were important for the AI Safety community to cooperate with the AI capabilities community, couldn’t one of the many people not in the AI Safety community work on it? 

I have a longstanding irritation with thoughtless talk about what ‘we’ should do, without regard for what collective one is speaking for. So I may be too sensitive about it here. But I think confusions arising from this have genuine consequences.

I think when people say ‘we’ here, they generally imagine that they are strategizing on behalf of a) the AI safety community, b) the USA, c) themselves or d) themselves and their readers. But those are a small subset of people, and not even obviously the ones the speaker can most influence (does the fact that you are sitting in the US really make the US more likely to listen to your advice than e.g. Estonia? Yeah probably on average, but not infinitely much.) If these naturally identified-with groups don’t have good options, that hardly means there are no options to be had, or to be communicated to other parties. Could the speaker speak to a different ‘we’? Maybe someone in the ‘we’ the speaker has in mind knows someone not in that group? If there is a strategy for anyone in the world, and you can talk, then there is probably a strategy for you.

The starkest appearance of error along these lines to me is in writing off the slowing of AI as inherently destructive of relations between the AI safety community and other AI researchers. If we grant that such activity would be seen as a betrayal (which seems unreasonable to me, but maybe), surely it could only be a betrayal if carried out by the AI safety community. There are quite a lot of people who aren’t in the AI safety community and have a stake in this, so maybe some of them could do something. It seems like a huge oversight to give up on all slowing of AI progress because you are only considering affordances available to the AI Safety Community. 

Another example: if the world were in the basic arms race situation sometimes imagined, and the United States would be willing to make laws to mitigate AI risk, but could not because China would barge ahead, then that means China is in a great place to mitigate AI risk. Unlike the US, China could propose mutual slowing down, and the US would go along. Maybe it’s not impossible to communicate this to relevant people in China. 

An oddity of this kind of discussion which feels related is the persistent assumption that one’s ability to act is restricted to the United States. Maybe I fail to understand the extent to which Asia is an alien and distant land where agency doesn’t apply, but for instance I just wrote to like a thousand machine learning researchers there, and maybe a hundred wrote back, and it was a lot like interacting with people in the US.

I’m pretty ignorant about what interventions will work in any particular country, including the US, but I just think it’s weird to come to the table assuming that you can essentially only affect things in one country. Especially if the situation is that you believe you have unique knowledge about what is in the interests of people in other countries. Like, fair enough I would be deal-breaker-level pessimistic if you wanted to get an Asian government to elect you leader or something. But if you think advanced AI is highly likely to destroy the world, including other countries, then the situation is totally different. If you are right, then everyone’s incentives are basically aligned. 

I more weakly suspect some related mental shortcut is misshaping the discussion of arms races in general. The thought that something is a ‘race’ seems much stickier than alternatives, even if the true incentives don’t really make it a race. Like, against the laws of game theory, people sort of expect the enemy to try to believe falsehoods, because it will better contribute to their racing. And this feels like realism. The uncertain details of billions of people one barely knows about, with all manner of interests and relationships, just really want to form themselves into an ‘us’ and a ‘them’ in zero-sum battle. This is a mental shortcut that could really kill us.

My impression is that in practice, for many of the technologies slowed down for risk or ethics, mentioned in section ‘Extremely valuable technologies’ above, countries with fairly disparate cultures have converged on similar approaches to caution. I take this as evidence that none of ethical thought, social influence, political power, or rationality are actually very siloed by country, and in general the ‘countries in contest’ model of everything isn’t very good.

Notes on tractability

Convincing people doesn’t seem that hard

When I say that ‘coordination’ can just look like popular opinion punishing an activity, or that other countries don’t have much real incentive to build machines that will kill them, I think a common objection is that convincing people of the real situation is hopeless. The picture seems to be that the argument for AI risk is extremely sophisticated and only able to be appreciated by the most elite of intellectual elites—e.g. it’s hard enough to convince professors on Twitter, so surely the masses are beyond its reach, and foreign governments too. 

This doesn’t match my overall experience on various fronts.

Some observations:

  • The median surveyed ML researcher seems to think AI will destroy humanity with 5-10% chance, as I mentioned
  • Often people are already intellectually convinced but haven’t integrated that into their behavior, and it isn’t hard to help them organize to act on their tentative beliefs
  • As noted by Scott, a lot of AI safety people have gone into AI capabilities including running AI capabilities orgs, so those people presumably consider AI to be risky already
  • I don’t remember ever having any trouble discussing AI risk with random strangers. Sometimes they are also fairly worried (e.g. a makeup artist at Sephora gave an extended rant about the dangers of advanced AI, and my driver in Santiago excitedly concurred and showed me Homo Deus open on his front seat). The form of the concerns is probably a bit different from those of the AI Safety community, but I think broadly closer to ‘AI agents are going to kill us all’ than ‘algorithmic bias will be bad’. I can’t remember how many times I have tried this, but pre-pandemic I used to talk to Uber drivers a lot, due to having no idea how to avoid it. I explained AI risk to my therapist recently, as an aside regarding his sense that I might be catastrophizing, and I feel like it went okay, though we may need to discuss again. 
  • My impression is that most people haven’t even come into contact with the arguments that might bring one to agree precisely with the AI safety community. For instance, my guess is that a lot of people assume that someone actually programmed modern AI systems, and if you told them that in fact they are random connections jiggled in a gainful direction unfathomably many times, just as mysterious to their makers, they might also fear misalignment. 
  • Nick Bostrom, Eliezer Yudkowsky, and other early thinkers have had decent success at convincing a bunch of other people to worry about this problem, e.g. me. And to my knowledge, without writing any compelling and accessible account of why one should do so that would take less than two hours to read.
  • I arrogantly think I could write a broadly compelling and accessible case for AI risk

My weak guess is that immovable AI risk skeptics are concentrated in intellectual circles near the AI risk people, especially on Twitter, and that people with less of a horse in the intellectual status race are more readily like, ‘oh yeah, superintelligent robots are probably bad’. It’s not clear that most people even need convincing that there is a problem, though they don’t seem to consider it the most pressing problem in the world. (Though all of this may be different in cultures I am more distant from, e.g. in China.) I’m pretty non-confident about this, but skimming survey evidence suggests there is substantial though not overwhelming public concern about AI in the US8.

Do you need to convince everyone?

I could be wrong, but I’d guess convincing the ten most relevant leaders of AI labs that this is a massive deal, worth prioritizing, actually gets you a decent slow-down. I don’t have much evidence for this.

Buying time is big

You probably aren’t going to avoid AGI forever, and maybe huge efforts will buy you a couple of years.9 Could that even be worth it? 

Seems pretty plausible:

  1. Whatever kind of other AI safety research or policy work people were doing could be happening at a non-negligible rate per year. (Along with all other efforts to make the situation better—if you buy a year, that’s eight billion extra person years of time, so only a tiny bit has to be spent usefully for this to be big. If a lot of people are worried, that doesn’t seem crazy.)
  2. Geopolitics just changes pretty often. If you seriously think a big determiner of how badly things go is inability to coordinate with certain groups, then every year gets you non-negligible opportunities for the situation changing in a favorable way. 
  3. Public opinion can change a lot quickly. If you can only buy one year, you might still be buying a decent shot of people coming around and granting you more years. Perhaps especially if new evidence is actively avalanching in—people changed their minds a lot in February 2020.
  4. Other stuff happens over time. If you can take your doom today or after a couple of years of random events happening, the latter seems non-negligibly better in general.

It is also not obvious to me that these are the time-scales on the table. My sense is that things which are slowed down by regulation or general societal distaste are often slowed down much more than a year or two, and Eliezer’s stories presume that the world is full of collectives either trying to destroy the world or badly mistaken about it, which is not a foregone conclusion.

Delay is probably finite by default 

While some people worry that any delay would be so short as to be negligible, others seem to fear that if AI research were halted, it would never start again and we would fail to go to space or something. This sounds so wild to me that I think I’m missing too much of the reasoning to usefully counterargue.

Obstruction doesn’t need discernment

Another purported risk of trying to slow things down is that it might involve getting regulators involved, and they might be fairly ignorant about the details of futuristic AI, and so tenaciously make the wrong regulations. Relatedly, if you call on the public to worry about this, they might have inexacting worries that call for impotent solutions and distract from the real disaster.

I don’t buy it. If all you want is to slow down a broad area of activity, my guess is that ignorant regulations do just fine at that every day (usually unintentionally). In particular, my impression is that if you mess up regulating things, a usual outcome is that many things are randomly slower than hoped. If you wanted to speed a specific thing up, that’s a very different story, and might require understanding the thing in question.

The same goes for social opposition. Nobody need understand the details of how genetic engineering works for its ascendancy to be seriously impaired by people not liking it. Maybe by their lights it still isn’t optimally undermined yet, but just not liking anything in the vicinity does go a long way.

This has nothing to do with regulation or social shaming specifically. You need to understand much less about a car or a country or a conversation to mess it up than to make it run well. It is a consequence of the general rule that there are many more ways for a thing to be dysfunctional than functional: destruction is easier than creation.

Back at the object level, I tentatively expect efforts to broadly slow down things in the vicinity of AI progress to slow down AI progress on net, even if poorly aimed.

Safety from speed, clout from complicity

Maybe it’s actually better for safety to have AI go fast at present, for various reasons. Notably:

  1. Implementing what can be implemented as soon as possible probably means smoother progress, which is probably safer because a) it makes it harder for one party to shoot ahead of everyone and gain power, and b) people make better choices all around if they are correct about what is going on (e.g. they don’t put trust in systems that turn out to be much more powerful than expected).
  2. If the main thing achieved by slowing down AI progress is more time for safety research, and safety research is more effective when carried out in the context of more advanced AI, and there is a certain amount of slowing down that can be done (e.g. because one is in fact in an arms race but has some lead over competitors), then it might be better to use one’s slowing budget later.
  3. If there is some underlying curve of potential for progress (e.g. if money that might be spent on hardware just grows a certain amount each year), then perhaps pushing ahead now will naturally require progress to be slower later, so it won’t affect the overall time to powerful AI, but will mean we spend more time in the informative pre-catastrophic-AI era.
  4. (More things go here I think)

And maybe it’s worth it to work on capabilities research at present, for instance because:

  1. As a researcher, working on capabilities prepares you to work on safety
  2. You think the room where AI happens will afford good options for a person who cares about safety

These all seem plausible. But also plausibly wrong. I don’t know of a decisive analysis of any of these considerations, and am not going to do one here. My impression is that they could basically all go either way.

I am actually particularly skeptical of the final argument, because if you believe what I take to be the normal argument for AI risk—that superhuman artificial agents won’t have acceptable values, and will aggressively manifest whatever values they do have, to the sooner or later annihilation of humanity—then the sentiments of the people turning on such machines seem like a very small factor, so long as they still turn the machines on. And I suspect that ‘having a person with my values doing X’ is commonly overrated. But the world is messier than these models, and I’d still pay a lot to be in the room to try.

Moods and philosophies, heuristics and attitudes 

It’s not clear what role these psychological characters should play in a rational assessment of how to act, but I think they do play a role, so I want to argue about them.

Technological choice is not luddism

Some technologies are better than others [citation not needed]. The best pro-technology visions should disproportionately involve awesome technologies and avoid shitty technologies, I claim. If you think AGI is highly likely to destroy the world, then it is the pinnacle of shittiness as a technology. Being opposed to having it in your techno-utopia is about as luddite as refusing to have radioactive toothpaste there. Colloquially, Luddites are against progress if it comes as technology.10 Even if that’s a terrible position, its wise reversal is not the endorsement of all ‘technology’, regardless of whether it comes as progress.

Non-AGI visions of near-term thriving

Perhaps slowing down AI progress means forgoing our own generation’s hope for life-changing technologies. Some people thus find it psychologically difficult to aim for less AI progress (with its real personal costs), rather than shooting for the perhaps unlikely ‘safe AGI soon’ scenario.

I’m not sure that this is a real dilemma. The narrow AI progress we have seen already—i.e. further applications of current techniques at current scales—seems plausibly able to help a lot with longevity and other medicine for instance. And to the extent AI efforts could be focused on e.g. medically relevant narrow systems over creating agentic scheming gods, it doesn’t sound crazy to imagine making more progress on anti-aging etc as a result (even before taking into account the probability that the agentic scheming god does not prioritize your physical wellbeing as hoped). Others disagree with me here.

Robust priors vs. specific galaxy-brained models

There are things that are robustly good in the world, and things that are good on highly specific inside-view models and terrible if those models are wrong. Slowing dangerous tech development seems like the former, whereas forwarding arms races for dangerous tech between world superpowers seems more like the latter.11 There is a general question of how much to trust your reasoning and risk the galaxy-brained plan.12 But whatever your take on that, I think we should all agree that the less thought you have put into it, the more you should regress to the robustly good actions. Like, if it just occurred to you to take out a large loan to buy a fancy car, you probably shouldn’t do it because most of the time it’s a poor choice. Whereas if you have been thinking about it for a month, you might be sure enough that you are in the rare situation where it will pay off. 

On this particular topic, it feels like people are going with the specific galaxy-brained inside-view terrible-if-wrong model off the bat, then not thinking about it more. 

Cheems mindset/can’t do attitude

Suppose you have a friend, and you say ‘let’s go to the beach’ to them. Sometimes the friend is like ‘hell yes’ and then even if you don’t have towels or a mode of transport or time or a beach, you make it happen. Other times, even if you have all of those things, and your friend nominally wants to go to the beach, they will note that they have a package coming later, and that it might be windy, and their jacket needs washing. And when you solve those problems, they will note that it’s not that long until dinner time. You might infer that in the latter case your friend just doesn’t want to go to the beach. And sometimes that is the main thing going on! But I think there are also broader differences in attitudes: sometimes people are looking for ways to make things happen, and sometimes they are looking for reasons that they can’t happen. This is sometimes called a ‘cheems attitude’, or I like to call it (more accessibly) a ‘can’t do attitude’.

My experience in talking about slowing down AI with people is that they seem to have a can’t do attitude. They don’t want it to be a reasonable course: they want to write it off. 

Which both seems suboptimal, and is strange in contrast with historical attitudes to more technical problem-solving. (As highlighted in my dialogue from the start of the post.)

It seems to me that if the same degree of can’t-do attitude were applied to technical safety, there would be no AI safety community because in 2005 Eliezer would have noticed any obstacles to alignment and given up and gone home.

To quote a friend on this, what would it look like if we *actually tried*?


This has been a miscellany of critiques against a pile of reasons I’ve met for not thinking about slowing down AI progress. I don’t think we’ve seen much reason here to be very pessimistic about slowing down AI, let alone reason for not even thinking about it.

I could go either way on whether any interventions to slow down AI in the near term are a good idea. My tentative guess is yes, but my main point here is just that we should think about it.

A lot of opinions on this subject seem to me to be poorly thought through, in error, and to have wrongly repelled the further thought that might rectify them. I hope to have helped a bit here by examining some such considerations enough to demonstrate that there are no good grounds for immediate dismissal. There are difficulties and questions, but if the same standards for ambition were applied here as elsewhere, I think we would see answers and action.


Thanks to Adam Scholl, Matthijs Maas, Joe Carlsmith, Ben Weinstein-Raun, Ronny Fernandez, Aysja Johnson, Jaan Tallinn, Rick Korzekwa, Owain Evans, Andrew Critch, Michael Vassar, Jessica Taylor, Rohin Shah, Jeffrey Heninger, Zach Stein-Perlman, Anthony Aguirre, Matthew Barnett, David Krueger, Harlan Stewart, Rafe Kennedy, Nick Beckstead, Leopold Aschenbrenner, Michaël Trazzi, Oliver Habryka, Shahar Avin, Luke Muehlhauser, Michael Nielsen, Nathan Young and quite a few others for discussion and/or encouragement.


1 I haven’t heard this in recent times, so maybe views have changed. An example of earlier times: Nick Beckstead, 2015: “One idea we sometimes hear is that it would be harmful to speed up the development of artificial intelligence because not enough work has been done to ensure that when very advanced artificial intelligence is created, it will be safe. This problem, it is argued, would be even worse if progress in the field accelerated. However, very advanced artificial intelligence could be a useful tool for overcoming other potential global catastrophic risks. If it comes sooner—and the world manages to avoid the risks that it poses directly—the world will spend less time at risk from these other factors….

I found that speeding up advanced artificial intelligence—according to my simple interpretation of these survey results—could easily result in reduced net exposure to the most extreme global catastrophic risks…”

2 This is closely related to Bostrom’s Technological completion conjecture: “If scientific and technological development efforts do not effectively cease, then all important basic capabilities that could be obtained through some possible technology will be obtained.” (Bostrom, Superintelligence, p. 228, Chapter 14, 2014)

Bostrom illustrates this kind of position (though apparently rejects it; from Superintelligence, found here): “Suppose that a policymaker proposes to cut funding for a certain research field, out of concern for the risks or long-term consequences of some hypothetical technology that might eventually grow from its soil. She can then expect a howl of opposition from the research community. Scientists and their public advocates often say that it is futile to try to control the evolution of technology by blocking research. If some technology is feasible (the argument goes) it will be developed regardless of any particular policymaker’s scruples about speculative future risks. Indeed, the more powerful the capabilities that a line of development promises to produce, the surer we can be that somebody, somewhere, will be motivated to pursue it. Funding cuts will not stop progress or forestall its concomitant dangers.”

This kind of thing is also discussed by Dafoe and by Sundaram, Maas & Beard.

3 (Some inspiration from Matthijs Maas’ spreadsheet, from Paths Untaken, and from GPT-3.)

4 From a private conversation with Rick Korzekwa, who may have read https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1139110/ and an internal draft at AI Impacts, probably forthcoming.

5 More here and here. I haven’t read any of these, but it’s been a topic of discussion for a while.

6 “To aid in promoting secrecy, schemes to improve incentives were devised. One method sometimes used was for authors to send papers to journals to establish their claim to the finding but ask that publication of the papers be delayed indefinitely.26,27,28,29 Szilárd also suggested offering funding in place of credit in the short term for scientists willing to submit to secrecy and organizing limited circulation of key papers.30” – Me, previously

7 ‘Lock-in’ of values is the act of using powerful technology such as AI to ensure that specific values will stably control the future.

8 And also in Britain:

‘This paper discusses the results of a nationally representative survey of the UK population on their perceptions of AI…the most common visions of the impact of AI elicit significant anxiety. Only two of the eight narratives elicited more excitement than concern (AI making life easier, and extending life). Respondents felt they had no control over AI’s development, citing the power of corporations or government, or versions of technological determinism. Negotiating the deployment of AI will require contending with these anxieties.’

9 Or so worries Eliezer Yudkowsky—
In MIRI announces new “Death With Dignity” strategy:

  • “… this isn’t primarily a social-political problem, of just getting people to listen.  Even if DeepMind listened, and Anthropic knew, and they both backed off from destroying the world, that would just mean Facebook AI Research destroyed the world a year(?) later.”

In AGI Ruin: A List of Lethalities:

  • “We can’t just “decide not to build AGI” because GPUs are everywhere, and knowledge of algorithms is constantly being improved and published; 2 years after the leading actor has the capability to destroy the world, 5 other actors will have the capability to destroy the world.  The given lethal challenge is to solve within a time limit, driven by the dynamic in which, over time, increasingly weak actors with a smaller and smaller fraction of total computing power, become able to build AGI and destroy the world.  Powerful actors all refraining in unison from doing the suicidal thing just delays this time limit – it does not lift it, unless computer hardware and computer software progress are both brought to complete severe halts across the whole Earth.  The current state of this cooperation to have every big actor refrain from doing the stupid thing, is that at present some large actors with a lot of researchers and computing power are led by people who vocally disdain all talk of AGI safety (eg Facebook AI Research).  Note that needing to solve AGI alignment only within a time limit, but with unlimited safe retries for rapid experimentation on the full-powered system; or only on the first critical try, but with an unlimited time bound; would both be terrifically humanity-threatening challenges by historical standards individually.”

10 I’d guess real Luddites also thought the technological changes they faced were anti-progress, but in that case were they wrong to want to avoid them?

11 I hear this is an elaboration on this theme, but I haven’t read it.

12 Leopold Aschenbrenner partly defines ‘Burkean Longtermism’ thus: “We should be skeptical of any radical inside-view schemes to positively steer the long-run future, given the froth of uncertainty about the consequences of our actions.”


If the balance of opinion of scientists and policymakers (or those who had briefly heard arguments) was that AI catastrophic risk is high, and that this should be a huge social priority, then you could do a lot of things. For example, you could get budgets of tens of billions of dollars for interpretability research, the way governments already provide tens of billions of dollars of subsidies to strengthen their chip industries. Top AI people would be applying to do safety research in huge numbers. People like Bill Gates and Elon Musk who nominally take AI risk seriously would be doing stuff about it, and Musk could have gotten more traction when he tried to make his case to government.

My perception based on many areas of experience is that policymakers and your AI expert survey respondents on the whole think that these risks are too speculative and not compelling enough to outweigh the gains from advancing AI rapidly (your survey respondents state those are much more likely than the harms). In particular, there is much more enthusiasm for the positive gains from AI than your payoff matrix suggests (particularly among AI researchers), and more mutual fear (e.g. the CCP does not wan... (read more)

I think this comment is overstating the case for policymakers and the electorate actually believing that investing in AI is good for the world. I think the answer currently is "we don't know what policymakers and the electorate actually want in relation to AI" as well as "the relationship of policymakers and the electorate is in the middle of shifting quite rapidly, so past actions are not that predictive of future actions".

I really only have anecdata to go on (though I don't think anyone has much better), but my sense from doing informal polls of e.g. Uber drivers, people on Twitter, and perusing a bunch of Subreddits (which, to be clear, is a terrible sample) is that indeed a pretty substantial fraction of the world is now quite afraid of the consequences of AI, both in a "this change is happening far too quickly and we would like it to slow down" sense, and in a "yeah, I am actually worried about killer robots killing everyone" sense. I think both of these positions are quite compatible with pushing for a broad slow down. There is also a very broad and growing "anti-tech" movement that is more broadly interested in giving less resources to the tech sector, whose aims are at leas... (read more)

I ended up doing some quick google searches for AI opinion polls

I collected such polls here, if you want to see more. Most people say they want to regulate AI.

I agree there is some weak public sentiment in this direction (with the fear of AI takeover being weaker). Privacy protections and redistribution don't particularly favor measures to avoid AI apocalypse.

 I'd also mention this YouGov survey:

But the sentiment looks weak compared to e.g. climate change and nuclear war,  where fossil fuel production and nuclear arsenals continue, although there are significant policy actions taken in hopes of avoiding those problems. The sticking point is policymakers and the scientific community. At the end of the Obama administration the President asked scientific advisors what to make of Bostrom's Superintelligence, and concluded not to pay attention to it because it was not an immediate threat. If policymakers and their advisors and academia and the media think such public concerns are confused, wrongheaded, and not politically powerful they won't work to satisfy them against more pressing concerns like economic growth and national security. This is a lot worse than the situation for climate change, which is why it seems better regulation requires that the expert and elite debate play out differently, or the hope that later circumstances ... (read more)

But the sentiment looks weak compared to e.g. climate change and nuclear war,  where fossil fuel production and nuclear arsenals continue,

That seems correct to me, but on the other hand, I think the public sentiment against things like GMOs was also weaker than the one that we currently have against climate change, and GMOs got slowed down regardless. Also I'm not sure how strong the sentiment against nuclear power was relative to the one against climate change, but in any case, nuclear power got hindered quite a bit too.

I think one important aspect where fossil fuels are different from GMOs and nuclear power is that fossil fuel usage is firmly entrenched across the economy and it's difficult, costly, and slow to replace it. Whereas GMOs were a novel thing and governments could just decide to regulate them and slow them down without incurring major immediate costs. As for nuclear power, it was somewhat entrenched in that there were many existing plants, but society could make the choice to drastically reduce the progress of building new ones - which it did.

Nuclear arsenals don't quite fit this model - in principle, one could have stopped expanding them, but they did keep growi... (read more)

I'll shill here and say that Rethink Priorities is pretty good at running polls of the electorate if anyone wants to know what a representative sample of Americans think about a particular issue such as this one. No need to poll Uber drivers or Twitter when you can do the real thing!

I'd very much like to see this done with standard high-quality polling techniques, e.g. while airing counterarguments (like support for expensive programs that looks like a majority but collapses if the higher taxes to pay for them are mentioned). In particular, I'd like to see how the public would react given different views coming from computer scientists/government commissions/panels.

I think that could be valuable. It might be worth testing quite carefully for robustness - to ask multiple different questions probing the same issue, and see whether responses converge. My sense is that people's stated opinions about risks from artificial intelligence, and existential risks more generally, could vary substantially depending on framing. Most haven't thought a lot about these issues, which likely contributes. I think a problem with some studies on these issues is that researchers over-generalise from highly framing-dependent survey responses.

That makes a lot of sense. We can definitely test a lot of different framings. I think the problem with a lot of these kinds of problems is that they are low saliency, and thus people tend not to have opinions already, and thus they tend to generate an opinion on the spot. We have a lot of experience polling on low saliency issues though because we've done a lot of polling on animal farming policy which has similar framing effects.

I would definitely vote in favor of a grant to do this on the LTFF, as well as the SFF, and might even be interested in backstopping it with my personal funds or Lightcone funds.
Peter Wildeford:
Cool - I'll follow up when I'm back at work.
I think that's exactly right.
Hi! Did this ever happen or is it being worked on currently? I am interested in doing some work related to this but I don't want to be redundant. 
I found this thread interesting and useful, but I feel a key point has been omitted thus far (from what I've read):

  • Public, elite, and policymaker beliefs and attitudes related to AI risk aren't just a variable we (members of the EA/longtermist/AI safety communities) have to bear in mind and operate in light of, but instead also a variable we can intervene on.
  • And so far I'd say we have (often for very good reasons) done significantly less to intervene on that variable than we could've or than we could going forward.
  • So it seems plausible that actually these people are fairly convincible if exposed to better efforts to really explain the arguments in a compelling way.

We've definitely done a significant amount of this kind of work, but I think we've often (a) deliberately held back on doing so or on conveying key parts of the arguments, due to reasonable downside risk concerns, and (b) not prioritized this. And I think there's significantly more we could do if we wanted to, especially after a period of actively building capacity for this.

Important caveats / wet blankets:

  • I think there are indeed strong arguments against trying to shift relevant beliefs and attitudes in a more favorable direction, including not just costs and plausibly low upside but also multiple major plausible downside risks.[1]
  • So I wouldn't want anyone to take major steps in this direction without checking in with multiple people working on AI safety/governance first.
  • And it's not at all obvious to me we should be doing more of that sort of work. (Though I think whether, how, & when we should is an important question and I'm aware of and excited about a couple small research projects that are happening on that.)

All I really want to convey in this comment is what I said in my first paragraph: we may be able to significantly push beliefs and opinions in favorable directions relative to where they are now or would be in future by default.

  1. ^ Due to time

I think I would have totally agreed in 2016. One update since then is that I think progress scales way less with resources than I used to think it did. In many historical cases, a core component of progress was driven by a small number of people (which is reflected in citation counts and in who is actually taught in textbooks), and introducing lots of funding and scaling too fast can disrupt that by increasing the amount of fake work.

$1B in safety well-spent is clearly more impactful than $1B less in semiconductors; it's just that "well-spent" is doing a lot of work. Someone with a lot of money is going to have lots of people trying to manipulate their information environment to take their stuff.

Reducing especially dangerous tech progress seems more promising than reducing tech broadly, however since these are dual use techs, creating knowledge about which techs are dangerous can accelerate development in these sectors (especially the more vice signalling / conflict orientation is going on). This suggests that perhaps an effective way to apply this strategy is to recruit especially productive researchers (identified using asymmetric info) to labs where they work on something less dangerous.

In gain of function research and nuclear research, progress requires large expensive laboratories; AI theory progress doesn't require that, although large scale training does (though, to a lesser extent than GOF or nuclear).

There are plenty of movements out there (ethics & inclusion, digital democracy, privacy, etc.) who are against current directions of AI developments, and they don’t need the AGI risk argument to be convinced that current corporate scale-up of AI models is harmful.

Working with them, redirecting AI developments away from more power-consolidating/general AI may not be that much harder than investing in supposedly “risk-mitigating” safety research.

Do you think there is a large risk of AI systems killing or subjugating humanity autonomously related to scale-up of AI models?

A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model. It also seems like it wouldn't pursue measures targeted at the kind of disaster it denies, and might actively discourage them (this sometimes happens already). With a threat model of privacy violations restrictions on model size would be a huge lift and the remedy wouldn't fit the diagnosis in a way that made sense to policymakers. So I wouldn't expect privacy advocates to bring them about based on their past track record, particularly in China where privacy and digital democracy have not had great success.

If it in fact is true that there is a large risk of almost everyone alive today being killed or subjugated by AI, then establishing that as scientific consensus seems like it would supercharge a response dwarfing current efforts for things like privacy rules, which would ... (read more)

A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model.

I can imagine there being movements that fit this description, in which case I would not focus on talking with them or talking about them. 

But I have not been in touch with any movements matching this description. Perhaps you could share specific examples of actions from specific movements you have in mind? 

For the movements I have in mind (and am talking with), the description does not match at all:

  1. AI ethics and inclusion movements go a lot further than stopping people from building AI that eg. make discriminatory classifications/recommendations associated with marginalised communities – they want Western corporations to stop consolidating power through AI development and deployment while pushing their marginalised communities further out of the loop (rendering them voiceless). 
  2. Digital democracy groups and human-centric AI movements go a lot further than wanting to regul
... (read more)

I agree that some specific leaders you cite have expressed distaste for model scaling, but it seems not to be a core concern. In a choice between more politically feasible measures that target concerns they believe are real vs concerns they believe are imaginary and bad, I don't think you get the latter. And I think arguments based on those concerns get traction on measures addressing the concerns, but less so on secondary wishlist items of leaders.

I think that's the reason privacy advocacy in legislation and the like hasn't focused on banning computers in the past (and would have failed if they tried). For example:

If privacy and data ownership movements take their own claims seriously (and some do), they would push for banning the training of ML models on human-generated data or any sensor-based surveillance that can be used to track humans' activities. 

AGI working with AI generated data or data shared under the terms and conditions of web services can power the development of highly intelligent  catastrophically dangerous systems, and preventing AI from reading published content doesn't seem close to the core motives there, especially for public support on privacy. So ... (read more)

I intend to respond to the rest tomorrow. Some of your interpretations of writings by Timnit Gebru and Glen Weyl seem fair to me (though I would need to ask them to confirm). I have not looked much into Jaron Lanier’s writings on AGI, so that prompts me to google that. Perhaps you can clarify the other reasons why the changes in learning would be stable in preserving “good properties”? I’ll respond to your nuances regarding how to interpret your long-term-evaluating error correcting code after that.
re: Leaders of movements being skeptical of the notion of AGI. Reflecting more, my impression is that Timnit Gebru is skeptical about the sci-fi-y descriptions of AGI, and even more so about the social motives of people working on developing (safe) AGI. She does not say that AGI is an impossible concept or not actually a risk. She seems to question the overlapping groups of white male geeks who have been diverting efforts away from other societal issues, to both promoting AGI development and warning of AGI x-risks. Regarding Jaron Lanier, yes, (re)reading this post I agree that he seems to totally dismiss the notion of AGI, seeing it more as a result of a religious kind of thinking under which humans toil away at offering the training data necessary for statistical learning algorithms to function without being compensated.
Returning to the error correction point: Feel free to still clarify the other reasons why the changes in learning would be stable in preserving “good properties”. Then I will take that starting point to try to explain why the mutually reinforcing dynamics of instrumental convergence and substrate-needs convergence override that stability.

Fundamentally though, we'll still be discussing the application limits of error correction methods.

Three ways to explain why:

  • Any workable AI-alignment method involves receiving input signals, comparing input signals against internal references, and outputting corrective signals to maintain alignment of outside states against those references (ie. error correction).
  • Any workable AI-alignment method involves a control feedback loop – of detecting the actual (or simulating the potential) effects internally and then correcting actual (or preventing the potential) effects externally (ie. error correction).
      • Eg. mechanistic interpretability is essentially about "detecting the actual (or simulating the potential) effects internally" of AI.
  • The only way to actually (slightly) counteract AGI convergence on causing "instrumental" and "needed" effects within a more complex environment is to simulate/detect and then prevent/correct those environmental effects (ie. error correction).

~ ~ ~

Which brings us back to why error correction methods, of any kind and in any combination, cannot ensure long-term AGI Safety.

I reread your original post and Christiano's comment to understand your reasoning better and see how I could explain the limits of applicability of error correction methods.

I also messaged Forrest (the polymath) to ask for his input.

The messages were of a high enough quality that I won't bother rewriting the text. Let me copy-paste the raw exchange below (with few spelling edits).

Remmelt 15:37
@Forrest, would value your thoughts on the way Carl Shulman is thinking about error correcting code, perhaps to pass on on th
I addressed claims of similar forms at least 3 times already on separate occasions (including in the post itself). Suggest reading this: https://www.lesswrong.com/posts/bkjoHFKjRJhYMebXr/the-limited-upside-of-interpretability?commentId=wbWQaWJfXe7RzSCCE “The fact that mechanistic interpretability can possibly be used to detect a few straightforwardly detectable misalignments of the kinds you are able to imagine right now does not mean that the method can be extended to detecting/simulating most or all human-lethal dynamics manifested in/by AGI over the long term. If AGI behaviour converges on outcomes that result in our deaths through less direct routes, it really does not matter much whether the AI researcher humans did an okay job at detecting "intentional direct lethality" and "explicitly rendered deception".”

This is like saying there's no value to learning about and stopping a nuclear attack from killing you because you might get absolutely no benefit from not being killed then, and being tipped off about a threat trying to kill you, because later the opponent might kill you with nanotechnology before you can prevent it.

Removing intentional deception or harm greatly increases the capability of AIs that can be worked with without getting killed, to further improve safety measures. And as I said actually being able to show a threat to skeptics is immensely better for all solutions, including relinquishment, than controversial speculation.

No, it's not like that. It's saying that if you can prevent a doomsday device from being lethal in some ways and not in others, then it's still lethal. Focussing on some ways that you feel confident that you might be able to prevent the doomsday device from being lethal is IMO distracting dangerously from the point, which is that people should not build the doomsday device in the first place.
If mechanistic interpretability methods cannot prevent that interactions of AGI necessarily converge on total human extinction beyond theoretical limits of controllability, it means that these (or other "inspect internals") methods cannot contribute to long-term AGI safety.  And this is not idle speculation, nor based on prima facie arguments. It is based on 15 years of research by a polymath working outside this community. In that sense, it would not really matter that mechanistic interpretability can do an okay job at detecting that a power-seeking AI was explicitly plotting to overthrow humanity. That is, except for the extremely unlikely case you pointed to that such intentions are detected and on time, and humans all coordinate at once to impose an effective moratorium on scaling or computing larger models. But this is actually speculation, whereas that OpenAI promoted Olah's fascinating Microscope-generated images as them making progress on understanding and aligning scalable ML models is not speculation.  Overall, my sense is that mechanistic interpretability is used to align-wash capability progress towards AGI, while not contributing to safety where it predominantly matters.
Exactly this kind of thinking is what I am concerned about. It implicitly assumes that you have a (sufficiently) comprehensive and sound understanding of the ways humans would get killed at a given level of capability, and therefore can rely on that understanding to conclude that capabilities of AIs can be greatly increased without humans getting killed. How do you think capability developers would respond to that statement? Will they just stay on the safe side, saying "Well those alignment researchers say that mechanistic interpretability helps remove intentional deception or harm, but I'm just going to stay on the safe side and not scale any further"? No, they are going to use your statement to promote the potential safety of their scalable models, and remove whatever safety margin they can justify themselves taking and feel justified taking for themselves. Not considering unknown unknowns is going to get us killed. Not considering what safety problems may be unsolvable is going to get us killed. Age-old saying: "It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so."
Sorry if I missed it earlier in the thread, but who is this "polymath"?
Forrest Landry.

From Math Expectations, a depersonalised post Forrest wrote of his impressions of a conversation with a grant investigator where the grant investigator kept looping back on the expectation that a "proof" based on formal reasoning must be written in mathematical notation. We did end up receiving the $170K grant.

I usually do not mention Forrest Landry's name immediately for two reasons:

  1. If you google his name, he comes across like a spiritual hippie. Geeks who don't understand his use of language take that as a cue that he must not know anything about computational science, mathematics or physics (wrong – Forrest has deep insights into programming methods and eg. why Bell's Theorem is a thing).
  2. Forrest prefers to work on the frontiers of research, rather than repeating himself in long conversations with tech people who cannot let go of their own mental models and quickly jump to motivated counterarguments that he heard and addressed many times before. So I act as a bridge-builder, trying to translate between Forrest speak and Alignment Forum speak.
      1. Both of us prefer to work behind the scenes. I've only recently started to touch on the arguments in public.
      2. You can find those arguments elaborated on here.

Warning: large inferential distance; do message clarifying questions – I'm game!

As requested by Remmelt I'll make some comments on the track record of privacy advocates, and their relevance to alignment.

I did some active privacy advocacy in the context of the early Internet in the 1990s, and have been following the field ever since. Overall, my assessment is that the privacy advocacy/digital civil rights community has had both failures and successes. It has not succeeded (yet) in its aim to stop large companies and governments from having all your data. On the other hand, it has been more successful in its policy advocacy towards limiting what large companies and governments are actually allowed to do with all that data.

The digital civil rights community has long promoted the idea that Internet based platforms and other computer systems must be designed and run in a way that is aligned with human values. In the context of AI and ML based computer systems, this has led to demands for AI fairness and transparency/explainability that have also found their way into policy like the GDPR, legislation in California, and the upcoming EU AI Act. AI fairness demands have influenced the course of AI research being done, e.g. there has been research on defining i... (read more)

This is insightful for me, thank you! Also, I stand corrected on my earlier comment that privacy and digital ownership advocates would/should care about models being trained on their own/person-tracking data, so as to restrict the scaling of models. I’m guessing I was not tracking well then what people in at least the civil rights spaces Koen moves around in are thinking and would advocate for.

A movement pursuing antidiscrimination or privacy protections for applications of AI that thinks the risk of AI autonomously destroying humanity is nonsense seems like it will mainly demand things like the EU privacy regulations, not bans on using $10B of GPUs instead of $10M in a model.

This is a very spicy take, but I would (weakly) guess that a hypothetical ban on ML trainings that cost more than $10M would make AGI timelines marginally shorter rather than longer, via shifting attention and energy away from scaling and towards algorithm innovation.

Very interesting! Recently, the US started to regulate exports of computing power to China. Do you expect this to speed up AGI timelines in China, or do you expect the regulation to be ineffective, or something else?

Reportedly, NVIDIA developed the A800, which is just the A100, to keep the letter but probably not the spirit of the regulation. I am trying to follow closely how the A800 fares, because it seems to be an important data point on the feasibility of regulating computing power.

Nathan Helm-Burger:
I strongly agree with Steven about this. Personally, I expect it'll be non-impactful in either direction. I think the majority of research groups already have sufficient compute available to make dangerous algorithmic progress, and they are not so compute-resource-rich that their scaling efforts are distracting them from more dangerous pursuits. I think the groups who would be more dangerous if they weren't 'resource drunk' are mainly researchers at big companies.
I think the two camps are less orthogonal than your examples of privacy and compute reg portray. There's room for plenty of excellent policy interventions that both camps could work together to support. For instance, increasing regulatory requirements for transparency on algorithmic decision-making (and crucially, building a capacity both in regulators and in the market supporting them to enforce this) is something that I think both camps would get behind (the xrisk one because it creates demand for interpretability and more and the other because eg. it's easier to show fairness issues) and could productively work on together. I think there are subculture clash reasons the two camps don't always get on, but that these can be overcome, particularly given there's a common enemy (misaligned powerful AI). See also this paper Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society I know lots of people who are uncertain about how big the risks are, and care about both problems, and work on both (I am one of these - I care more about AGI risk, but I think the best things I can do to help avert it involve working with the people you think aren't helpful).
Seems reasonable regarding public policy. But what about 1. private funders of AGI-relevant research 2. researchers doing AGI-relevant research? Seems like there's a lot of potential reframings that make it more feasible to separate safe-ish research from non-safe-ish research. E.g. software 2.0: we're not trying to make a General Intelligence, we're trying to replace some functions in our software with nets learned from data. This is what AlphaFold is like, and I assume is what ML for fusion energy is like. If there's a real category like this, a fair amount of the conflict might be avoidable?

Most AI companies and most employees there seem not to buy risk much, and to assign virtually no resources to address those issues. Unilaterally holding back from highly profitable AI when they won't put a tiny portion of those profits into safety mitigation again looks like an ask out of line with their weak interest. Even at the few significant companies with higher percentages of safety effort, it still looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governments buy risks enough to coordinate slowdown.

So asks like investing in research that could demonstrate problems with higher confidence, or making models available for safety testing, or similar still seem much easier to get from those companies than stopping (and they have reasonable concerns that their unilateral decision might make the situation worse by reducing their ability to do helpful things, while regulatory industry-wide action requires broad support).

As with government, generating evidence and arguments that are more compelling could be super valuable, but pretending you have more support than you do yields incorrect recommendations about what to try.


looks to me like the power-weighted average of staff is extremely into racing to the front, at least to near the brink of catastrophe or until governments buy risks enough to coordinate slowdown.

Can anyone say confidently why? Is there one reason that predominates, or several? Like it's vaguely something about status, money, power, acquisitive mimesis, having a seat at the table... but these hypotheses are all weirdly dismissive of the epistemics of these high-powered people, so either we're talking about people who are high-powered because of the managerial revolution (or politics or something), or we're talking about researchers who are high-powered because they're given power because they're good at research. If it's the former, politics, then it makes sense to strongly doubt their epistemics on priors, but we have to ask, why can they meaningfully direct the researchers who are actually good at advancing capabilities? If it's the latter, good researchers have power, then why are their epistemics suddenly out the window here? I'm not saying their epistemics are actually good, I'm saying we have to understand why they're bad if we're going to slow down AI through this central route.

There are a lot of pretty credible arguments for them to try, especially with low risk estimates for AI disempowering humanity, and if their percentile of responsibility looks high within the industry.

One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and 'the real risk isn't AGI revolt, it's bad humans' is almost a reflexive take for many  in online discussion of AI risk. That view can easily combine with the observation that there has been even less takeup of AI safety in China thus far than in liberal democracies, and mistrust of CCP decision-making and honesty, so it also reduces accident risk.

With respect to competition with other companies in democracies, some labs can correctly say that they have taken action that signals they are more into taking actions towards safety or altruistic values (including based on features like control by non-profit boards or % of staff working on alignment), and will have vastly more AI expertise, money, and other resources to promote those goals in the future by...

Thank you, this seems like a high-quality steelman (I couldn't judge if it passes an ITT). 


1Gerald Monroe1y
Taking an extreme perspective here: do future generations of people not alive and who no one alive now would meet have any value? One perspective is no they don't. From that perspective "humanity" continues only as some arbitrary random numbers from our genetics. Even Clippy probably keeps at least one copy of the human genome in a file somewhere so it's the same case. That is, there is no difference between the outcomes of: 1. we delay AI a few generations and future generations of humanity take over the galaxy 2. we fall to rampant AIs and their superintelligent descendants take over the galaxy If you could delay AI long enough you would be condemning the entire population of the world to death from aging, or essentially the same case where the rampant AI kills the entire world.
0Nathan Helm-Burger1y
Carl S. One view is that the risk of AI turning against humanity is less than the risk of a nasty eternal CCP dictatorship if democracies relinquish AI unilaterally. You see this sort of argument made publicly by people like Eric Schmidt, and 'the real risk isn't AGI revolt, it's bad humans' is almost a reflexive take for many in online discussion of AI risk. That view can easily combine with the observation that there has been even less takeup of AI safety in China thus far than in liberal democracies, and mistrust of CCP decision-making and honesty, so it also reduces accident risk. My thought: seems like a convincing demonstration of risk could be usefully persuasive.
I'll make an even stronger statement: So long as the probability of a technological singularity isn't too low, they can still rationally keep working on it even if they know the risk is high, because the expected utility is much greater still.
This comment employs an oddly common failure mode of ignoring intermediate successes that align with market incentives, like "~N% of AI companies stop publishing their innovations on Arxiv for free".
Those are good points. There are some considerations that go in the other direction. Sometimes it's not obvious what's a "failure to convince people" vs. "a failure of some people to be convincible." (I mean convincible by object-level arguments as opposed to convincible through social cascades where a particular new view reaches critical mass.)

I believe both of the following:

  • Persuasion efforts haven't been exhausted yet: we can do better at reaching not-yet-safety-concerned AI researchers. (That said, I think it's at least worth considering that we're getting close to exhausting low-hanging fruit?)
  • Even so, "persuasion as the main pillar of a strategy" is somewhat likely to be massively inadequate because it's difficult to change the minds and culture of humans in general (even if they're smart), let alone existing organizations.

Another point that's maybe worth highlighting is that the people who could make large demands don't have to be the same people who are best-positioned for making smaller asks. (This is Katja's point about there not being a need for everyone to coordinate into a single "we.") The welfarism vs. abolitionism debate in animal advocacy and discussion of the radical flank effect seems related.

I also agree with a point lc makes in his post on slowing down AI. He points out that there's arguably a "missing mood" around the way most people in EA and the AI alignment community communicate with safety-unconcerned researchers. The missing sense of urgency probably lowers the chance of successful persuasion efforts?

Lastly, it's a challenge that there's little consensus in the EA research community around important questions like "How hard is AI alignment?," "How hard is alignment conditional on <5 years to TAI?," and "How long are TAI timelines?" (Though maybe there's quite some agreement on the second one and the answer is at least, "it's not easy?") I'd imagine there would at least be quite a strong EA expert consensus on the followi

Thank you, this is a good post.

My main point of disagreement is that you point to successful coordination in things like not eating sand, or not wearing weird clothing. The upside of these things is limited, but you say the upside of superintelligence is also limited because it could kill us.

But rephrase the question to "Should we create an AI that's 1% better than the current best AI?" Most of the time this goes well - you get prettier artwork or better protein folding prediction, and it doesn't kill you. So there's strong upside to building slightly better AIs, as long as you don't cross the "kills everyone" level. Which nobody knows the location of. And which (LW conventional wisdom says) most people will be wrong about.

We successfully coordinate a halt to AI advancement at the first point where more than half of the relevant coordination power agrees that the next 1% step forward is in expectation bad rather than good. But "relevant" is a tough qualifier, because if 99 labs think it's bad, and one lab thinks it's good, then unless there's some centralizing force, the one lab can go ahead and take the step. So "half the relevant coordination power" has to include either every la...

I loved the link to the "Resisted Technological Temptations Project", for a bunch of examples of resisted/slowed technologies that are not "eating sand", and have an enormous upside: https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:start

  • GMOs, in some countries
  • Nuclear power, in some countries
  • Genetic engineering of humans
  • Geoengineering, many actors
  • Chlorofluorocarbons, many actors, 1985-present
  • Human challenge trials
  • Dietary restrictions, in most (all?) human cultures [restrict much more than sand, often quite good stuff!]

I would tentatively add:

  • organ donor markets (at least for kidneys)
  • drug development in general (see all of Scott's posts on the FDA slowing things down, I would love to see an AIA slowing things down)
Agreed. My main objection to the post is that it considers the involved agents to be optimizing for far future world-states. But I'd say that most people (including academics and AI lab researchers) mostly only think of the next 1% step in front of their nose. The entire game theoretic framing in the arms race etc section seems wrong to me.
This seems to suggest "should we relax nuclear power regulation to make it 1% less expensive to comply with?" as a promising way to fix the economics of nuclear power, and I don't buy that at all. Maybe it's different because Chernobyl happened, and a movie like The China Syndrome was made about a nuclear accident? That sounds very hopeful to me but doesn't seem true to me. It implies slowing down AI will be easy, it just needs a Chernobyl-sized disaster and a good movie about it. The Chernobyl disaster was nearly harmless compared to COVID-19, and even COVID-19 was hardly an existential threat. If slowing down AI is this easy we probably shouldn't waste time worrying about it before Chernobyl.
The difference between regulation and research is that the former has a large amount of friction, making it about as hard to push a 1% regulation through as a 10% one. In contrast, incremental 1% improvements in the development of capabilities are just what happens by default, as research organizations follow their charter.

Katja, many thanks for writing this, and Oliver, thanks for this comment pointing out that everyday people are in fact worried about AI x-risk.  Since around 2017 when I left MIRI to rejoin academia, I have been trying continually to point out that everyday people are able to easily understand the case for AI x-risk, and that it's incorrect to assume the existence of AI x-risk can only be understood by a very small and select group of people.  My arguments have often been basically the same as yours here: in my case, informal conversations with Uber drivers, random academics, and people at random public social events.  Plus, the argument is very simple: If things are smarter than us, they can outsmart us and cause us trouble.  It's always seemed strange to say there's an "inferential gap" of substance here.

However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest.  For instance, I tried to point it out in this previous post:

“Pivotal Act” Intentions: Negative...

The question feels leading enough that I don't really know how to respond. Many of these sentences sound pretty crazy to me, so I feel like I primarily want to express frustration and confusion that you assign those sentences to me or "most of the LessWrong community".

However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest. For instance, I tried to point it out in this previous post:

I think John Wentworth's question is indeed the obvious question to ask. It does really seem like our prior should be that the world will not react particularly sanely here.

I also think it's really not true that coordination has been "fraught to even suggest". I think it's been suggested all the time, and certain coordination plans seem more promising than others. Like, even Eliezer was for a long time apparently thinking that Deepmind having a monopoly on AGI development was great and something to be protected, which very much involves coordinating with people outside of the LessWrong community.

T...

I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways of relating to the rest of the world that would be much better, but a naive update in the direction of "just trust other people more" would likely make things worse.

Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure really is a time to reflect on your relation to the world when a very prominent member of your community just stole 8 billion dollars of innocent people's money and committed the largest fraud since Enron), [...]

I very much agree with the sentiment of the second paragraph. 

Regarding the first paragraph, my own take is that (many) EAs and rationalists might be wise to trust themselves and their allies less.[1]

The main update of the FTX fiasco (and other events I'll describe later) I'd make is that perhaps many/most EAs and rationalists aren't very good at character judgment.  They probably trust other EAs and rationalists too readily because they are part of...

Thanks, Oliver.  The biggest update for me here — which made your entire comment worth reading, for me — was that you said this: I'm surprised that you think that, but have updated on your statement at face value that you in fact do.  By contrast, my experience around a bunch of common acquaintances of ours has been much the same as Katja's, like this: In fact I think I may have even heard the word "delusional" specifically applied to people working on AI governance (though not by you) for thinking that coordination on AI regulation is possible / valuable / worth pursuing in service of existential safety. As for the rest of your narrative of what's been happening in the world, to me it seems like a random mix of statements that are clearly correct (e.g., trying to coordinate with people who don't care about honesty or integrity will get you screwed) and other statements that seem, as you say, and I agree that for the purpose of syncing world models, Anyway, cheers for giving me some insight into your thinking here.
Oliver, see also this comment; I tried to @ you on it, but I don't think LessWrong has that functionality?  

Critch, I agree it’s easy for most people to understand the case for AI being risky. I think the core argument for concern—that it seems plausibly unsafe to build something far smarter than us—is simple and intuitive, and personally, that simple argument in fact motivates a plurality of my concern. That said:

  • I think it often takes weirder, less intuitive arguments to address many common objections—e.g., that this seems unlikely to happen within our lifetimes, that intelligence far superior to ours doesn’t even seem possible, that we’re safe because software can’t affect physical reality, that this risk doesn’t seem more pressing than other risks, that alignment seems easy to solve if we just x, etc.
  • It’s also remarkably easy to convince many people that aliens visit Earth on a regular basis, that the theory of evolution via natural selection is bunk, that lottery tickets are worth buying, etc. So while I definitely think some who engage with these arguments come away having good reason to believe the threat is likely, for values of “good” and “believe” and “likely” at least roughly similar to those common around here, I suspect most update something more like their professed belie...
8Ben Pace1y
This is a candidate for the most surprising sentence in the whole comments section! I'd be interested in knowing more about why you believe this. One sort of thing I'd be quite interested in is things you've seen government ops teams do fast (even if they're small things, accomplishments that would surprise many of us in this thread that they could be done so quickly).
Recruitment - in my experience often a weeks long process from start to finish, well oiled and systematic and using all the tips from the handbook on organizational behaviour on selection, often with feedback given too. By comparison, some tech companies can take several months to hire, with lots of ad hoc decision-making, no processes around biases or conflicts of interest, and no feedback. Happy to give more examples if you want by DM. I should say my sample size is tiny here - I know one gov dept in depth, one tech company in depth and a handful of other tech companies and gov depts not fully from the inside but just from talking with friends that work there, etc.
This is an important optimistic update, because it implies alignment might be quite a bit easier than we think, given that even under unfavorable circumstances, reasonable progress still gets done. I think that this isn't an error in rationality, and instead very different goals drive EAs/LWers compared to AI researchers. A low chance of high utility and a high chance of death is pretty rational to take, assuming you only care about yourself. And this is the default, absent additional assumptions. From an altruistic perspective, it's insane to take this risk, especially if you care about the future. Thus, differing goals are at play.

However, for some reason, the idea that people outside the LessWrong community might recognize the existence of AI x-risk — and therefore be worth coordinating with on the issue — has felt not only poorly received on LessWrong, but also fraught to even suggest.

I object to this hyperbolic and unfair accusation. The entire AI Governance field is founded on this idea; this idea is not only fine to suggest, but completely uncontroversial accepted wisdom. That is, if by "this idea" you really mean literally what you said -- "people outside the LW community might recognize the existence of AI x-risk and be worth coordinating with on the issue." Come on.

I am frustrated by what appears to me to be constant straw-manning of those who disagree with you on these matters. Just because people disagree with you doesn't mean there's a sinister bias at play. I mean, there are usually all sorts of sinister biases at play on all sides of every dispute, but the way to cut through them isn't to go around slinging insults at each other about who might be biased, it's to stay on the object level and sort through the arguments.

This makes sense to me if you feel my comment is meant as a description of you or people-like-you.  It is not, and quite the opposite.  As I see it, you are not a representative member of the LessWrong community, or at least, not a representative source of the problem I'm trying to point at.  For one thing, you are willing to work for OpenAI, which many (dozens of) LessWrong-adjacent people I've personally met would consider a betrayal of allegiance to "the community".  Needless to say, the field of AI governance as it exists is not uncontroversially accepted by the people I am reacting to with the above complaint.  In fact, I had you in mind as a person I wanted to defend by writing the complaint, because you're willing to engage and work full-time (seemingly) in good faith with people who do not share many of the most centrally held views of "the community" in question, be it LessWrong, Effective Altruism, or the rationality community.   If it felt otherwise to you, I apologize.

It would help if you specified which subset of "the community" you're arguing against.  I had a similar reaction to your comment as Daniel did, since in my circles (AI safety researchers in Berkeley), governance tends to be well-respected, and I'd be shocked to encounter the sentiment that working for OpenAI is a "betrayal of allegiance to 'the community'".

To be clear, I do think most people who have historically worked on "alignment" at OpenAI have probably caused great harm! And I do think I am broadly in favor of stronger community norms against working at AI capability companies, even in so called "safety positions". So I do think there is something to the sentiment that Critch is describing.

6Daniel Kokotajlo1y
Agreed! But the words he chose were hyperbolic and unfair. Even an angrier more radical version of Habryka would still endorse "the idea that people outside the LessWrong community might recognize the existence of AI risk."
Separately from my other reply explaining that you are not the source of what I'm complaining about here, I thought I'd add more color to explain why I think my assessment here is not "hyperbolic".  Specifically, regarding your claim that reducing AI x-risk through coordination is "not only fine to suggest, but completely uncontroversial accepted wisdom", please see the OP.  Perhaps you have not witnessed such conversations yourself, but I have been party to many of these: In other words, I've seen people in AI governance being called or treated as "delusional" by loads of people (1-2 dozen?) core to the LessWrong community (not you).  I wouldn't say by a majority, but by an influential minority to say the least, and by more people than would be fair to call "just institution X" for any X, or "just person Y and their friends" for any Y.  The pattern is strong enough that for me, pointing to governance as an approach to existential safety on LessWrong indeed feels fraught, because influential people (online or offline) will respond to the idea as "delusional" as Katja puts it.  Being called delusional is stressful, and hence "fraught". @Oliver, the same goes for your way of referring to sentences you disagree with as "crazy", such as here. Generally speaking, on the LessWrong blog itself I've observed too many instances of people using insults in response to dissenting views on the epistemic health of the LessWrong community, and receiving applause and karma for doing so, for me to think that there's not a pattern or problem here. That's not to say I think LessWrong has this problem worse than other online communities (i.e., using insults or treating people as 'crazy' or 'delusional' for dissenting or questioning the status quo); only that I think it's a problem worth addressing, and a problem I see strongly at play on the topic of coordination and governance.
Just to clarify, the statements that I described as crazy were not statements you professed, but statements that you said I or "the LessWrong community" believe. I am not sure whether that got across (since like, in that context it doesn't really make sense to say I described sentences I disagree with as crazy, since like, I don't think you believe those sentences either, that's why you are criticizing them).
It did not get across! Interesting. Procedurally I still object to calling people's arguments "crazy", but selfishly I guess I'm glad they were not my arguments? At a meta level though I'm still concerned that LessWrong culture is too quick to write off views as "crazy". Even the "coordination is delusional"-type views that Katja highlights in her post do not seem "crazy" to me, more like misguided or scarred or something, in a way that warrants a closer look but not being called "crazy".
Oops, yeah, sorry about that not coming across. Seems plausible that LessWrong culture is too quick to write off views as "crazy", though I have a bunch of conflicting feelings here. Might be worth going into at some point. I do think there is something pretty qualitatively different about calling a paraphrase or an ITT of my own opinions "crazy" than to call someone's actual opinion crazy. In general my sense is that for reacting to paraphrases it's less bad for the social dynamics to give an honest impression and more important to give a blunt evocative reaction, but I'll still try to clarify more in the future when I am referring to the meat of my interlocutor's opinion vs. their representation of my opinion.

That particular statement was very poorly received, with a 139-karma retort from John Wentworth arguing,

What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?

I’m not sure what’s going on here

So, wait, what’s actually the answer to this question? I read that entire comment thread and didn’t find one. The question seems to me to be a good one!

  • The GoF analogy is quite weak.
  • "What exactly" seems a bit of a weird type of question.  For example, consider nukes: it was hard to predict what exactly the model would be by which governments would not blow everyone up after the use of nukes in Japan. But also: while the resulting equilibrium is not great, we haven't died in nuclear WWIII so far. 

The GoF analogy is quite weak.

As in my comment here, if you have a model that simultaneously both explains the fact that governments are funding GoF research right now, and predicts that governments would nevertheless react helpfully to AGI, I’m very interested to hear it. It seems to me that defunding GoF is a dramatically easier problem in practically every way.

The only responses I can think of right now are (1) “Basically nobody in or near government is working hard to defund GoF but people in or near government will be working hard to spur on a helpful response to AGI” (really? if so, what’s upstream of that supposed difference?) or (2) “It’s all very random—who happens to be in what position of power and when, etc.—and GoF is just one example, so we shouldn’t generalize too far from it” (OK maybe, but if so, then can we pile up more examples into a reference class to get a base rate or something? and what are the interventions to improve the odds, and can we also try those same interventions on GoF?)

I think it’s worth updating on the fact that the US government has already launched a massive, disruptive, costly, unprecedented policy of denying AI-training chips to China. I’m not aware of any similar-magnitude measure happening in the GoF domain.

IMO that should end the debate about whether the government will treat AI dev the way it has GoF - it already has moved it to a different reference class.

Some wild speculation on upstream attributes of advanced AI’s reference class that might explain the difference in the USG’s approach:
a perception of new AI as geoeconomically disruptive; that new AI has more obvious natsec-relevant use-cases than GoF; that powerful AI is more culturally salient than powerful bio (“evil robots are scarier than evil germs”).

Not all of these are cause for optimism re: a global ASI ban, but (by selection) they point to governments treating AI “seriously”.

One big difference is that GoF currently does not seem that dangerous to governments. If you look at it from a perspective not focusing on the layer of individual humans as agents, but instead on states, corporations, memplexes and similar creatures as the agents, GoF maybe does not look that scary? Sure, there was covid, but while it was clearly really bad for humans, it mostly made governments/states relatively stronger.  Taking this difference into account, my model was and still is that governments will react to AI.  This does not imply reacting in a helpful way, but I think whether the reaction will be helpful, harmful or just random is actually one of the higher variance parameters, and a point of leverage.  (And the common-on-LW stance that governments are stupid and evil and you should mostly ignore them is unhelpful in both understanding and influencing the situation.) 
Personally I haven't thought about how strong the analogy to GoF is, but another thing that feels worth noting is that there may be a bunch of other cases where the analogy is similarly strong and where major government efforts aimed at risk-reduction have occurred. And my rough sense is that that's indeed the case, e.g. some of the examples here. In general, at least for important questions worth spending time on, it seems very weird to say "You think X will happen, but we should be very confident it won't because in analogous case Y it didn't", without also either (a) checking for other analogous cases or other lines of argument or (b) providing an argument for why this one case is far more relevant evidence than any other available evidence. I do think it totally makes sense to flag the analogous case and to update in light of it, but stopping there and walking away feeling confident in the answer seems very weird. I haven't read any of the relevant threads in detail, so perhaps the arguments made are stronger than I imply here, but my guess is they weren't. And it seems to me that it's unfortunately decently common for AI risk discussions on LessWrong to involve this pattern I'm sketching here.  (To be clear, all I'm arguing here is that these arguments often seem weak, not that their conclusions are false.) (This comment is raising an additional point to Jan's, not disagreeing.) Update: Oh, I just saw Steve Byrnes also wrote the following in this thread, which I totally agree with:
This would be useful if the main problem was misuse, and while this problem is arguably serious, there is another problem, called the alignment problem, that doesn't care who uses AGI, only that it exists. Biotech is probably the best example of technology being slowed down in the manner required, and suffice it to say it only happened because eugenics and anything related to that became taboo after WW2. I obviously don't want a WW3 to slow down AI progress, but the main criticism remains: The examples of tech that were slowed down in the manner required for alignment required massive death tolls, à la a pivotal act.
The analogy I had in mind is not so much in the exact nature of the problem, but in the respect that it's hard to make explicit precise models of such situations in advance.  In the case of nukes, consider the fact that the smartest minds of the time, like von Neumann or Feynman, spent a decent amount of time thinking about the problems, had clever explicit models, and were wrong - in the case of von Neumann to the extent that if the US had followed his advice, they would have launched nuclear armageddon.

I think it's uncharitable to psychoanalyze why people upvoted John's comment; his object-level point about GoF seems good and merits an upvote IMO. Really, I don't know what to make of GoF. It's not just that governments have failed to ban it, they haven't even stopped funding it, or in the USA case they stopped funding it and then restarted I think. My mental models can't explain that. Anyone on the street can immediately understand why GoF is dangerous. GoF is a threat to politicians and national security. GoF has no upsides that stand up to scrutiny, and has no politically-powerful advocates AFAIK. And we’re just getting over a pandemic which consumed an extraordinary amount of money, comfort, lives, and attention for the past couple years, and which was either a direct consequence of GoF research, or at the very least the kind of thing that GoF research could have led to. And yet, here we are, with governments funding GoF research right now. Again, I can’t explain this, and pending a detailed model that can, the best I can do right now is say “Gee I guess I should just be way more cynical about pretty much everything.”

Anyway, back to your post, if Option 1 is unilateral pivotal...

8the gears to ascension1y
I also have found that almost everyone I talk to outside the field of AI has found it obvious that AI could kill us all. They also find it obvious that AI is about to surpass us, and are generally not surprised by my claims of a coming discontinuity; in contrast, almost anyone working in AI thinks I'm crazy. I suspect that people think I'm claiming I can do it, when in fact I'm trying to tell them they are about to do it. it's really frustrating! also, the majority of opinion in the world doesn't come from AI researchers. That said. I cannot state this hard enough: THE COMING DISCONTINUITY WILL NOT WAIT BEHIND REGULATION. I know of multiple groups who already know what they need to in order to figure it out! regulation will not stop them unless it is broad enough to somehow catch every single person who has tried to participate in creating it, and that is not going to happen, no matter how much the public wishes for it. I don't believe any form of pivotal act could save humanity. anything that attempts to use control to prevent control will simply cause a cascade of escalatory retaliations, starting with whatever form of attack is used to try to stop AI progress, escalating from accelerationists, escalating from attempted shutdown, possibly an international war aided by AI happening in parallel, and ending with the AI executing the last one. Your attempts to slow the trickle of sand into the gravity well of increasing thermal efficiency will utterly fail. there are already enough GPUs in the world, and it only takes one. we must solve alignment so hard that it causes the foom, nothing else could possibly save us. The good news is, alignment is capabilities in a deep way. Solving alignment at full strength would suddenly stabilize AI in a way that makes it much stronger at a micro level, and would simultaneously allow for things like "hey, can you get the carbon out of the air please?" without worry about damaging those inside protected boundaries.
No, you're misunderstanding John Wentworth's comment and then applying that straw man to the rest of LessWrong based on the comment's upvote total. It's not that laypeople can't understand the dangers inherent in engineered viruses, and that this leads to governments continuing to finance and leak them. You can probably convince your Uber driver that lab leaks are bad, too. It's a lack of ability to translate that understanding into positive regulatory and legal outcomes, instead of completely net negative ones.
3Roman Leventov1y
Probably this opinion of LWers is shaped by their experience communicating with outsiders. Almost all my attempts to communicate AI x-risk to outsiders, from family members to friends to random acquaintances, have not been understood for sure. Your experience (talking to random people at social events, walking away from you with the thought "AI x-risk is indeed a thing!", and starting to worry about it in the slightest afterwards) is highly surprising to me. Maybe there is a huge bias in this regard in the Bay Area, where even normal people generally understand and appreciate the power of technology more than in other places, or have had some similar encounters before, or it's just in the zeitgeist of the place. (My experience is outside the US, primarily with Russians and some Europeans.) All that being said, ChatGPT (if people have experienced it first-hand) and especially GPT-4 could potentially make communication of the AI x-risk case much easier.
I've had a >50% hit rate for "this person now takes AI x-risk seriously after one conversation" from people at totally non-EA parties (subculturally alternative/hippie-ish, in not particularly tech-y parts of the UK). I think it's mostly about having a good pitch (but not throwing it at them until there is some rapport, ask them about their stuff first), being open to their world, modeling their psychology, and being able to respond to their first few objections clearly and concisely in a way they can frame within their existing world-model. Edit: Since I've been asked in DM: My usual pitch has been something like this. I expect Critch's version is very useful for the "but why would it be a threat" thing but have not tested it as much myself. I think being open and curious about them + being very obviously knowledgeable and clear thinking on AI x-risk is basically all of it, with the bonus being having a few core concepts to convey. Truth-seek with them, people can detect when you're pushing something in epistemically unsound ways, but tend to love it if you're going into the conversation totally willing to update but very knowledgeable.

Survey about this question (I have a hypothesis, but I don't want to say what it is yet): https://forms.gle/1R74tPc7kUgqwd3GA

Nit: it shouldn't offer "submit another response" at the end. You can turn this off in the form settings, and leaving it on for forms that are only intended to receive one response per person feels off and maybe leads someone to think that filling it out multiple times is expected. (Wouldn't normally be worth pointing out, but you create a decent number of surveys that are seen by a lot of people and changing this setting when creating them would be better)
2Ben Pace1y
Filled out!

'Nuclear power' seems to me like a weird example because we selectively halted the development of productive use of nuclear power while having comparatively little standing in the way of development of destructive use of nuclear power. If a similar story holds, then we'll still see militarily relevant AIs (deliberately doing adversarial planning of the sort that could lead to human extinction) while not getting many of the benefits along the way.

That... doesn't seem like much of a coordination success story, to me.

8Zach Stein-Perlman1y
Isn't NPT a canonical example in international relations of coordination success? In the 60s people thought many states would acquire nukes in the next few decades, but a treaty essentially stopped new states from acquiring nukes and set up a structure for helping states use nuclear power non-militarily. (Then many states gradually, individually, domestically chose not to pursue nuclear power much, for reasons specific to nuclear power.)

Yes, because the standards for success for nuclear are much lower than they are for AI. Not only did 5 states acquire weapons before the treaty was signed, around four have acquired them since, and this didn't stop the arms race accumulation of thousands of weapons. This turned out to be enough (so far).

In worlds where nukes ignite the atmosphere the first time you use them, there would have been a different standard of coordination necessary to count as 'success'. (Or in worlds where we counted the non-signatory states, many of which have nuclear weapons, as failures.)

The point here is that the NPT is insufficient. With the alignment problem, it doesn't matter much if one state has it or many individuals have it, it only matters if no one has it. A better example is arguably biotech, and this only happened because WW2 torched the idea of human eugenics, thus indirectly slowing down biotech by preventing its funding.
2Not Relevant1y
The NPT framework, if it could be implemented, would be sufficient. The goal of the NPT is to enable countries to mutually verify that no additional country has acquired a nuclear weapon, while still enabling the spread of nuclear power to many more states. It has been pretty successful at this, with just a few new states gaining nuclear weapons over the last 50 years, whereas many more can enrich uranium/operate power plants. It happens that the number of nuclear-armed countries at the NPT’s signing was nonzero, but if it had been 0, then the goal of the NPT would’ve been “no one anywhere can develop a nuclear weapon”. A separate Q is “could we have implemented the NPT without Hiroshima, if scientists had strong evidence it would ignite the atmosphere?” People can have reasonable disagreements here; I think it’s lame not to try.
The unfortunate answer is likely not, assuming the Cold War happens like it did historically. Both sides were very much going to get nuclear weapons and escalate as soon as they were able to. You really need almost Alien Space Bats or random quantum events to prevent the historical outcome of several states getting nuclear weapons. Now imagine those nuclear weapons were intelligent and misaligned, and the world probably goes up in flames. Not assuredly, but well over 50% probability per year.

The main concrete proposals / ideas that are mentioned here or I can think of are:

  1. Work to spread good knowledge regarding AGI risk / doom stuff among AI researchers.
    1. I think everyone is in favor of this, particularly when it’s tied to the less-adversarial takeaway message “there is a big problem, and more safety research is desperately needed”. When it’s tied to the more-adversarial takeaway message “capabilities research should be slowed down”, I think that can be tactically bad, as I think people generally don’t tend to be receptive to people telling them that they and all their scientific idols are being reckless. I think it’s good to be honest and frank, but in the context of outreach, we can be strategic about what we emphasize, and I think emphasizing the “there is a big problem, and more safety research is desperately needed” message is generally the better approach. It also helps that the “there is a big problem, and more safety research is desperately needed” message is pretty clearly correct and easy to argue whereas “capabilities research is harmful” has comparatively more balanced arguments on both sides, and moreover if people can be sufficiently bought into the “there
...
In many cases, they're right, and in fact they're working on AI (broadly construed) that's (1) narrow, (2) pretty unlikely to contribute to AGI, and (3) potentially scientifically interesting or socially/technologically useful, and therefore good to pursue. "We" may have a tactical need to be discerning ourselves in who, and what intentions, we criticize. 
Nitpick: My impression is that at least some versions of this outreach are very controversial in the community, as suggested by e.g. the lack of mass advocacy efforts. [Edit: "lack of" was an overstatement. But these are still much smaller than they could be.]
2Steven Byrnes1y
For example, Eliezer Yudkowsky went on the Sam Harris podcast in 2018, Stuart Russell wrote an op-ed in the New York Times, Nick Bostrom wrote a book, … I dunno, do you have examples? Nobody is proposing to play a commercial about AGI doom during the Superbowl or whatever, but I think that’s less “we are opposed to the general public having an understanding of why AGI risk is real and serious” and more “buying ads would not accomplish that”, I think?

you are underestimating the degree of unilateralist's curse driving ai progress by quite a bit. it looks like scaling is what does it, but that isn't actually true, the biggest-deal capabilities improvements come from basic algorithms research that is somewhat serially bottlenecked until we improve on a scaling law, and then the scaling is what makes a difference. progress towards dethroning google by strengthening underlying algorithms until they work on individual machines has been swift, and the next generations of advanced basic algorithms are already here, eg https://github.com/BlinkDL/RWKV-LM - most likely, a single 3090 can train a GPT3 level model in a practical amount of time. the illusion that only large agents can train ai is a falsehood resulting from how much easier the current generation of models are to train than previous ones, such that simply scaling them up works at all. but it is incredibly inefficient - transformers are a bad architecture, and the next things after them are shockingly stronger. if your safety plan doesn't take this into account, it won't work.

that said - there's no reason to think we're doomed just because we can't slow down. we need simply speed up safety until it has caught up with the leading edge of capabilities. safety should always be thinking first about how to make the very strongest model's architecture fit in with existing attempts at safety, and safety should take care not to overfit on individual model architectures.

there's no reason to think we're doomed just because we can't slow down. we need simply speed up safety until it has caught up with the leading edge of capabilities.

I want to emphasize this; if doubling the speed of safety is cheaper than halving the rate of progress, and you have limited resources, then you always pick doubling the speed of safety in that scenario.

In the current scenario on earth, trying to slow the rate of progress makes you an enemy of the entire AI industry, whereas increasing the rate of safety does not. Therefore, increasing the rate of safety is the default strategy, because it's the option that won't get you an enemy of a powerful military-adjacent industry (which already has/had other enemies, and plenty of experience building its own strategies for dealing with them).

I want to emphasize this; if doubling the speed of safety is cheaper than halving the rate of progress, and you have limited resources, then you always pick doubling the speed of safety in that scenario.


In the current scenario on earth, trying to slow the rate of progress makes you an enemy of the entire AI industry, whereas increasing the rate of safety does not. Therefore, increasing the rate of safety is the default strategy, because it's the option that won't get you an enemy of a powerful military-adjacent industry (which already has/had other enemies, and plenty of experience building its own strategies for dealing with them).

You're going to have to be less vague in order for me to take you seriously. I understand that you apparently have private information, but I genuinely can't figure out what you'd have me believe the CIA is constantly doing to people who oppose its charter in extraordinarily indirect ways like this. If I organize a bunch of protests outside DeepMind headquarters, is the IC going to have me arrested? Stasi-like gaslighting? Pay a bunch of NYT reporters to write mean articles about me and my friends?

My experience in talking about slowing down AI with people is that they seem to have a can’t do attitude. They don’t want it to be a reasonable course: they want to write it off.

There are a lot of really good reasons why someone would avoid touching the concept of "suppressing AI research" with a ten-foot pole; depending on who they are and where they work, it's tantamount to advocating for treason. Literal treason, with some people. It's the kind of thing where merely associating with people who advocate for it can cost you your job, and certainly promotions.

A lot of this is considered infohazardous, and frankly, I've already said too much here. But it's very legitimate, and even sensible, to have very, very strong and unquestioned misgivings about large numbers of people they're associated with being persuaded to do something as radical as playing a zero-sum game against the entire AI industry.

1Ben Pace1y
I agree this is an ongoing dynamic, and I'm glad you brought it up, but I have to disagree with “good reasons”. Something being suppressed by the state does not make it false. If anything it is good reason to believe it might be true. As Katja points out in the OP: I would like to see the AI industry solve all disease, create novel art forms, and take over the world. I would like it to happen in a safe way that does not literally kill everyone. This is not the same as being in a zero-sum game with the industry.
I agree on this of course. But the issue is that powerful people can jump to conclusions on AI safety in 6 hour timelines, whereas the AI industry converging on understanding alignment is more like a 6 year timeline. If AI safety is the #1 public opinion threat to the AI industry at any given time, or appears that way, then that could result in AI safety being marginalized for decades. This system revolves around a very diverse mix of reasonable and unreasonable people. What I'm getting at is that it's a very delicate game, and there's no way to approach "slowing down AI"; trying to impede the government and military's top R&D priorities is basically hitting the problem with a sledgehammer. And it can hit back, orders of magnitude harder.
I didn't realize the US military was secretly authoring all of the actually important R&D happening at DeepMind and OpenAI?

This was counter to the prevailing narrative at the time, and I think did some of the work of changing the narrative. It's of historical significance, if nothing else.

I think it's a bit hard to tell how influential this post has been, though my best guess is "very". It's clear that sometime around when this post was published there was a pretty large shift in the strategies that I and a lot of other people pursued, with "slowing down AI" becoming a much more common goal for people to pursue.

I think (most of) the arguments in this post are good. I also think that when I read an initial draft of this post (around 1.5 years ago or so), and had a very hesitant reaction to the core strategy it proposes, that I was picking up on something important, and that I do also want to award Bayes points to that part of me given how things have been playing out so far. 

I do think that since I've seen people around me adopt strategies to slow down AI, I've seen it done on a basis that feels much more rhetorical, and often directly violates virtues and perspectives that I hold very dearly. I think it's really important to understand that technological progress has been the central driving force behind humanity's success, and that indeed this should establish a huge prior against stopping almost any kind of technological development.

In contrast to that, the m...

There are things that are robustly good in the world, and things that are good on highly specific inside-view models and terrible if those models are wrong. Slowing dangerous tech development seems like the former, whereas forwarding arms races for dangerous tech between world superpowers seems more like the latter.

It may seem the opposite to some people. For instance, my impression is that for many adjacent to the US government, "being ahead of China in every technology" would be widely considered robustly good, and nobody would question you at all if you said that was robustly good. Under this perspective the idea that AI could pose an existential risk is a "highly specific inside-view model" and it would be terrible if we acted on the model and it is wrong.

I don't think your readers will mostly think this, but I actually think a lot of people would, which for me makes this particular argument seem entirely subjective and thus suspect.

4Ben Pace1y
I'm confused, of course the people in government in every country think that they should have more global power, but this doesn't seem like something everyone (i.e. including people in all of the other countries) would agree is robustly good, and I don't think you should think so either (for any country, be it Saudi Arabia, France, or South Korea). I am not aware of a coherent perspective that says "slowing down dangerous tech development" is not robustly good in most situations (conditional on our civilization's inability to "put black balls back into the urn", a la Bostrom). Your argument sounds to me like "A small group with a lot of political power disagrees with your claim therefore it cannot be accepted as true." Care to make a better argument?
I think the claim being made is that the "dangerous" part of "slowing down dangerous tech development" is the analogous "highly specific inside-view model" which would be terrible to act on, if it were wrong.  That seems valid to me.  Obviously I believe that highly specific inside-view model is not wrong, but, you know, that's how it goes.
2Ben Pace1y
I... don't think that ThomasW is himself challenging the assumption. He's saying "Read this from the perspective of someone in the US government, doesn't seem so compelling now does it?" I'm not sure what about the post gave the impression that it was aimed to be persuasive to (for example) the US Secretary of State, but I am confident Katja did not write it for them.
The claim being made is something like the following: 1) AGI is a dangerous technology. 2) It is robustly good to slow down dangerous technologies. 3) Some people might say that you should not actually do this because of [complicated unintelligible reason]. 4) But you should just do the thing that is more robustly good. I argue that many people (yes, you're right, in ways that conflict with one another) believe the following: 1) X is a dangerous country. 2) It is robustly good to always be ahead of X in all technologies, including dangerous ones. 3) Some people might say that you should not actually do this because of [complicated unintelligible reason]. This doesn't make very much sense. 4) But you should just do the thing that is more robustly good. My point is that which argument is the obvious, robust one, and which one is the weird inside view one depends on your perspective. Therefore, it doesn't seem like (4) is a very good generalized argument. For example, if I were one of these powerful people, I think it would be wrong for me to be convinced to "focus on the robustly good measures, not the weird inside view measures" because it would lead me to do bad things like trying to advance AI capabilities. As a result, the argument seems suspect to me. It feels like it only works for this community, or people who are already very concerned by AI x-risk. In comparison, there are specific arguments like "AGI is dangerous" and "slowing down dangerous technologies is actually robustly good" (some of these were presented in this post) that I think are, ironically, much more robustly good, because they don't seem to have negative effects as reliably when presented to people who hold beliefs I think are wrong. Edit: I no longer endorse this comment. It claims too much, specifically that any reasoning procedure is suspect if it leads to people who believe false premises taking bad actions. I think what I was really trying to get at in my original comment w
2Ben Pace1y
This is kind of a strange comment to me. The argument, and indeed the whole post, is clearly written to people in the ecosystem ("my impression is that for people worried about extinction risk from artificial intelligence, strategies under the heading ‘actively slow down AI progress’ have historically been dismissed and ignored"), for which differential technological progress is a pretty common concept and relied upon in lots of arguments. It's pretty clear that this post is written to point out an undervalued position to those people. Sometimes I feel like people in the AI x-risk ecosystem who interface with policy and DC replace their epistemologies with a copy of the epistemology they find in various parts of the policy-control machine in DC, in order to better predict them and perform the correct signals — asking themselves what people in DC would think, rather than what they themselves would think. I don't know why you think this post was aimed at those people, or why you point out that the post is making false inferences about its audience when the post is pretty clear that its primary audience is the people directly in the ecosystem ("The conversation near me over the years has felt a bit like this").
I just do not think that the post is written for people who think "slowing down AI capabilities is robustly good." If people thought that, then why do they need this post? Surely they don't need somebody to tell them to think about it? So it seems to me like the best audience for this post would be those (including those at some AI companies, or those involved in policy, which includes people reading this post) who currently think something else, for example that the robustly good thing is for their chosen group to be ahead so that they can execute whatever strategy they think they alone can do correctly. The people I've met who don't want to think about slowing down AI capabilities just don't seem to think that slowing down AI progress would be robustly good, because that just wouldn't be a consistent view! They often seem to have some view that nothing is robustly good, or maybe some other thing ("get more power") is robustly good. Such people just won't really be swayed by the robust priors thing, or maybe they'd be swayed in the other direction.
2Ben Pace1y
I see. You're not saying "staffers of the US government broadly won't find this argument persuasive", you're saying "there are some people in the AI x-risk ecosystem who don't think slowing down is robustly good, and won't find this particular argument persuasive". I have less of a disagreement with that sentence.  I'll add that: * I think most of the arguments in the post are relevant to those people, and Katja only says that these moods are "playing a role" which does not mean all people agree with them. * You write "If people thought that, then why do they need this post? Surely they don't need somebody to tell them to think about it?". Sometimes people need help noticing the implications of their beliefs, due to all sorts of motivated cognitions. I don't think the post relies on that and it shouldn't be the primary argument, but I think it's honestly helpful for some people (and was a bit helpful for me to read it).
Yeah, I agree with all this.
2Ben Pace1y
Thread success!

Thanks for writing!

I want to push back a bit on the framing used here. Instead of the framing "slowing down AI," another framing we could use is, "lay the groundwork for slowing down in the future, when extra time is most needed." I prefer this latter framing/emphasis because:

  • An extra year in which the AI safety field has access to pretty advanced AI capabilities seems much more valuable for the field's progress (say, maybe 10x) than an extra year with current AI capabilities, since the former type of year would give the field much better opportunities to test safety ideas and more clarity about what types of AI systems are relevant.
    • One counterargument is that AI safety will likely be bottlenecked by serial time, because discarding bad theories and formulating better ones takes serial time, making extra years early on very useful. But my very spotty understanding of the history of science suggests that it doesn't just take time for bad theories to get replaced by better ones--it takes time along with the accumulation of lots of empirical evidence. This supports the view that late-stage time is much more valuable than early-stage time.
  • Slowing down in the future seems much mor
...

I’ve copied over and lightly edited some comments I left on a draft. Note I haven’t reread the post in detail; sorry if these were addressed somewhere.

Writing down quick thoughts after reading the intro and before reading the rest:

I have two major reasons to be skeptical of actively slowing down AI (setting aside feasibility):

1. It makes it easier for a future misaligned AI to take over by increasing overhangs, both via compute progress and algorithmic efficiency progress. (This is basically the same sort of argument as "Every 18 months, the minimum IQ necessary to destroy the world drops by one point.")

2. Such strategies are likely to disproportionately penalize safety-conscious actors.

(As a concrete example of (2), if you build public support, maybe the public calls for compute restrictions on AGI companies and this ends up binding the companies with AGI safety teams but not the various AI companies that are skeptical of “AGI” and “AI x-risk” and say they are just building powerful AI tools without calling it AGI.)

For me personally there's a third reason, which is that (to first approximation) I have a limited amount of resources and it seems better to spend that on the "use good...

FYI I think your first skepticism was mentioned in the safety from speed section; she concludes that section: She mentions your second skepticism near the top, but I don't see anywhere she directly addresses it.
2Rohin Shah1y
Thanks, that's good to know.

The conversation near me over the years has felt a bit like this: 

Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.

Others: wow that sounds extremely ambitious

Some people: yeah but it’s very important and also we are extremely smart so idk it could work

[Work on it for a decade and a half]


Some people: ok that’s pretty hard, we give up

Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI? 

Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional

There's a sleight of hand going on with the "we" here. "We" as in LessWrong are not building the godlike AI, the trillion dollar technocapitalist machine is doing that. "We" are a bunch of nerds off to the side, some of whom are researching ways to point said AIs at specific targets. If "we" tried to start an AGI company we'd indeed end up in fifth place and speed up timelines by six weeks (generously).

That's the case now, but note the "work on it for a decade and a half". 15 years ago, MIRI was working on building the godlike AI directly and was one of the only actors doing so, as there wasn't a trillion dollar technocapitalist machine pouring money into machine learning yet.

I think there's a major internal tension in the picture you present (though the tension is only there with further assumptions). You write:

Obstruction doesn’t need discernment

I don’t buy it. If all you want is to slow down a broad area of activity, my guess is that ignorant regulations do just fine at that every day (usually unintentionally). In particular, my impression is that if you mess up regulating things, a usual outcome is that many things are randomly slower than hoped. If you wanted to speed a specific thing up, that’s a very different story, and might require understanding the thing in question.

The same goes for social opposition. Nobody need understand the details of how genetic engineering works for its ascendancy to be seriously impaired by people not liking it. Maybe by their lights it still isn’t optimally undermined yet, but just not liking anything in the vicinity does go a long way.

And you write:

Technological choice is not luddism

Some technologies are better than others [citation not needed]. The best pro-technology visions should disproportionately involve awesome technologies and avoid shitty technologies, I claim. If you think AGI is highly likely to dest

...
The tension wouldn't be there if obstruction isn't bottlenecked on discernment because discernment is easy / not too hard, but I don't think you made that argument.  If discernment is not too hard, that's potentially a dangerous thing: by being all discerning in a very noticeable way, you're painting a big target on "here's the dangerous [cool!] research". Which is what seems to have already happened with AGI.  This is also a general problem with "just make better arguments about AI X-risk". You can certainly make such arguments without spreading ideas about how to advance capabilities, but still, the most pointed arguments are like "look, in order to transform the world you have to do XYZ, and XYZ is dangerous because ABC". You could maybe take the strategy of, whenever a top researcher makes a high-level proposal for how to make AGI, you can criticize that like "leaving aside whether or not that leads to AGI, if it led to AGI, here's how that would go poorly".  (I acknowledge that I'm being very "can't do" in emphasis, but again, I think this pathway is crucial and worth thinking through... and therefore I want to figure out the best ways to do it!)

Curated.  I am broadly skeptical of existing "coordination"-flavored efforts, but this post prompted several thoughts:

  1. I have very incomplete models of what those efforts are (much more so than my models of technical alignment efforts, which are also substantially incomplete)
  2. It seems likely to me that some people have better models of both existing efforts, and potential future efforts, which have not been propagated very well, and it'd be good for that to change

I think this post does a good job of highlighting representative objections to various proposed strategies and then demonstrating why those objections should not be considered decisive (or even relevant).  It is true that we will not solve the problem of AI killing everyone by slowing it down, but that does not mean we should give up on trying to find +EV strategies for slowing it down, since a successful slowdown, all else equal, is good.

The world has a lot of experience slowing down technological progress already, no need to invent new ways. 

Fusion research slowed to a crawl in the 1970s and so we don't have fusion power. Electric cars have been delayed by a century. IRB is successful at preventing many promising avenues. The FDA/CDC killed most of the novel drug and pandemic prevention research. The space industry is only now catching up to where it was 50 years ago. 

Basically, stigma/cost/red tape reliably and provably does the trick. 

Requires huge specialized equipment. Some AI requires huge equipment, but (1) you can do a lot with a little, and (2) the equipment is heavily economically incentivized for other reasons (all the other uses of compute).    Why was this? I'd've thought it's basically battery tech, blocked on materials tech. Is that not right?   Wasn't this basically un-investment? That's not a lever we can easily pull with AI.    Now we're talking! How the heck does that work? I'm surprised enough that actually I'd guess it's not that hard for a private investor to do research, it's just that the research wouldn't be allowed to be applied (enforceable because very public). Is that true? If not, why not?
With fusion it was mostly defunding, just like with space exploration (see https://21sci-tech.com/Articles_2010/Winter_2009/Who_Killed_Fusion.pdf ). Not sure if this is a possibility with AI. Electric cars and transport in general were apparently killed by the gas automobile industry. Battery tech was just enough for the daily commute, and there were options and workarounds. I am not an expert on how government regulation kills innovation, but there is probably enough written about it out there, including by Zvi and Scott Alexander.

It seems very odd to have a discussion of arms race dynamics that is purely theoretical exploration of possible payoff matrices, and does not include a historically informed discussion of what seems like the obviously most analogous case, namely nuclear weapons research during the Second World War.

US nuclear researchers famously (IIRC, pls correct me if wrong!) thought there was a nontrivial chance their research would lead to human extinction, not just because nuclear war might do so but because e.g. a nuclear test explosion might ignite the atmosphere. T... (read more)

The world has actually coordinated on some impressive things, e.g. nuclear non-proliferation

Given the success of North Korea, I am both impressed by the world's coordination on nuclear weapons and depressed that even this impressive coordination is not enough. I feel similarly about the topic that nuclear weapons are a metaphor for.

Continuing to think about the metaphor: in terms of existential risk, the North Korean nuclear weapon is probably less damaging than United States missile defense, for reasons related to second strike. It is probable people feel bad about... (read more)

China has treaties obliging it to defend North Korea if it is attacked. Whether it actually would do so is another matter.

I suspect that part of what is going on is that many in the AI safety community are inexperienced with and uncomfortable with politics and have highly negative views about government capabilities.   

Another potential (and related) issue is that people in the AI safety community think that their comparative advantage doesn't lie in political action (which is likely true) and therefore believe they are better off pursuing their comparative advantage (which is likely false).

As a datapoint, I remember briefly talking with Eliezer in July 2021, where I said "If only we could make it really cringe to do capabilities/gain-of-function work..." (I don't remember which one I said). To which, I think he replied "That's not how human psychology works." 

I now disagree with this response. I think it's less "human psychology" and more "our current sociocultural environment around these specific areas of research." E.g., genetically engineering humans seems like a thing which, in some alternate branches, is considered "cool" and "exciting", while being cringe in our branch. It doesn't seem like a predestined fact of human psychology that that field had to end up being considered cringe.

Your own examples of technologies that aren't currently pursued but have huge upsides are a strong case against this proposition. These lines of research have some risks, but if there was sufficient funding and coordination, they could be tremendously valuable. Yet the status quo is to simply ban them without investing much at all in building a safe infrastructure to pursue them.

If you should succeed in achieving the political will needed to "slow down tech," it will come from idiots, fundamentalists, people with useless jobs, etc. It will not be a coaliti... (read more)

I completely agree and this seems good? I very much want to ally with unproductive rent-seekers and idiots to reduce existential risk. Thanks a lot, unproductive rent-seekers and idiots! (though I most certainly shouldn't call them that to ally with them). I don't understand how this is in any way a strong case against the proposition.
Idiots and lunatics tend to get out of control and f**k things up, is the problem.

Someone used CRISPR on babies in China

Nitpick: the CRISPR was on the embryos, not the babies.

Although I don't think what He Jiankui did was scientifically justified (the benefit of HIV resistance wasn't worth the risk – he should have chosen PCSK9 instead of CCR5), I think the current norms against human genetic enhancement really are stifling a lot of progress. 

Agreed. On the other hand, what I read suggests He Jiankui was bottlenecked on parental consent. For his first-in-human trial, he couldn't recruit any parents interested in editing PCSK9, but some parents who were themselves HIV patients, and whose contacts were relatively easy to acquire from an HIV support group, really cared about editing CCR5 (incorrectly, as you pointed out and I agree), and were easily recruited. It sometimes happens that recruiting participants is the limiting factor in doing trials, and I think that was the case here.

(1) The framing of all this as military technology (and the DoD is the single largest purchasing agent on earth) reminds me of nuclear power development. Molten salt reactors and pebble bed reactors are both old tech which would have created the dream of safe, small-scale nuclear power. However, in addition to not melting down and working at relatively small scales, they also don't make weapons-usable materials. Thus they were shunted aside in favor of the kinds of reactors we mostly have now. In an alternative past without the Cold War driving us to make ne... (read more)

I arrogantly think I could write a broadly compelling and accessible case for AI risk

I recently asked whether such a resource exists and didn't find one that meets what I'm looking for. Currently trying to write my own version.

Thank you for writing this post! I agree completely, which is perhaps unsurprising given my position stated back in 2020. Essentially, I think we should apply the precautionary principle for existentially risky technologies: do not build unless safety is proven.

A few words on where that position has brought me since then.

First, I concluded back then that there was little support for this position in rationalist or EA circles. I concluded as you did, that this had mostly to do with what people wanted (subjective techno-futurist desires), and less with what ... (read more)

Broadly agree, but: I think we're very confused about the social situation. (Again, I agree this argues to work on deconfusing, not to give up!) For example, one interpretation of the propositions claimed in this thread, if they are true, is that AI being dangerous is more powerful, in terms of moving money and problem-solving juice, as a recruitment tool than as a dissuasion tool. I.e. in certain contexts, it's beneficial towards the goal of getting funding to include in your pitch "th... (read more)

More generally, that twitter thread is an exemplar of a broader thing which is the Vortex of Silicon Valley Craziness, which is mostly awesome, often very silly, and also tinged with craziness. And I have an uncertain vague sense that this tinge of craziness is some major part of the tailwind pushing AGI research? Or more specifically, the tailwind investing capital into AGI research.

This post seems like it was quite influential. This is basically a trivial review to allow the post to be voted on.

I see one critical flaw here.

Why does anyone assume ANY progress will be made on alignment if we don't have potentially dangerous AGIs in existence to experiment with?

A second issue is that at least the current model for ChatGPT REQUIRES human feedback to get smarter, and the greater the scale of the userbase, the smarter it can potentially become.

Other systems designed to scale to AGI may have to be trained this way: initial training from test environments and static human text, but refinement from interaction with live humans, where the company with the most ... (read more)

Steven Byrnes (1y):
I’m obviously biased, but I think we should assume this based on what we see with our eyes—we can look around and note that more than zero progress on alignment is being made right now. If you think that “What Paul Christiano is doing right now is just totally useless, he might as well switch fields, do some cool math or whatever, and have a more relaxing time until real-deal AGIs show up, it would make no difference whatsoever”, and you also think that same thing about Scott Garrabrant, Vanessa Kosoy, John Wentworth, Anthropic, Redwood Research, Conjecture, me (cf. here & here), etc. etc.—well, you’re obviously entitled to believe that, but I would be interested to hear a more detailed argument if you have time, not just analogizing to other fields. (Although, I do think that if the task was “make ANY progress on heavier-than-air flight before we have any actual airplanes”, this task would be easily achievable, because “any” is a very low bar! You could do general research towards stiff and light structures, towards higher-power-to-weight-ratio engines, etc.) For example, Eliezer Yudkowsky is on the very skeptical end of opinions about ongoing AGI safety research, but he seems to strongly believe that doing interpretability research right now is marginally helpful, not completely useless.
Gerald Monroe (1y):
Fair. "Any" is an unfair comparison. On the other hand, for every past technology humans made, whether they researched it for decades first or rushed it out with young engineers, I am not actually sure the advance research made any difference. There is no way to falsify this, but pretty much every technology built had crippling, often lethal, flaws in its first versions. My point is that there is immense information gain from actually fully constructing and testing a technology, and further large gains from deployment at scale. Without any of that, the possibility space is much larger. For example, some propose that LLMs as they currently exist could exhibit rampant behavior. This may be true, or it may be completely false, because the RLHF step discouraged models that can exhibit such traits, or for some other reason. Prior to fission reactors existing, nuclear scientists may have been concerned about prompt criticality detonating power reactors. This has only happened once, possibly twice.
Steven Byrnes (1y):
Hmm, Fermi invented the idea of control rods before building the first-ever nuclear reactor, and it worked as designed to control the nuclear reaction. So that’s at least one good example that we can hope to follow. I’m not sure what your last paragraph is referring to. For that first nuclear reactor, the exponential growth happened pretty much exactly as Fermi had calculated in advance, IIRC. OK anyway, there’s a category of AGI safety work that we might call “Endgame Safety”, where we’re trying to do all the AGI safety work that we couldn’t (or didn’t) do ahead of time, in the very last moments before (or even after) people are actually playing around with the kind of powerful AGI algorithms that could get irreversibly out of control. I think we both agree that lots of the essential AGI safety work is in the category of “Endgame Safety”. I don’t know what the fraction is, but it seems that you and I are both agreeing that the fraction is not literally 100%. (For my part, I wouldn’t be too surprised if Endgame Safety were 90% of the total useful person-hours of AGI safety, but I hope that lots of important conceptual / deconfusion work can be done further ahead, since those things sometimes take lots of wall-clock time.)  And as long as the fraction (AGI endgame safety work) / (all AGI safety work) is not literally 100%—i.e., as long as there is any AGI safety research whatsoever that we can do ahead of time—then we now have the core of an argument that slowing down AGI would be helpful. For example, if AGI happens in 5 years, we can be frantically doing Endgame Safety starting in 5 years. And if AGI happens in 50 years, we can be frantically doing Endgame Safety starting in 50 years. What does it matter? Endgame Safety is going to be a frantic rush either way. But in the latter case, we can have more time to nail down everything that’s not Endgame Safety. And we can also have more time to do other useful things like outreach / field-building—to get from the cu
Gerald Monroe (1y):
Steven, how many months before the Chicago pile construction started did Fermi's design team do the work on the control rods?  There's also a large difference between the idea of control rods - we have lots of ideas how to do AGI control mechanisms and no doubt some of them do work - and an actual machined control rod with enough cadmium/boron/etc to work.   In terms of labor hours, going from idea to working rod was probably >99% of the effort.  Even after discovering empirically which materials act as neutron absorbers.  

we have lots of ideas how to do AGI control mechanisms and no doubt some of them do work

I think AGI safety is in a worse place than you do.

It seems that you think that we already have at least one plan for Safe & Beneficial AGI that has no problems that are foreseeable at this point, that it has been red-teamed to death and emerged unscathed with the information available, and that we’re not going to get any further until we’re deeper into the implementation.

Whereas I think that we have zero plans for which we can say “given what we know now, we have strong reason to believe that successfully implementing / following this plan would give us Safe & Beneficial AGI”.

I also think that, just because you have code that reliably trains a deceptive power-seeking AGI, sitting right in front of you and available to test, doesn’t mean that you know how to write code that reliably trains a non-deceptive corrigible AGI. Especially when one of the problems we’re trying to solve right now is the issue that it seems very hard to know whether an AGI is deceptive / corrigible / etc.

Maybe the analogy for me would be that Fermi has a vague idea “What if we use a rod made of neutron-absorbing material?”... (read more)

Gerald Monroe (1y):
Yes, I do.  I've spoken privately with an engineer on AI safety at OpenAI and he agrees this can work.   It's an extension of existing industrial safety mechanisms, including the ones that are used on autonomous cars today.   The current method is:     There are 2 systems controlling a car operating on autopilot.  One has a sophisticated policy to control the car, using a pipeline consisting of neural networks and software modules.  The other is a microcontroller which has a dead simple policy to order a braking force increase over time.    Each timestep, the microcontroller software is looking for information from the telemetry stream the higher level system emits that may indicate a failure.  This means timeouts, corrupt packets, or if a packet contains specific values indicating an error or low confidence.    Waymo extends this approach to 3 systems, based on their publicly released information.  One high level system, one mid level system that is approximately a lane control system on a separate power source, and the microcontroller.  The mid level system is a neural network attached to one camera, and it emits steering commands to bring the vehicle to a safe stop. This obviously can be extended to n-levels of system.  There can be multiple AI systems, each able to take over control, where the lower level systems use more robust and simple policies that are more likely to work in a larger variety of situations.  Every time the higher level system becomes empirically uncertain - a module authored by humans is what measures this, one way to do this is to see if the inputs to the high level system fall into the latent space of the training set - control shifts to the first system that has sufficient confidence, devolving to a microcontroller. The 'ah ha' I want you to have is that we know the AI policy is safe within the space of the training simulator.  We measured it.  The training simulator is supposed to be a reasonably accurate facsimile of the real wor
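A minimal sketch of the layered fallback described in this comment, not anyone's production design. The names here (`Telemetry`, `is_healthy`, `select_command`), the 0.9 confidence threshold, and the string commands are all hypothetical, chosen only for illustration. The one idea taken from the comment is the ordering: control goes to the highest-level system whose telemetry looks healthy, and otherwise devolves to a dead-simple braking policy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Telemetry:
    """One timestep of output from a controller (hypothetical structure)."""
    command: str
    confidence: float        # 0.0 to 1.0, as measured by a human-authored monitor
    timed_out: bool = False  # no packet arrived in time
    corrupt: bool = False    # packet failed an integrity check


def is_healthy(t: Optional[Telemetry], min_confidence: float) -> bool:
    """A packet is usable only if it arrived, is intact, and is confident enough."""
    if t is None or t.timed_out or t.corrupt:
        return False
    return t.confidence >= min_confidence


def select_command(
    levels: Sequence[Optional[Telemetry]],
    min_confidence: float = 0.9,  # assumed threshold, not from the comment
    fallback: str = "brake",      # stands in for the microcontroller policy
) -> str:
    """Return the command of the first healthy level, ordered high to low."""
    for t in levels:
        if is_healthy(t, min_confidence):
            return t.command
    return fallback


if __name__ == "__main__":
    high = Telemetry("follow_route", confidence=0.4)  # uncertain high-level policy
    mid = Telemetry("safe_stop", confidence=0.95)     # simpler mid-level policy
    print(select_command([high, mid]))
```

Note that the sketch takes the confidence number as given. Whether a monitor can reliably produce that number for a powerful system is exactly the point under dispute in this thread.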
Steven Byrnes (1y):
For the purpose of the present discussion, I note that if your plan needs interpretability, then that would be a cause for concern, and a reason for slowing down AGI. The state of interpretability is currently very bad, and there seem to be lots of concrete ways to make progress right now. Separately, I don’t think your plan (as I understand it) has any hope of addressing the hardest and most important AGI safety problems. But I don’t want to spend the (considerable) time to get into a discussion about that, so I’ll duck out of that conversation, sorry. (At least for now.)
Gerald Monroe (1y):
That is unfortunately not a helpful response. If this simple plan - which is already what is in use in the real world in actual AI systems today - won't work, this is critical information! What is the main flaw? It costs you little to mention the biggest problem.
I agree with this, and therefore I'm both more optimistic, and think that we should not be alarmed at the pace of progress. Or in other words, I disagree with the idea of slowing down AGI progress.
Yeah, this is a big problem I have with alignment people. They forget that if we don't have iteration, we don't solve the problem, so all efforts should focus on making things parallelizable. It's a problem I had with MIRI's early work, and today we need to set ourselves up for much more empirical evidence. This could be a reason to support capabilities advances.
Gerald Monroe (1y):
They argue there is some unknown point of capabilities at which the system explodes and we all die. If those are the rules of the universe we happen to find ourselves in, though, there probably is no winning anyway. It is sort of like how things would have gone if the laws of physics were slightly different and the first nuclear test did ignite the atmosphere. Were atmospheric gas fissionable, things would be very different. It's a very similar criticality argument. Early AGIs that try bad stuff may "quench" because the world lacks sufficient easily remotely hackable nanoforges and fleets of armed killer robots ready to deploy. So they instead steal a few bitcoins, kill a few people, then are caught and shut down. If instead the AGI finds an exploit to get criticality then we all die. I am concerned the AGI might create a cult of personality or a religion and get support from large numbers of gullible humans. These humans, despite the AGI openly killing people and acting completely selfishly, might give it the resources to develop a way to kill us all.

One big obstacle you didn't mention: you can make porn with that thing. It's too late to stop it.

More seriously, I think this cat may already be out of the bag. Even if the scientific community and the American military-industrial complex and the Chinese military-industrial complex agreed to stop AI research, existing models and techniques are already widely available on the internet.

Even if there is no official AI lab anywhere doing AI research, you will still have internet communities pooling compute together for their own research projects (especially i... (read more)

I'm not convinced this line of thinking works from the perspective of the structure of the international system. For example, not once are international security concerns mentioned in this post. 

My post here draws out some fundamental flaws in this thinking:

I very strongly agree with this post. Thank you very much for writing it!

I think to reach a general agreement on not doing certain stupid things, we need to better understand and define what exactly those things are that we shouldn't do. For example, instead of talking about slowing down the development of AGI, which is a quite fuzzy term, we could talk about preventing uncontrollable AI. Superintelligent self-improving AGI would very likely be uncontrollable, but there could be lesser forms of AI that could be uncontrollable, and thus very dangerous, as w... (read more)

Bravo! I especially agree wrt people giving too many galaxy brain takes for why current ML labs speeding along is good.

I believe this post is one of the best to grace the front page of the LW forum this year. It provides a reasoned counterargument to prevailing wisdom in an accessible way, and has the potential to significantly update views. I would strongly support pinning this post to the front page to increase critical engagement.


Thank you for calling attention to this. It always seems uncontroversial that some things could speed up AGI timelines, yet it is assumed that very little can be done to slow them down. The actual hard part is figuring out what could, in practice, slow down timelines with certainty.

Finding ways to slow down timelines is exactly why I wrote this post on Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities.

  • AI is pretty safe: unaligned AGI has a mere 7% chance of causing doom, plus a further 7% chance of causing short term lock-in of something mediocre
  • Your opponent risks bad lock-in: If there’s a ‘lock-in’ of something mediocre, your opponent has a 5% chance of locking in something actively terrible, whereas you’ll always pick good mediocre lock-in world (and mediocre lock-ins are either 5% as good as utopia, or -5% as good)
  • Your opponent risks messing up utopia: In the event of aligned AGI, you will reliably achieve the best outcome, whereas your opponent has a
... (read more)

So, a number of issues stand out to me, some have been noted by others already, but:

My impression is that there are also less endorsable or less altruistic or more silly motives floating around for this attention allocation.

A lot of this list looks to me like the sort of heuristics where, societies that don't follow them inevitably crash, burn and become awful. A list of famous questions where the obvious answer is horribly wrong, and there's a long list of groups who came to the obvious conclusion and became awful, and it's become accepted wisdom to not d... (read more)

I could be wrong, but I’d guess convincing the ten most relevant leaders of AI labs that this is a massive deal, worth prioritizing, actually gets you a decent slow-down. I don’t have much evidence for this.

Delay is probably finite by default 

Convincing OpenAI, Anthropic, and Google seems moderately hard - they've all clearly already considered the matter carefully, and it's fairly apparent from their actions what conclusions they've reached about pausing now. Then convincing Meta, Mistral, Cohere, and the next dozen-or-so would-be OpenAI replacement... (read more)

"I arrogantly think I could write a broadly compelling and accessible case for AI risk"

Please do so. Your current essay is very good, so chances are your "arrogant" thought is correct.

Edit: I think this is too pessimistic about human nature, but maybe we should think about this more before publishing a "broadly compelling and accessible case for AI risk".


Thank you for writing this. On your section 'Obstruction doesn't need discernment' - see also this post that went up on LW a while back called The Regulatory Option: A response to near 0% survival odds. I thought it was an excellent post, and it didn't get anywhere near the attention it deserved, in my view.

People have been writing stories about the dangers of artificial intelligences arguably since Ancient Greek times (Hephaistos built artificial people, including Pandora), certainly since Frankenstein. There are dozens of SF movies on the theme (and in the Hollywood ones, the hero always wins, of course). Artificial intelligence trying to take over the world isn't a new idea; by scriptwriter standards it's a tired trope. Getting AI as tightly controlled as nuclear power or genetic engineering would not, politically, be that hard -- it might take a decade or t... (read more)

Very useful post and discussion! Let's ignore the issue that someone in capabilities research might be underestimating the risk and assume they have appropriately assessed the risk. Let's also simplify to two outcomes: bliss expanding in our lightcone, and extinction (no value). Let's also assume that very low values of risk are possible but we have to wait a long time. It would be very interesting to me to hear (maybe with a poll) how low different people would want the probability of extinction to be before activating the AGI. Below are my super rough... (read more)

Chiming in on toy models of research incentives: Seems to me like a key feature is that you start with an Arms Race that, after some amount of capabilities accumulates, transitions to a Suicide Race. But players have only vague estimates of where that threshold is, have widely varying estimates, and may not be able to communicate estimates effectively or persuasively. Players have a strong incentive to push right up to the line where things get obviously (to them) dangerous, and with enough players, somebody's estimate is going to be wro... (read more)
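To make the "with enough players" step concrete, here is a rough sketch under a strong simplifying assumption that the comment does not make: each player independently overestimates the safe threshold with the same probability `p_overestimate`. The function name and parameters are hypothetical. Under that assumption, the chance that at least one of `n_players` pushes past the real line is 1 - (1 - p)^n.

```python
def prob_someone_overshoots(n_players: int, p_overestimate: float) -> float:
    """Chance that at least one player overestimates the safe threshold.

    Assumes players err independently and with equal probability, which is
    an illustrative simplification, not a claim about real labs.
    """
    if n_players < 0:
        raise ValueError("n_players must be non-negative")
    if not 0.0 <= p_overestimate <= 1.0:
        raise ValueError("p_overestimate must be in [0, 1]")
    # Everyone stays on the safe side with probability (1 - p)^n.
    return 1.0 - (1.0 - p_overestimate) ** n_players


if __name__ == "__main__":
    for n in (1, 3, 10, 30):
        print(n, round(prob_someone_overshoots(n, 0.10), 3))
```

Even if each player is individually 90% likely to stop short of the line, ten such players give roughly a 65% chance that someone crosses it. Correlated estimates would change the numbers but not the direction of the effect.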

I'd like to ask a few questions about slowing down AGI as they may turn out to be cruxes for me.

  1. How popular/unpopular is AI slowdown? Ideally, we'd get AI slowdown/AI progress/Neutral as choices in a poll. I would also ideally like different framings of the problem, to test how much framing affects people's choices. But I do want at least a poll on how popular/unpopular AI slowdown is.

  2. How much does the government want AI to be slowed down? Is Trevor's story about the US government not willing to countenance AI slowdown correct, and instead speed it up

... (read more)

I am interested in getting feedback on whether it seems worthwhile to advocate for better governance mechanisms (like prediction markets) in the hopes that this might help civilization build common knowledge about AI risk more quickly, or might help civilization do a more "adequate" job of slowing AI progress by restricting unauthorized access to compute resources. Is this a good cause for me to work on, or is it too indirect and it would be better to try and convince people about AI risk directly? See a more detailed comment here: https://www.lesswrong... (read more)

But empirically, the world doesn’t pursue every technology—it barely pursues any technologies.

While it's true that not all technologies are pursued, history shows that humans have actively sought out transformative technologies, such as fire, swords, longbows, and nuclear energy. The driving factors have often been novelty and utility. I expect the same will happen with AI. 

Very nice exploration of this idea. But I don't see in the main post, and couldn't find by a simple search of the comments, any mention of my biggest concern about this notion of an AI development slowdown (or complete halt, as EY recently called for in Time):

Years of delay would put a significant kink in the smooth exponential progress charts of e.g. Ray Kurzweil. Which is a big deal, in my opinion, because:

(1) They're not recognised as laws of nature. But these trends, across so many technologies and the entire history of complexity, in the unive... (read more)