(This seems to me to be what many people imagine will happen to the pieces of the AGI puzzle other than the piece they’re most familiar with, via some sort of generalized Gell-Mann amnesia: the tech folk know that the technical arena is in shambles, but imagine that policy has the ball, and vice versa on the policy side. But whatever.)
Just wanted to say that this is also my impression, and has been for months. Technical & Policy both seem to be planning on the assumption that the other side will come through in the end, but the things they imagine this involving seem to be things that the other side thinks are pretty unlikely.
Some people like to tell themselves that surely we'll get an AI warning shot and that will wake people up; but this sounds to me like wishful thinking from the world where the world has a competent response to the pandemic warning shot we just got.
When I think "AI warning shots", the warning shot I'm usually imagining involves the death of 10-50% of working-age and politically relevant people, if not from the shot itself then from the social and political upheaval that follows. The "warning" in "warning shot" is a warning that the relevant decision makers (congresspeople, etc.) die if the problem remains unsolved, not a warning that a few million miscellaneous grandmothers die, since their early deaths can safely be ignored in favor of writing more blogs about the culture war. Thus this category of events generally doesn't include flash crashes, some kind of "semi"-advanced computer worm, or an industrial accident or economic depression that costs us a hundred billion dollars but is then resolved by some heroic engineers, unless one of those happens to instill a fear of AI systems as being personally dangerous to lawmakers.
One specific (unlikely) example, which I myself can elaborate upon in detail, and which I think could come before actually existentially threatening systems, is an AI destroying the internet by doing what regular hackers do, on a much shorter timetable. It would be within the capabilities of existing not-super-impressive humans, with 100 subjective person-years to burn, to find or copy a couple dozen or so different zero-day bugs a la EternalBlue, write an excellent worm that trivially evades most commercial IDS, and release it along with several delayed-release backup versions with different exploits and signatures for another bag of common network services, to deploy when the initial wave starts to become ineffective. An AGI that wasn't smart enough to build nanobots could still pretty successfully shut down or commandeer an arbitrarily large number of web servers and banks and (internet-adjacent) electrical grids and factories and oil pipelines and self-driving cars all around the same time, and keep them disabled long enough that humans replace them with things that don't use computers, because fighting the worms takes too long and people have begun starving. Communities of worms built this way would be capable of consistently destroying almost all important unairgapped computers in an obfuscated and hard-to-debug way, well after the incident response teams working to guard these things manage to figure out that bootkit #3 gets onto the network from IoT toaster #2 on the next office floor. This technique destroys or commandeers lots of "airgapped" subnets too, because most things people claim are airgapped outside of a highly specific national security context are only nominally "airgapped", and accidentally connect to public DNS or regularly have some system administrator walking in an "inspected" USB drive with the latest version of Debian or whatever. Maybe some makeshift bombs go off or some second-world nations' poorly protected drones get commandeered too, but I expect not nukes, because those are actually consistently airgapped.
If it's extremely unlikely to happen (which, like any overly detailed story, is probably true), I don't think it's because of a reason like "if an AI can do that it can definitely build nanobots". It's not even that difficult to accomplish. A team of me and maybe three other people I personally know could do it right now with maybe a ~50% success rate, if we had an absurd amount of prep time inside of a Dragon Ball Z hyperbolic time chamber to do it with. The primary reason people haven't launched the worm-nuke already isn't that people aren't smart enough; it's that they don't have the time, there's no motivation for anyone to do anything like this, they'd fear arrest if they tried to gather a team to coordinate it, and, most of all, the territory is changing too quickly. And these things - relative speed, cooperative replicas, single-mindedness towards weird goals - are the first critical advantages over regular people I expect early intelligent systems to have. Transformers already write code at an absurd pace compared to living, breathing humans, and despite the age-old protestations about how we represent a small window on the intelligence spectrum, so far SOTA models are pretty much steadily climbing across the village-idiot-to-Einstein spectrum in lockstep with log(training cost). Assuming we solve the context window problem, a cluster of replicas of DeepMind's private "Codex" implementation will be able to do this in 5-10 years, and it's not clear to me without further assumptions that the cluster would also therefore be capable of doing something instrumentally, existentially threatening on its first critical try.
Needless to say, we are not prepared for all computers to suddenly stop working or act destructively for extended periods of time, in the same way that we are not prepared to go back to hunting and foraging even if that could somehow in theory sustain modern populations. And if it's somehow obvious (say, partly based on the timing of a very public release or an announcement) that DeepMind launched the thing that ended up killing 20% of the populace and/or sending us back to a 1970 standard of living, every former employee of Google gets declared guilty by association, whether they're "ML capabilities engineers" or not. At minimum, Demis Hassabis and a bunch of key figures in Google leadership get executed or permanently imprisoned, because the party(ies) directing the rioters or the army during the ensuing martial law will rationally and predictably exploit that opportunity to mobilize a base and look competent and tough. That might happen even if there's only a 50% chance they did it, because Google is not China and is not a politically inconvenient scapegoat for any major faction in Western politics. And this is a sample of the class of events I suggest would have sufficiently shaken things up that coordination on the alignment problem or the delayed-AGI problem might be possible, not some weaksauce COVID-19 pseudo-emergency.
I agree that events of that magnitude would wake people up. I just don't think we'll get events of that magnitude until it's too late.
Me neither, but I wanted to outline a Really Bad, detailed, pre-nanofactory scenario, since the last few times I've talked to people about this they kept either underestimating its consequences or asserting without basis that it was impossible. Also see the last paragraph.
And, lest you wonder what sort of single correlated already-known-to-me variable could make my whole argument and confidence come crashing down around me, it's whether humanity's going to rapidly become much more competent about AGI than it appears to be about everything else.
I conclude from this that we should push on making humanity more competent at everything that affects AGI outcomes, including policy, development, deployment, and coordination. In other times I'd think that's pretty much impossible, but on my model of how AI goes, our ability to increase our competence at reasoning, evidence, argumentation, and planning is sufficiently correlated with getting closer to AGI that it's only very hard.
I imagine you think that this is basically impossible, i.e. not worth intervening on. Does that seem right?
If so, I'd guess your reasons are something like this:
weaksauce Overton-abiding stuff about 'improving public epistemology by setting GPT-4 loose on Twitter to provide scientifically literate arguments about everything' will be cool but will not actually prevent Facebook AI Research from destroying the world six months later, or some eager open-source collaborative from destroying the world a year later if you manage to stop FAIR specifically.
Does that sound right? Are there other important reasons?
I expect the most critical reason has to do with takeoff speed; how long do we have between when AI is powerful enough to dramatically improve our institutional competence and when it poses an existential risk?
If the answer is less than e.g. 3 years (hard to imagine large institutional changes happening faster than that, even with AI help), then improving humanity's competence is just not a tractable path to safety.
I'm really glad that this post is addressing the disjunctivity of AI doom, as my impression is that it is more of a crux than any of the reasons in https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities.
Still, I feel like this post doesn't give a good argument for disjunctivity. To show that a scenario with no outside view behind it is likely, it takes more than just describing a model which is internally disjunctive. There needs to be some reason why we should strongly expect there not to be external variables that could cause the model not to apply.
Some examples of these, in addition to the competence of humanity, are that deep learning could hit a wall for decades, Moore's Law could come to a halt, some anti-tech regulation could cripple AI research, or alignment could turn out to be easy (which itself contains several disjunctive possibilities). I haven't thought about these, and don't claim that any of them are likely, but the possibility of these or other unknown factors invalidating the model prevents me from updating to a very high P(doom). Some of this comes from it just being a toy model, but adding more detail to the model isn't enough to notably reduce the possibility of the model being wrong from unconsidered factors.
A statement I'm very confident in is that no perpetual motion machines will be developed in the next century. I could make some disjunctive list of potential failure modes a perpetual motion machine could encounter, and thus conclude that their development is unlikely, but this wouldn't describe the actual reason a perpetual motion machine is unlikely. The actual reason is that I'm aware of certain laws of physics which prevent any perpetual motion machines from working, including ones with mechanisms wildly beyond my imagination. The outside view is another tool I can use to be very confident: I'm very confident that the next flight I take won't crash, not because of my model of planes, but because any crash scenario with non-negligible probability would have already caused some of the millions of commercial flights flown every year to crash, and that hasn't happened. Avoiding AGI doom is not physically impossible and there is no outside view against it, and without some similarly compelling reason I can't see how very high P(doom) can be justified.
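To put a rough number on that outside view (a back-of-the-envelope sketch; the flight and crash counts below are my own order-of-magnitude assumptions, not figures from this comment):

```python
# Rough outside-view bound on per-flight crash risk. The flight and crash
# counts below are assumed order-of-magnitude figures, not exact statistics.
flights_per_year = 40_000_000   # assumed: tens of millions of commercial flights per year
fatal_crashes_per_year = 5      # assumed: single-digit fatal airliner crashes per year

per_flight_risk = fatal_crashes_per_year / flights_per_year
print(f"Implied per-flight risk: ~{per_flight_risk:.0e}")  # ~1e-07

# No model of aerodynamics needed: observed frequencies alone justify
# "my next flight won't crash" to roughly one part in ten million, which is
# the kind of outside-view support the comment says is missing for AGI doom.
```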
Depends on what you mean by very high. If you mean >95% I agree with you. If you mean >50% I don't.
- Deep learning hits a wall for decades: <5% chance. I'm being generous here.
- Moore's law comes to a halt: Even if the price of compute stopped falling tomorrow, it would only push my timelines back a few years. (It would help a lot for >20 year timeline scenarios, but it wouldn't be a silver bullet for them either.)
- Anti-tech regulation being sufficiently strong, sufficiently targeted, and happening sufficiently soon that it actually prevents doom: This one I'm more optimistic about, but I still feel like it's <10% chance by default.
- Alignment turning out to be easy: I'm also somewhat hopeful about this one but still I give it <10% chance.
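Reading those numbers together, a quick back-of-the-envelope combination (my arithmetic, not anything stated above; it treats the three escape routes given explicit probabilities as independent, and as each fully preventing doom if it occurs, both of which are generous simplifications):

```python
# Combine the escape-route probabilities listed above as if they were
# independent and each, if it happened, fully prevented doom. Both
# assumptions are generous, so this over-states the implied escape chance.
p_escape_routes = {
    "deep learning hits a wall for decades": 0.05,
    "strong, targeted, timely anti-tech regulation": 0.10,
    "alignment turns out to be easy": 0.10,
}

p_no_escape = 1.0
for p in p_escape_routes.values():
    p_no_escape *= 1 - p

print(f"P(at least one escape) <= {1 - p_no_escape:.2f}")  # ~0.23
print(f"Implied P(doom) floor  >= {p_no_escape:.2f}")      # ~0.77
```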
Analogy: Suppose it was 2015 and the question we were debating was "Will any humans be killed by poorly programmed self-driving cars?" A much lower-stakes question but analogous in a bunch of ways.
You could trot out a similar list of maybes to argue that the probability is <95%. Maybe deep learning will hit a wall and self-driving cars won't be built, maybe making them recognize and avoid pedestrians will turn out to be easy, etc. But it would be wrong to conclude that the probability was therefore <50%.
I'm definitely only talking about probabilities in the range of >90%. >50% is justifiable without a strong argument for the disjunctivity of doom.
I like the self-driving car analogy, and I do think the probability in 2015 that a self-driving car would ever kill someone was between 50% and 95% (mostly because of a >5% chance that AGI comes before self-driving cars).
Seeing "most of it doesn't seem to me to be even trying by their own lights to engage with what look to me like the lethal problems." makes it seem to me that you are confused. The correct lesson isn't that they're unwilling to deal with the serious parts, it's that their evidence is not trivially compatible with your conclusions. In other words, that the balance of evidence they've seem indicates things either aren't so serious, or that there is plenty of time. Reading the link there, it seems like you simply don't like their solutions because you think it is much harder than they do, which should mostly lower your faith in the little bit of evidence we have on AI meaning what you think it does, rather than you putting all the weight of evidence on them not doing the important stuff. You seem very overconfident.
I personally disagree with you both on the difficulty of alignment and on how rapidly AI will become important, with most of the evidence pointing toward the latter being slow. Without a singularity beforehand, just increasing the amount of resources spent is quickly becoming infeasible even for the largest companies. The rate at which computing power is going down in cost is itself slowing down quite a lot, despite exponential resources being spent on keeping up its scaling. Current narrow AIs need to get exponentially larger for each noticeable gain in ability, so we're pretty much relying on algorithmic enhancements to get high-end AI soon, while at the same time needing to make it more general. Stochastic gradient descent is an extremely well understood algorithm, and it is unlikely there is much to be gained from simply coding it better, so we pretty much need to replace the idea of transformers to make quick progress, or possibly even neural nets themselves. Beyond that, we need to gain a great deal more generality to make an AGI. Once we do (assuming we do), all of these impediments would still need to just go away for there to be a quick takeoff. These aren't proof of it not happening, but they should vastly lower your confidence.
You need to add a very large term for 'Super-AGI will not happen in the near term for obvious reasons' and another very large term for 'Super-AGI doesn't happen in the near term for reasons I am unaware of' (and also terms for it never happening) to your currently small term for 'we'll figure out how to avoid misalignment.'
This is the world where the US federal government's response to COVID was to ban private COVID testing,
Private COVID testing, in early 2020, ran the risk of diverting millions of people away from public testing sites, which may have been the only way for the military to get data about the pandemic (which was poorly understood at the time, so it was right for biodefense officials to be paranoid about it).
confiscate PPE bought by states,
The military had no reason to trust state governments to use all the PPE they ordered in a manner that was optimal for national security, especially if the military was always the first to get solid info about COVID.
and warn citizens not to use PPE.
They wanted citizens to choose not to buy PPE, which again makes sense with the limited stockpiles of early 2020.
I don't mean to give anyone any hope where hope isn't due. Nate's arguments about incompetence and non-coordination generally hold pretty strongly, including with COVID, even if these aren't the best examples (resource/information hoarding).
However, I think this highlights a common problem among alignment thinkers where Government is assumed to be totally incompetent, when in reality there is an extremely complicated mix of competence disguised as incompetence and incompetence disguised as competence (and, of course, plenty of undisguised incompetence). This especially differs depending on the area; for example, in China during the Cultural Revolution, the nuclear weapons program was relatively untouched by the mass purges that devastated almost everything else in the government and severely disrupted the military.
Goodhart's Law is not to be taken lightly, and there's tons of people in the National Security Establishment who take Goodhart's Law extremely seriously. Maybe not plenty, certainly not enough, but definitely a lot.
Nope, I don't buy it. Having read Zvi's Covid posts and having a sense of how much better policy was possible (and was already being advocated at the time), I just don't buy a framing where government Covid policy can be partly considered competent. I'm also dubious of treating "the military" as a monolithic black-box entity with coherent behavior and goals, rather than as consisting of individual decision-makers who follow the same dysfunctional incentives as everyone else.
If you have sources that e.g. corroborate sane military policy during the early Covid months, feel free to provide them, but for now I'm unconvinced.
I don't trust Zvi's COVID posts after sharing one of them with people who were much more clueful than me and getting embarrassed by the resulting critique.
However I also don't trust the government response to have been even close to optimal.
I would appreciate this kind of reply, except I can't update on it if the critique isn't public.
For now, I don't think basic notions like "all governments were incompetent on covid" are particularly easy to dispute?
To provide two examples:
During COVID, Denmark and a number of other European countries suspended dispensing the AstraZeneca vaccine over worries that it would lead to blood clots. Zvi yelled that all of them were "ludicrously stupid several times over" for doing that, because he did not believe the blood clots were real. However, according to Denmark's official calculations, given the amount of alternative vaccines they had access to, the relative laxness of their COVID restrictions, etc., it was actually worthwhile to use the other vaccines instead of AstraZeneca. The calculations didn't look obviously wrong to me, though I got busy before I could check them properly. As far as I understand, it's currently the general scientific consensus that it can indeed cause blood clots once in a while?
For now, I don't think basic notions like "all governments were incompetent on covid" are particularly easy to dispute?
To provide two examples:
- The only country I know of to do a human challenge trial for Covid was the UK, and as I understand it, that trial only began in February 2021.
- I'm not aware of any attempts to ban gain-of-function research, let alone of any bans that were actually implemented.
These seem like reasonably plausible examples of cases where countries could do better, but given the blood clot mistake I wouldn't want to assume that there aren't problems I've missed.
because he did not believe the blood clots were real
I'd need a source on that. From what I recall, the numbers were small and preliminary but plausibly real, but orders of magnitude below the danger of Covid (which IIRC incidentally also causes blood clots). So one could call the suspension penny-wise but pound-foolish, or some such. Not to mention that IIRC the suspension resulted in a dip in Covid vaccinations, so it's not clear that it was even the right call in retrospect. I also recall hearing the suspension justified as necessary to preserve trust in the vaccines, when the result seemed to be the opposite.
And remember, even if the calculus was correct for Denmark, didn't this ultimately result in a temporary suspension of this vaccine throughout Europe?
Based on just what we've discussed here, I don't at all think it warrants being considered an error on Zvi's part.
And with that being said, I don't particularly care about policy details like this, when governments failed on the big stuff like human challenge trials or accelerating vaccine discovery or production. We're discussing all this in the context of AI Safety, after all, and if governments don't get the big details right, it doesn't matter if they're decent or terrible at comparatively minor issues. The latter is ultimately just bike-shedding. (E.g. I can see governments convening panels on algorithmic bias, but at best they'll be irrelevant to the core issue at hand, and at worst they'll distract from the important stuff.)
I'd need a source on that. From what I recall, the numbers were small and preliminary but plausibly real, but orders of magnitude below the danger of Covid (which IIRC incidentally also causes blood clots).
https://www.lesswrong.com/posts/82kxkCdHeSb8smZZq/covid-3-18-an-expected-quantity-of-blood-clots
And remember, even if the calculus was correct for Denmark, didn't this ultimately result in a temporary suspension of this vaccine throughout Europe?
I'm not sure about the specifics and I'm about to go to bed so I can't check. But I think the other countries suspended them after Norway did out of a precautionary principle (i.e. if they are not good for Norway then we should probably re-evaluate whether they are good for us), and then often reinstated them after reevaluating.
It seems like a good policy to me for a country to pay attention to findings that a medicine may be surprisingly dangerous.
I also recall hearing the suspension justified as necessary to preserve trust in the vaccines, when the result seemed to be the opposite.
I feel like it would be exactly the kind of mustache-twirling consequentialism that Eliezer calls out as un-genre-savvy to continue using a vaccination that your calculations say isn't worth it in terms of naive cost/benefit, because switching out the vaccine policy might harm trust. Like I'd rather not have the government decide that I'm too panicky to allow showing doubt.
A similar point applies to the suspension in much of Europe. While it might've been consequentially better on some level of analysis for Norway/whoever to take into account what effects their suspension would have on other countries, it feels too manipulative to me, and I'd rather they focus on their job.
I wasn't disputing that Zvi mentioned the blood clot story, I was disputing your characterisation of it. Quoting from literally the first two paragraphs from your link:
And even if all the observed clots were extra, all were caused by the vaccine, all were fatal, and that represented the overall base rate, and we ignore all population-level benefits and economic issues, the vaccine would still be worth using purely for personal health and safety by multiple orders of magnitude.
The WHO and EMA said there was no evidence there was an issue.
This is not consistent with your characterisation that Zvi made an error because "he did not believe the blood clots were real".
I feel like it would be exactly the kind of mustache-twirling consequentialism that Eliezer calls out as un-genre-savvy to continue using a vaccination that your calculations say isn't worth it in terms of naive cost/benefit, because switching out the vaccine policy might harm trust. Like I'd rather not have the government decide that I'm too panicky to allow showing doubt.
I'm not disputing that. The assumption and argument here is that the calculations come out positive: we have world A where people get a few more blood clots from a very rare side effect of a vaccine, but a lot fewer blood clots as a Covid symptom, versus world B where people get no blood clots from the vaccine but a lot more people get blood clots as a Covid symptom. That's what I meant by calling the policy of choosing world B penny-wise but pound-foolish.
(EDIT: And in a yet more competent civilization, you wouldn't even have to make the tradeoff in the first place, because you could just tell your patients "by the way, we've discovered another rare side-effect" instead of suspending the vaccine outright.)
I wasn't disputing that Zvi mentioned the blood clot story, I was disputing your characterisation of it. Quoting from literally the first two paragraphs from your link:
You are right that he also said it was not worth dealing with even if they were real (which contradicts what the Danish calculations showed, as I mentioned), but the argument he led with was that they were not real:
There was always going to be something that happened to correlate with vaccination days to some extent, somewhere, over some time period. The number of blood clots experienced after vaccination wasn’t even higher than the base rate you would otherwise expect.
(I had actually initially misremembered him as leading with the argument that they were not worth dealing with, but when I wrote my original post I decided to go back and double-check and found that he was focusing on them not being real instead. So I rewrote it before posting it to say that he didn't believe them.)
I'm not disputing that. The assumption and argument here is that the calculations come out positive: we have world A where people get a few more blood clots from a very rare side effect of a vaccine, but a lot fewer blood clots as a Covid symptom, versus world B where people get no blood clots from the vaccine but a lot more people get blood clots as a Covid symptom. That's what I meant by calling the policy of choosing world B penny-wise but pound-foolish.
What I'm saying is that the Danish government did these sorts of calculations at the time and found the trade-off to be worthwhile; the COVID death rate at the time was extremely low, the restrictions were very loose, and the available alternative vaccines were plentiful, so avoiding AstraZeneca would, according to their calculations, have lowish costs, such that it was worth it.
Meanwhile, I don't see any calculations by Zvi showing that Denmark was obviously stupid; it seems like Zvi was mostly just pattern-matching (as did I when I originally shared Zvi's article).
Mossad was allegedly pretty successful procuring large amounts of PPE from hostile countries: https://www.tandfonline.com/doi/full/10.1080/08850607.2020.1783620. They also had covert contact tracing, and one way or another their case counts seemed pretty low until Omicron.
The first few weeks of COVID lockdowns went extremely well: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7675749/
It feels like it is often assumed that the best way to prevent AGI ruin is through AGI alignment, but this isn't obvious to me. Do you think that we need to use AGI to prevent AGI ruin?
Here's a proposal (there are almost certainly better ones): because of the large amount of compute required to create AGI, governments could enact strict regulation to prevent AGI from being created. Of course, the compute to create AGI probably goes down every year, but this buys lots of time, during which one might be able to enact more careful AI regulation, or pull off a state-sponsored, AGI-powered pivotal-act project.
It seems very unlikely that one AI organization will be years ahead of everyone else on the road to AGI, so one of the main policy challenges is to make sure that all of the organizations that could deploy an AGI and cause ruin somehow decide to not do this. The challenge of making all of these organizations not deploy AGI seems easier to pull off than trying to prevent AGI via government regulations, but potentially not by that much, and the benefit of not having to solve alignment seems very large.
The key downside of this path that I see is that it strictly cuts off the path of using the AGI to perform the pivotal act, because government regulation would prevent that AGI from being built. And with AGI prevented by government regulation, we are still on the precipice -- later AGI ruin might happen that is more difficult to prevent, or another x-risk could happen. But it's not clear to me which path gives a higher likelihood of success.
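For a rough sense of how much time a compute cap might buy, here is an illustrative calculation; the AGI compute requirement, the cap level, and the rate of effective-compute progress are all made-up numbers for the sketch, not estimates from this thread:

```python
import math

# Illustrative, assumed numbers (not estimates from the thread):
agi_flop = 1e27        # hypothetical training compute needed for AGI today
cap_flop = 1e25        # hypothetical regulatory cap on any single training run
halving_years = 2.0    # assumed years for the compute needed for a fixed
                       # capability to halve (hardware + algorithmic progress)

# Years until falling requirements catch up with the cap:
years_bought = math.log2(agi_flop / cap_flop) * halving_years
print(f"Time bought by the cap: ~{years_bought:.0f} years")  # ~13 years
```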
Because of the large amount of compute required to create AGI, governments could enact strict regulation to prevent AGI from being created. Of course, the compute to create AGI probably goes down every year, but this buys lots of time
Military-backed hackers can effortlessly get access to or hijack compute elsewhere, which means that state-backed AI development is not going to be constrained by regulation at all, at least not by anything like that. This is one of the big reasons why EY has made high-profile statements about the concept of eliminating all compute, even though that concept is considered heretical by all the decisionmakers in the AI domain.
It's also the only reason why people talk about "slowing down AI progress" through sweeping, stifling industry regulations instead of banning specific kinds of AI (although that is actually even more heretical): sweeping regulation could conceivably happen in English-speaking countries without an agreement that successfully sets up enduring regulation in Russia and China. Trust problems in the international arena are already astronomically complex by default, because there are large numbers of agents (e.g. spies) and they inherently strive for maximum nontransparency and information asymmetry.
Politically, it would be easier to enact a policy requiring complete openness about all research, rather than to ban it.
Such a policy would have the side effect of also slowing research progress, since corporations and governments rely on secrecy to gain advantages.
They rely on secrecy to gain relative advantages, but absolutely speaking, openness increases research speed; it increases the amount of technical information available to every actor.
You have an oversimplified vision of humanity's rationality. You see decisions that are harmful to humanity and conclude that they are irrational. But this logic only works under the assumption that humanity is one individual. Decisions that are harmful to humanity are in most cases beneficial to the decision-making person, and therefore they are not irrational - they are selfish. This gives us much more hope, because persuading a rational, selfish person with logic is entirely possible.
This is not what a community that cares about solving these problems looks like. If the main bottleneck to our survival is the human race suddenly becoming much more competent, why aren't rationalists more evangelistic than Christians? Why is this an insular community?
I don't think it's very insular as such. Many of the people here are highly active elsewhere as well. There are multiple organizations involving people here who are active in various communities, but this is not their hub for coordination. There are a few AI safety organizations, addressing existential risks is an EA cause area, and various other projects.
One major problem is that most people here - on the evidence of their posts - are terrible at marketing compared with their competence in other areas. This is also true in most other communities, but for various reasons the skill sets involved are especially visibly lacking here. (I absolutely include myself in this)
I wish they were just terrible at marketing. It's worse. There's an active anti-marketing, anti-rhetoric bias here. I think that in trying so hard to (rightfully!) escape from the corrosive effect of rampant signalling on most people's epistemology, LessWrong has swung the pendulum the other way and become afraid of signalling in general. I think nothing short of a vehement marketing campaign aimed directly and specifically at groups on all sides of the political and even religious spectrum, across multiple countries (and not just English speaking ones - Europe, India, and China too), has any chance of improving the rationality of industrialized society enough to slow down AI advancement.
Also btw, when I say "this is an insular community" I mean the entire EA sphere. Most people on the planet with the resources to affect the world's trajectory significantly have never heard of effective altruism or AI safety to begin with. That's bad.
Being a highly infectious memeplex actively trades off against high average competence. We don't need a million footsoldiers who chant the right tribal words without deep understanding, we need the best and brightest in rooms where there's no one to bring the dynamic down.
There are efforts to evangelise to specific target groups, like ML researchers or people at the International Math Olympiads. These are encouraged, though they could be scaled better.
All the component pieces, including plenty of compute, are already lying on the table in plain sight. Mostly it's down to someone understanding how to stitch them together. My optimism relies on AI getting soft failure modes that steer it away from bad things, rather than on banning it outright.
Note: As usual, Rob Bensinger helped me with editing. I recently discussed this model with Alex Lintz, who might soon post his own take on it (edit: here).
Some people seem to be under the impression that I believe AGI ruin is a small and narrow target to hit. This is not so. My belief is that most of the outcome space is full of AGI ruin, and that avoiding it is what requires navigating a treacherous and narrow course.
So, to be clear, here is a very rough model of why I think AGI ruin is likely. (>90% likely in our lifetimes.)[1]
My real models are more subtle, take into account more factors, and are less articulate. But people keep coming to me saying "it sounds to me like you think humanity will somehow manage to walk a tightrope, traverse an obstacle course, and thread a needle in order to somehow hit the narrow target of catastrophe, and I don't understand how you're so confident about this". (Even after reading Eliezer's AGI Ruin post—which I predominantly agree with, and which has a very disjunctive character.)
Hopefully this sort of toy model will at least give you some vague flavor of where I’m coming from.
Simplified Nate-model
The short version of my model is this: from the current position on the game board, a lot of things need to go right, if we are to survive this.
In somewhat more detail, the following things need to go right:
(I could also add a list of possible disasters from misuse, conditional on us successfully navigating all of the above problems. But conditional on us clearing all of the above hurdles, I feel pretty optimistic about the relevant players’ reasonableness, such that the remaining risks seem much more moderate and tractable to my eye. Thus I’ll leave out misuse risk from my AGI-ruin model in this post; e.g., the ">90% likely in our lifetimes" probability is just talking about misalignment risk.)
One way that this list is a toy model is that it’s assuming we have an actual alignment problem to face, under some amount of time pressure. Alternatives include things like getting (fast, high-fidelity) whole-brain emulation before AGI (which comes with a bunch of its own risks, to be clear). The probability that we somehow dodge the alignment problem in such a way puts a floor on how low models like the above can drive the probabilities of success down (though I’m pessimistic enough about the known-to-me non-AGI strategies that my unconditional p(ruin) is nonetheless >90%).
Some of these bullets trade off against each other: sufficiently good technical solutions might obviate the need for good AGI-team dynamics or good global-scale coordination, and so on. So these factors aren't totally disjunctive. But this list hopefully gives you a flavor for how it looks to me like a lot of separate things need to go right, simultaneously, in order for us to survive, at this point. Saving the world requires threading the needle; destroying the world is the default.
Correlations and general competence
You may object: "But Nate, you've warned of the multiple-stage fallacy; surely here you're guilty of the dual fallacy? You can't say that doom is high because three things need to go right, and multiply together the lowish probabilities that all three go right individually, because these are probably correlated."
Yes, they are correlated. They're especially correlated through the fact that the world is derpy.
This is the world where the US federal government's response to COVID was to ban private COVID testing, confiscate PPE bought by states, and warn citizens not to use PPE. It's a world where most of the focus on technical AGI alignment comes from our own local community, takes up a tiny fraction of the field, and most of it doesn't seem to me to be even trying by their own lights to engage with what look to me like the lethal problems.
Some people like to tell themselves that surely we'll get an AI warning shot and that will wake people up; but this sounds to me like wishful thinking from the world where the world has a competent response to the pandemic warning shot we just got.
So yes, these points are correlated. The ability to solve one of these problems is evidence of ability to solve the others, and the good news is that no amount of listing out more problems can drive my probability lower than the probability that I'm simply wrong about humanity's (future) competence. Our survival probability is greater than the product of the probability of solving each individual challenge.
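As a minimal numerical sketch of that floor (the specific numbers are purely illustrative and mine, not anything from the post): model survival as a mixture over whether humanity turns out to be unusually competent about AGI, where in the competent world the whole package of hurdles mostly gets handled, and in the derpy world each hurdle independently has low odds.

```python
# Toy mixture model of "correlation through general competence" (illustrative numbers).
p_competent = 0.05             # assumed: chance humanity gets much more competent about AGI
p_survive_if_competent = 0.7   # assumed: in that world, the package of hurdles mostly gets handled
p_hurdle_if_incompetent = 0.2  # assumed: per-hurdle success odds in the derpy world
n_hurdles = 4                  # number of separate things that must go right

p_survive = (p_competent * p_survive_if_competent
             + (1 - p_competent) * p_hurdle_if_incompetent ** n_hurdles)
print(f"P(survive) ~ {p_survive:.3f}")  # ~0.037

# Listing more hurdles only shrinks the second term; P(survive) never drops
# below p_competent * p_survive_if_competent ~ 0.035, i.e. below the chance
# that the "humanity stays incompetent" premise is simply wrong, which is
# the floor described in the paragraph above.
```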
The bad news is that we seem pretty deep in the competence-hole. We are not one mere hard shake away from everyone snapping to our sane-and-obvious-feeling views. You shake the world, and it winds up in some even stranger state, not in your favorite state.
(In the wake of the 2012 US presidential elections, it looked to me like there was clearly pressure in the US electorate that would need to be relieved, and I was cautiously optimistic that maybe the pressure would force the left into some sort of atheistish torch-of-the-enlightenment party and the right into some sort of libertarian individual-rights party. I, uh, wasn't wrong about there being pressure in the US electorate, but, the 2016 US presidential elections were not exactly what I was hoping for. But I digress.)
Regardless, there's a more general sense that a lot of things need to go right, from here, for us to survive; hence all the doom. And, lest you wonder what sort of single correlated already-known-to-me variable could make my whole argument and confidence come crashing down around me, it's whether humanity's going to rapidly become much more competent about AGI than it appears to be about everything else.
(This seems to me to be what many people imagine will happen to the pieces of the AGI puzzle other than the piece they’re most familiar with, via some sort of generalized Gell-Mann amnesia: the tech folk know that the technical arena is in shambles, but imagine that policy has the ball, and vice versa on the policy side. But whatever.)
So that's where we get our remaining probability mass, as far as I can tell: there's some chance I'm wrong about humanity's overall competence (in the nearish future); there's some chance that this whole model is way off-base for some reason; and there's a teeny chance that we manage to walk this particular tightrope, traverse this particular obstacle course, and thread this particular needle.
And again, I stress that the above is a toy model, rather than a full rendering of all my beliefs on the issue. Though my real model does say that a bunch of things have to go right, if we are to succeed from here.
[1] In stark contrast to the multiple people I’ve talked to recently who thought I was arguing that there's a small chance of ruin, but the expected harm is so large as to be worth worrying about. No.