One piece of advice I gave to EAs of various stripes in early 2021 was: do everything you can to make the government sane around biorisk, in the wake of the COVID pandemic, because this is a practice-run for AI.
I said things like: if you can't get the world to coordinate on banning gain-of-function research, in the wake of a trillions-of-dollars, tens-of-millions-of-lives pandemic "warning shot", then you're not going to get coordination in the much harder case of AI research.
Biolabs are often publicly funded (rather than industry-funded). The economic forces arrayed behind this recklessly foolish and impotent research consist of "half-a-dozen researchers thinking it's cool and might be helpful". (While the work that would actually be helpful—such as removing needless bureaucracy around vaccines and investing in vaccine infrastructure—languishes.) Compared to the problem of AI—where the economic forces arrayed in favor of "ignore safety and rush ahead" are enormous and the argument for expecting catastrophe much murkier and more abstract—the problem of getting a sane civilizational response to pandemics (in the wake of a literal pandemic!) is ridiculously easier.
And—despite valiant effort!—we've been able to do approximately nothing.
We're not anywhere near global bans on gain-of-function research (or equivalent but better feats of coordination that the people who actually know what they're talking about when it comes to biorisk would tell you are better targets than gain-of-function research).
The government continues to fund research that is actively making things worse, while failing to put any serious funding towards the stuff that might actually help.
I think this sort of evidence has updated a variety of people towards my position. I think that a variety of others have not updated. As I understand the counter-arguments (from a few different conversations), there are two main reasons that people see this evidence and continue to hold out hope for sane government response:
1. Perhaps the sorts of government interventions needed to make AI go well are not all that large, and not that precise.
I confess I don't really understand this view. Perhaps the idea is that AI is likely to go well by default, and all the government needs to do is, like, not use anti-trust law to break up some corporation that's doing a really good job at AI alignment just before they succeed? Or perhaps the idea is that AI is likely to go well so long as it's not produced first by an authoritarian regime, and working against authoritarian regimes is something governments are in fact good at?
I'm not sure. I doubt I can pass the ideological Turing test of someone who believes this.
2. Perhaps the ability to cause governance to be sane on some issue is tied very directly to the seniority of the government officials advising sanity.
EAs only started trying to affect pandemic policy a few years ago, and aren't very old or recognized among the cacophony of advisors. But if another pandemic hit in 20 years, the sane EA-ish advisors would be much more senior, and a lot more would get done. Similarly, if AI hits in 20 years, sane EA-ish advisors will be much more senior by then. The observation that the government has not responded sanely to pandemic near-misses is potentially screened off by the inexperience of the EAs advising governance.
I have some sympathy for the second view, although I'm skeptical that sane advisors have significant real impact. I'd love a way to test it as decisively as we've tested the "government (in its current form) responds appropriately to warning shots" hypothesis.
On my own models, the "don't worry, people will wake up as the cliff-edge comes more clearly into view" hypothesis has quite a lot of work to do. In particular, I don't think it's a very defensible position in isolation anymore. The claim "we never needed government support anyway" is defensible; but if you want to argue that we do need government support but (fortunately) governments will start behaving more reasonably after a warning shot, it seems to me like these days you have to pair that with an argument about why you expect the voices of reason to be so much louder and more effectual in 2041 than they were in 2021.
(Which is then subject to a bunch of the usual skepticism that applies to arguments of the form "surely my political party will become popular, claim power, and implement policies I like".)
See also: the law of continued failure, and Rob Bensinger's thoughts on the topic.
Disclaimer: writing quickly.
Consider the following path:
A. There is an AI warning shot.
B. Civilization allocates more resources for alignment and is more conservative pushing capabilities.
C. This reallocation is sufficient to solve and deploy aligned AGI before the world is destroyed.
I think that a warning shot is unlikely (P(A) < 10%), but won't get into that here.
I am guessing that P(B | A) is the biggest crux. The OP primarily considers the ability of governments to implement policy that moves our civilization further from AGI ruin, but I think that the ML community is both more important and probably significantly easier to shift than government. I basically agree with this post as it pertains to government updates based on warning shots.
I anticipate that a warning shot would get most capabilities researchers to a) independently think about alignment failures, including the alignment failures that their own models will cause, and b) take the EA/LessWrong/MIRI/Alignment sphere's worries a lot more seriously. My impression is that OpenAI seems to be much more worried about misuse risk than accident risk: if alignment is easy, then the composition of the lightcone is primarily determined by the values of the AGI designers. Right now, there are ~100 capabilities researchers vs ~30 alignment researchers at OpenAI. I think a warning shot would dramatically update them towards worry about accident risk, and therefore I anticipate that OpenAI would drastically shift most of their resources to alignment research. I would guess P(B|A) ~= 80%.
P(C | A, B) primarily depends on alignment difficulty, of which I am pretty uncertain, and also on how large the reallocation in B is, which I anticipate to be pretty large. The bar for destroying the world gets lower every year, but this reallocation would buy us a lot more time; I think we get several years of AGI capability before we deploy it. I'm estimating P(C | A, B) ~= 70%, but this is very low resilience.
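For what it's worth, here is a back-of-envelope sketch of how these three estimates combine along the A→B→C path, treating the "<10%" for P(A) as a 10% upper bound (a hypothetical point value, not something the comment commits to):

```python
# Rough combination of the comment's point estimates along the path A -> B -> C.
# P(A) is stated as an upper bound (<10%), so the result is itself an upper bound.
p_warning_shot = 0.10   # P(A): an AI warning shot occurs (upper bound)
p_reallocation = 0.80   # P(B | A): civilization reallocates toward alignment
p_sufficient   = 0.70   # P(C | A, B): the reallocation suffices in time

p_path = p_warning_shot * p_reallocation * p_sufficient
print(round(p_path, 3))  # → 0.056
```

On these numbers the whole "warning shot saves us" path carries at most ~6% probability, with most of the discount coming from P(A) itself.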
I don't want to derail this thread, but I do really want to express my disbelief at this number before people keep quoting it. I definitely don't know 30 people at OpenAI who are working on making AI not kill everyone, and it seems kind of crazy to assert that there are (and I think such assertions are the result of some pretty adversarial dynamics I am sad about).