In this post, I will discuss the cruel trade-off between misuse concerns and X-risks (accidental risks) regarding racing. It is important to note that this post does not advocate for one risk being more plausible than the other, nor does it make any normative statements. The purpose is to analyze the outcome of a particular worldview.
In one sentence, the key claim is: “If an entity in power to develop AGI is mostly worried about misuse AND think that they’re the best entity (morally speaking) in their reference class, it is good to race. This is bad for accidental risks”.
Epistemic status: I’ve written up this post pretty quickly, after a conversation where it didn’t seem clear to someone. I'm confident about the general claim, less about specific claims.
And to be clear, here the cause of the racing is beliefs on the world, not secretely evil intentions.
On the other hand, for those who’re worried about accidental risks and who are pessimistic about our chances of solving those issues in a short amount of time (“alignment is hard”), racing is one of, if not the worst thing:
It’s important to note that except that point (we should race really fast), most other measures are the same to solve misuse problems and accidental risks problems, i.e. auditing, licensing, developing models in close source, getting compute governance right, working to make models robust to jailbreaks etc.
"as fast as possible" includes constraints like "make your model non trivial to break to prevent misuse". The main problem is just that preventing misuse requires a priori much less engineering and intervention on the model itself than preventing accidents.
To repurpose a quote from The Cincinnati Enquirer: The saying "AI X-risk is just one damn cruelty after another," is a gross overstatement. The damn cruelties overlap.
When I saw the title, I thought, "Oh no. Of course there would be a tradeoff between those two things, if for no other reason than precisely because I hadn't even thought about it and I would have hoped there wasn't one." Then as soon as I saw the question in the first header, the rest became obvious.
Thank you so much for writing this post. I'm glad I found it, even if months later. This tradeoff has a lot of implications for policy and outreach/messaging, as well as how I sort and internalize news in those domains.
Without having thought about it enough for an example: It sounds correct to me that in some contexts, appreciating both kinds of risk drives response in the same direction (toward more safety overall). But I have to agree now that in at least some important contexts, they drive in opposite directions.