Epistemic status: Written for Blog Post Day III. I don't get to talk to people "in the know" much, so maybe this post is obsolete in some way.
I think that at some point at least one AI project will face an important choice between deploying and/or enlarging a powerful AI system, or holding back and doing more AI safety research.
(Currently, AI projects face choices like this all the time, except they aren't important in the sense I mean it, because the AI isn't potentially capable of escaping and taking over large parts of the world, or doing something similarly bad.)
Moreover, I think that when this choice is made, most people in the relevant conversation will be insufficiently concerned/knowledgeable about AI risk. Perhaps they will think: "This new AI design is different from the classic models, so the classic worries don't arise." Or: "Fear not, I did [insert amateur safety strategy]."
I think it would be very valuable for these conversations to end with "OK, we'll throttle back our deployment strategy for a bit so we can study the risks more carefully," rather than with "Nah, we're probably fine, let's push ahead." This buys us time. Say it buys us a month. A month of extra time right after scary-powerful AI is created is worth a lot, because we'll have more serious smart people paying attention, and we'll have more evidence about what AI is like. I'd guess that a month of extra time in a situation like this would increase the total amount of quality-weighted AI safety and AI policy work by 10%. That's huge.
One way to prepare for these conversations is to raise awareness about AI risk and technical AI safety problems, so that it's more likely that more people in these conversations are more informed about the risks. I think this is great.
However, there's another way to prepare, which I think is tractable and currently neglected:
1. Identify some people who might be part of these conversations, and who already are sufficiently concerned/knowledgeable about AI risk.
2. Help them prepare for these conversations by giving them resources, training, and practice, as needed:
Perhaps it would be good to have an Official List of all the AI safety strategies, so that whatever rationale people give for why this AI is safe can be compared to the list. (See this prototype list.)
Perhaps it would be good to have an Official List of all the AI safety problems, so that whatever rationale people give for why this AI is safe can be compared to the list, e.g. "OK, so how does it solve outer alignment? What about mesa-optimizers? What about the malignity of the universal prior? I see here that your design involves X; according to the Official List, that puts it at risk of developing problems Y and Z..." (See this prototype list.)
Perhaps it would be good to have various important concepts and arguments re-written with an audience of skeptical and impatient AI researchers in mind, rather than the current audience of friends and LessWrong readers.
2b. Training & practice:
Maybe the person is shy, or bad at public speaking, or bad at keeping cool and avoiding fluster in high-stakes discussions. If so, some coaching and practice could go a long way. Maybe they have the opposite problems, frequently coming across as overconfident, arrogant, aggressive, or paranoid. If so someone should tell them this and help them tone it down.
In general it might be good to do some role-play exercises or something, to prepare for these conversations. As an academic, I've seen plenty of mock-dissertation-defense sessions and mock-job-talk-question-sessions, which seem to help. And maybe there are ways to get even more realistic practice, e.g. by trying to convince your skeptical friends that their favorite AI design might kill them if it worked.
Note that most of part 2 can be done without having done part 1. This is important in case we don't know anyone who might be part of one of these conversations, which is true for many and perhaps most of us.
Why do I think this is tractable? Well, seems like the sort of thing that people producing AI safety research can do on the margin, just by thinking more about their audience and maybe recording their work (or other people's work) on some Official List. Moreover people who don't do (or even read) AI safety research can contribute to this, e.g. by reading the literature on how to practice for situations like this, and writing up the results.
Why do I think this is neglected? Well, maybe it isn't. In fact I'd bet that some people are already thinking along these lines. It's a pretty obvious idea. But just in case it is neglected, I figured I'd write this. Moreover, the Official Lists I mentioned don't exist, and I think they would if people were taking this idea seriously. Finally--and this more than anything else is what caused me to write this post--I've heard one or two people explicitly call this out as something that they don't think is an important use case for the alignment research they were doing. I disagreed with them, and here we are. If this is a bad idea, I'd love to know why.