Genuine question: if AI capabilities research stopped today and larger models stopped being trained, wouldn't AI alignment research effectively be halted?
I'm assuming that the primary goal of AI alignment research is to prevent AGI and ASI from posing existential risks. My main question is: how can methods for AGI/ASI alignment be discovered before AGI/ASI exists?
AI alignment results tend to be empirical: either positive ("we succeeded in making Claude more honest")[1] or negative ("we found a prompt that gets ChatGPT to do something dangerous"). In both cases, the results come from experimenting on models that already exist.
One clear benefit of a pause would be time for policy to catch up. However, this might be like trying to draw a map of terrain that doesn't exist yet: like the Allies drawing up a nuclear treaty with the Axis powers before there was consensus that the nuclear bomb was even possible.[2] It would be nice if everyone stopped and worked out a plan for global cooperation, but such a plan can only stabilize and win buy-in from the major players once both the underlying dangers and the distribution of power are clear enough to everyone involved.
A research pause could still be a net good for humanity, but at present I don't understand what the time would buy. If these conclusions hold, they would seem to favor a slowdown (so that safety keeps pace with capabilities) rather than a pause. But they are based on my rudimentary knowledge, and I would like to hear what more knowledgeable people have to say.
[1] I haven't read many papers, so please contest this if you have strong evidence against it. Here I'm thinking specifically of Anthropic's sparse autoencoders paper.
[2] This is not a counterfactual claim about the outcome of the war. My point is that attempting such a treaty would have failed: it wouldn't have found substantive support on either side.