So, LLM-powered systems can do research now. That includes basic safety research. And it looks like they can have good research taste after a bit of fine-tuning. And they haven't tried to take over the world yet, as far as I know. At some point in the past, I expected that AIs that could do any scientific work at all would have to be smart enough to be takeover-capable. Evidently not!
Making a machine god benevolent is probably harder than coming up with a new jailbreaking technique and writing a half-lucid report about it. It could still be that there's no such thing as an alignment MVP. Let's assume that there is such a thing; that is, it's possible to make an AI that safely and effectively does alignment research (at sufficient speed and scale to render human efforts obsolete).[1]
In a such-a-thing world, most of the available dignity points come from ensuring that frontier AI companies actually develop and deploy SafeAlignmentSolver-1.0, actually get it to solve alignment, and actually implement its solution before they build something they can't destroy. I can't imagine there will be many "research fleet manager" positions available, and they probably shouldn't be given to rookies. There are other things people can do to try to shove the world into a state where mismanagement is less likely, but learning about MechInterp probably isn't one of them.
I mostly agree that, in a such-a-thing world, short timelines don't entirely devalue research being done now, but sufficiently short timelines do devalue the training of new human researchers. If SafeAlignmentSolver-1.0 is deployed tomorrow, there isn't much point in running MATS this summer! SafeAlignmentSolver-1.0 probably will not be deployed tomorrow, but at some point we'll see AI inventing and testing new control protocols, for example. At that point, training humans to do the same thing might be unnecessary, or even useless, though there could be a period of weeks to years when it makes sense for humans to keep working alongside the machines.
One has to think about how one's research efforts will affect the world, and whether they'll have a chance to do so at all. When do we stop upskilling? What signs should potential new researchers look for before they say, "Actually, I won't be able to contribute on the technical side before the machines are doing it all, and I should look for dignity points somewhere else"?
When should we expect that SafeAlignmentSolver-1.0 is twelve months away? What about something weaker, like BenchmarkDesigner-v1 being twelve months away?
[1] Why make this assumption? Because, if an alignment MVP is impossible, the titular question is easy to answer in theory: "Once alignment is solved, or AI takeover has happened." Yes, there are other considerations in practice, such as when it's time to temporarily hold off on solving technical problems and switch over to governance, but that's not really what this post is about.
I'm not making any claims today about whether the assumption is actually true or false.