As a writing exercise, I'm writing an AI Alignment Hot Take Advent Calendar - one new hot take, written every day for 25 days. Or until I run out of hot takes, which seems likely.
This was waiting around in the middle of my hot-takes.txt file, but it's gotten bumped up because of Rob and Eliezer - I've gotta blurt it out now or I'll probably be even more out of date.
The idea of using AI research to help us be better at building AI is not new or rare. It dates back to prehistory, but some more recent proponents include OpenAI members (e.g. Jan Leike) and the Accelerating Alignment group. We've got a tag for it. Heck, this even got mentioned yesterday!
So a lot of this hot take is really about my own psychology. For a long time, I felt that sure, building tools to help you build friendly AI was possible in principle, but it wouldn't really help. Surely it would be faster just to cut out the middleman and understand what we want from AI using our own brains.
If I'd turned on my imagination, rather than reacting to specific impractical proposals that were around at the time, I could have figured out how augmenting alignment research is a genuine possibility a lot sooner, and started considering the strategic implications.
Part of the issue is that plausible research-amplifiers don't really look like the picture I have in my head of AGI - they're not goal-directed agents who want to help us solve alignment. If we could build those and trust them, we really should just cut out the middleman. Instead, they can look like babble generators, souped-up autocomplete, smart literature search, code assistants, and similar. Despite either being simulators or making plans only in a toy model of the world, such AI really does have the potential to transform intellectual work, and I think it makes a lot of sense for there to be some people doing work to make these tools differentially get applied to alignment research.
Which brings us to the dual-use problem.
It turns out that other people would also like to use souped-up autocomplete, smart literature search, code assistants, and similar. They have the potential to transform intellectual work! Pushing forward the state of the art on these tools lets you get them earlier, but it also helps other people get them earlier, even if you don't share your weights.
Now, maybe the most popular tools will help people make philosophical progress, and accelerating development of research-amplifying tools will usher in a brief pre-singularity era of enlightenment. But - lukewarm take - that seems way less likely than such tools differentially favoring engineering over philosophy on a society-wide scale, making everything happen faster and be harder to react to.
So best of luck to those trying to accelerate alignment research, and fingers crossed for getting the differential progress right, rather than oops capabilities.