Research request (alignment strategy): Deep dive on "making AI solve alignment for us"

by JanB
1st Dec 2022
AI Alignment Forum
1 min read

We might be able to train AI alignment assistants that massively accelerate/improve the alignment research that gets done. These assistants need not have (strongly) superhuman capabilities or be highly agentic; they just need to be capable and aligned enough to allow us to offload most work on alignment, safety, and related problems to them.

This seems to be a key part of OpenAI's alignment strategy. The best explanation of this strategy that I've seen is maybe A minimal viable product for alignment, with an excellent discussion in the comment section.

If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing (see, e.g., Beth Barnes's ideas here, or my upcoming post on "non-scalable oversight"---i.e. pragmatic improvements to oversight that would help with training alignment assistants but which cannot directly scale to oversight of superhuman systems).

I haven't seen any deep treatment of the viability of this strategy and its implications. I think such an analysis would be pretty useful.

I could potentially provide funding for such an analysis or help with obtaining funding.

3 comments

elifland

Mostly agree. For some more starting points, see posts with the AI-assisted alignment tag. I recently did a rough categorization of strategies for AI-assisted alignment here.

"If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing."

Not totally sure about this; my impression (see chart here) is that much of the community already considers some form of AI-assisted alignment to be our best shot. But I'd still be excited for more in-depth categorization and prioritization of strategies (e.g. I'd be interested in "AI-assisted alignment" benchmarks that different strategies could be tested against). I might work on something like this myself.

Tor Økland Barstad

Cool that you have an interest in this topic, and want to contribute towards making progress on it!

I myself am writing a sequence called AGI-assisted alignment, which also looks into this kind of thing (it's a work in progress).

If I were to recommend just one post from that sequence it would be Alignment with argument-networks and assessment-predictions.

JanB

There is now also this write-up by Jan Leike: https://www.lesswrong.com/posts/FAJWEfXxws8pMp8Hk/link-why-i-m-optimistic-about-openai-s-alignment-approach

Mentioned in
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.