Summarized/re-worded from a discussion I had with John Wentworth. John notes that everything he said was off-the-cuff, and he doesn’t always endorse such things on reflection.

Eli: I have a question about your post on Godzilla Strategies. For background, I disagree with it; I'm most optimistic about alignment proposals that use weaker AIs to align stronger AIs. I had the thought that we are already using Godzilla to fight Mega-Godzilla[1]. For example, every time an alignment researcher does a Google search, BERT is being used. I recently found out that Ought may make a version of Elicit for searching through alignment research, aided by GPT-3. These tools already improve alignment researchers' productivity. It seems extremely important to me that, as AI advances, we continue to apply it to differentially help with alignment as much as possible.

John: I agree that we can use weak AIs to help some with alignment. The question is whether we can make non-scary AIs that help us enough to solve the problem.

Eli: It seems decently likely that we can have non-scary AIs contribute at least 50% of the productivity on alignment research.[2]

John: We are very likely not going to miss out on alignment by a 2x productivity boost, that’s not how things end up in the real world. We’ll either solve alignment or miss by a factor of >10x.

Eli: (after making some poor counter-arguments) Yes, I basically agree with this. The reason I'm still relatively optimistic about Godzilla strategies is (a) I'm less confident about how much non-scary AIs can help; 90%+ of productivity being outsourced to AI doesn't feel that crazy to me, even if I have <50% credence in it, and (b) I'm less optimistic about the tractability of non-Godzilla strategies, so Godzilla strategies still seem pretty good relative to the alternatives even if they are not that promising in absolute terms.

John: Makes sense; I agree these are the cruxes.


  1. This point is heavily inspired by Examples of AI Increasing AI Progress. ↩︎

  2. See also Richard’s comment related to this, which I really liked. ↩︎


We are very likely not going to miss out on alignment by a 2x productivity boost, that’s not how things end up in the real world. We’ll either solve alignment or miss by a factor of >10x.

Why is this true?

Most problems that people work on in research are roughly the right difficulty, because the ambition level is adjusted to be somewhat challenging but not unachievable. If a problem is too hard, the researcher just moves on to another project. This is the problem selection process we're used to, and it might bias our intuitions here.

On the other hand, we want to align AGI because it's a really important problem, and we have no control over its difficulty. If you think about the distribution of difficulties across all possible problems, it would be a huge coincidence if the problem of aligning AGI, chosen for its importance and not its difficulty, happened to be within 2x difficulty of the effort we end up being able to put in.

That argument makes sense, thanks.