So what do we do? One classic answer is that we get as far as we can before encountering the hard problems, then we use whatever model we have at that point as an automated alignment researcher to do the research necessary to tackle the hard parts of alignment. I think this is a very good plan, and we should absolutely do this, but I don’t think it obviates the need for us to work on the hard parts of alignment ourselves.
I was surprised that none of the below mentioned just how catastrophic it could be to try to push this "as far as we can" before using whatever model we have...