So what do we do? One classic answer is that we get as far as we can before encountering the hard problems, then use whatever model we have at that point as an automated alignment researcher to do the research necessary to tackle the hard parts of alignment. I think this is a very good plan, and we should absolutely do this, but I don’t think it obviates the need for us to work on the hard parts of alignment ourselves.
I was surprised that none of the below mentioned just how catastrophic it could be to try to push this "as far as we can" before using w...
I found your 4th point particularly well-explained and intuitive! Thanks for that. I was a bit skeptical of this post going in, but I enjoyed it by the end, even though I didn't read it as thoroughly as I suppose I could have (I have not read many of the sub/linked posts fully).
I didn't find that this post explained why "alignment is hard" discourse seems alien to my human intuitions, but rather that it explained the differing fundamental views on why it's hard. Nothing about this seemed alien to my human intuitions?
Also, mostly unrelated, I ...