I'm writing a post comparing some high-level approaches to AI alignment in terms of their false positive risk. The trouble is, there's no standard agreement on what the high-level approaches to AI alignment are today: neither on what constitutes a high-level approach, nor on where to draw the lines when categorizing specific approaches.
So, I'll open it up as a question to get some feedback before I get too far along. What do you consider to be the high-level approaches to AI alignment?
(I'll supply my own partial answer below.)
Thanks. Your post is especially helpful because it addresses one of the things that was tripping me up: what names people standardly use for the different methods. Your names capture them better than mine did.