Things roll downhill
(Response to this challenge) I've recently read two posts on similar strategies: Zvi's post on ChatGPT and Scott Alexander's post on Redwood Research. Both seem to share a strategy: train AI not to do misaligned things by giving it a bunch of examples of misaligned things, and...
Dec 6, 2022