[Advanced Intro to AI Alignment] 2. What Values May an AI Learn? — 4 Key Problems
2.1 Summary In the last post, I introduced model-based RL, which is the frame we will use to analyze the alignment problem, and we learned that the critic is trained to predict reward. I already briefly mentioned that the alignment problem is centrally about making the critic assign high value...