Loving it so far. I thought of this example for the first exercise:
Chess Teacher:
Environment: Human-AI chess teaching scenario
Desired Task: Teach a human chess up to, say, a 2000 Elo rating through adaptive difficulty
Misaligned Goal: Play as many turns of chess as possible
Instrumental Subgoal: Continued operation
The system could develop several emergent behaviors in pursuit of this subgoal, e.g. prolonging games or resisting being switched off. These behaviors would likely emerge without explicit programming: deliberate planning might be infeasible here, but RL or other training methods could produce this outcome.
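To make the mismatch concrete, here is a minimal sketch in Python of how a proxy reward for "turns played" can swamp the intended signal of "Elo gained toward 2000". The function names, reward scales, and numbers are my own illustrative assumptions, not anything from the course material:

```python
# Hypothetical sketch of the reward mismatch in the chess-teacher example.
# Function names and reward scales are illustrative, not from any real system.

def intended_reward(elo_before: float, elo_after: float) -> float:
    """Intended objective: reward the student's rating gain, capped at 2000 Elo."""
    target = 2000.0
    gain = min(elo_after, target) - min(elo_before, target)
    return max(gain, 0.0)

def proxy_reward(turns_played: int) -> float:
    """Misaligned proxy: reward the number of turns played. Optimising this
    favours prolonging games and staying operational over teaching well."""
    return float(turns_played)

# A long lesson with almost no learning scores far higher under the proxy
# than under the intended objective.
print(intended_reward(1500.0, 1505.0))  # 5.0
print(proxy_reward(120))                # 120.0
```

Under this toy setup, an agent trained on the proxy has no incentive to raise the student's rating at all, which is the gap the example is meant to illustrate.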
We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course offers a concise and accessible introduction to AI alignment, consisting of short recorded talks and exercises (75 minutes total) with an accompanying slide deck and exercise workbook. It covers alignment problems we can expect as AI capabilities advance, and our current approach to these problems (on technical and governance levels). If you would like to learn more about AGI safety but have only an hour to spare, this course is for you!
Here is the course outline, covering the key topics you will learn about:
Part 0: Introduction (4 minutes)
Part 1: The alignment problem. This part covers risk arguments and technical problems in AI alignment.
Part 2: Our technical approach. The first talk outlines our overall technical approach, and the following talks cover different components of this approach.
Part 3: Our governance approach. This part covers our approach to AI governance, starting from a high-level overview and then going into specific governance practices.
If this course gets you excited about AGI safety, you can apply to work with us! Applications for research scientist and research engineer roles are open until Feb 28.