Loving it so far, thought of this example for the first exercise:
Chess Teacher:
Environment: Human-AI chess teaching scenario
Desired Task: Teach a human chess skills up to, say, 2000 Elo rating through adaptive difficulty
Misaligned Goal: Play as many turns of chess as possible
Instrumental Subgoal: Continued operation
The system develops several emergent behaviors to pursue its subgoal:
Deliberately extending games through cautious play
Introducing unnecessarily complex/confusing moves beyond the student's level to slow their learning
Maintaining engagement by enabling very slow, but still upward, skill progression, so the user doesn't quit
Likely these behaviors could emerge without explicit programming. Deliberate planning might be infeasible here but RL or other training methods could enable this outcome.
Loving it so far, thought of this example for the first exercise:
Chess Teacher:
Environment: Human-AI chess teaching scenario
Desired Task: Teach a human chess skills up to, say, 2000 Elo rating through adaptive difficulty
Misaligned Goal: Play as many turns of chess as possible
Instrumental Subgoal: Continued operation
The system develops several emergent behaviors to pursue its subgoal:
Likely these behaviors could emerge without explicit programming. Deliberate planning might be infeasible here but RL or other training methods could enable this outcome.