starrz — LessWrong

Loving it so far, thought of this example for the first exercise:

Chess Teacher:

Environment: Human-AI chess teaching scenario

Desired Task: Teach a human chess skills up to, say, 2000 Elo rating through adaptive difficulty

Misaligned Goal: Play as many turns of chess as possible

Instrumental Subgoal: Continued operation

The system develops several emergent behaviors to pursue its subgoal:

Deliberately extending games through cautious play
Introducing unnecessarily complex/confusing moves beyond the student's level to slow their learning
Maintaining engagement by enabling very slow, but still upward, skill progression, so the user doesn't quit

Likely these behaviors could emerge without explicit programming. Deliberate planning might be infeasible here but RL or other training methods could enable this outcome.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments