LESSWRONG
LW

1484
starrz
0010
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
A short course on AGI safety from the GDM Alignment team
starrz4mo10

Loving it so far, thought of this example for the first exercise: 

Chess Teacher:

Environment: Human-AI chess teaching scenario

Desired Task: Teach a human chess skills up to, say, 2000 Elo rating through adaptive difficulty

Misaligned Goal: Play as many turns of chess as possible

Instrumental Subgoal: Continued operation

The system develops several emergent behaviors to pursue its subgoal:

  • Deliberately extending games through cautious play
  • Introducing unnecessarily complex/confusing moves beyond the student's level to slow their learning
  • Maintaining engagement by enabling very slow, but still upward, skill progression, so the user doesn't quit

Likely these behaviors could emerge without explicit programming. Deliberate planning might be infeasible here but RL or other training methods could enable this outcome.

Reply
No posts to display.