Can We Change the Goals of a Toy RL Agent?
This post is a write-up of preliminary research in which I investigated whether it is possible to intervene on the goals of a toy RL agent. Whilst I was unsuccessful in locating and retargeting a goal-directed reasoning capability, I found evidence of partially-retargetable goal-specific reflexes. Produced as part of the ML Alignment &...
Jun 15, 2025