Julian_R — LessWrong

41

1

10

9y

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

When training a Reinforcement Learning (RL) AI, we observe that it behaves well on the training data. But different goals are compatible with the behaviour we see, so we ask which of these goals the AI might have adopted. Is there some implicit Occam's Razor selecting simple goals?...

Oct 25, 2022•15

Julian_R's Shortform

Oct 23, 2020•1