Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?
When training a Reinforcement Learning (RL) AI, we observe that it behaves well on the training data. But several different goals are compatible with the behaviour we see, so we ask which of these goals the AI might have adopted. Is there some implicit Occam's Razor selecting simple goals?...
Oct 25, 2022