scrdest — LessWrong

scrdest has not written any posts yet.

1 karma

1 comment

Member for 2 years

Replying to Goodhart's Law in Reinforcement Learning

This seems to me like a formalisation of Scott Alexander's "The Tails Coming Apart As Metaphor For Life" post.

Given a function and an approximation of it, following the approximate gradient works well enough in Mediocristan, but at the extremes the two diverge sharply.

I wonder what impact composite reward functions have. If you add together a pair of approximate rewards, could their errors partly cancel and pull the system closer to the real target?
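To make the question concrete, here is a toy sketch, not an RL experiment and not anything from the original post. It assumes each proxy reward is the true reward plus its own independent Gaussian error, and that "optimising" just means picking the best-scoring candidate out of a pool. The function names, pool size, and noise level are all made up for illustration.

```python
import random
import statistics


def selected_true_value(n_candidates, n_proxies, rng, noise_sd=1.0):
    """Pick the candidate with the highest summed proxy score; return its true value."""
    best_score, best_true = float("-inf"), 0.0
    for _ in range(n_candidates):
        true_value = rng.gauss(0.0, 1.0)
        # Assumption: every proxy is the true value plus its own independent error.
        score = sum(true_value + rng.gauss(0.0, noise_sd) for _ in range(n_proxies))
        if score > best_score:
            best_score, best_true = score, true_value
    return best_true


def mean_selected_true_value(n_proxies, trials=1000, n_candidates=200, seed=0):
    """Average true value of the selected candidate over many random pools."""
    rng = random.Random(seed)
    return statistics.fmean(
        selected_true_value(n_candidates, n_proxies, rng) for _ in range(trials)
    )


one_proxy = mean_selected_true_value(n_proxies=1)
two_proxies = mean_selected_true_value(n_proxies=2)
print(f"mean true value, one proxy:  {one_proxy:.2f}")
print(f"mean true value, two summed: {two_proxies:.2f}")
```

Under these assumptions the summed proxy tends to select candidates with a higher true value, because the independent errors partly average out while the shared signal adds up. If the two proxies shared the same error instead, summing them would give no such benefit, so this only hints at an answer for the case where the approximations fail in different ways.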
