LESSWRONG

scrdest

Goodhart's Law in Reinforcement Learning
scrdest, 2y

This seems to me like a formalisation of Scott Alexander's post "The Tails Coming Apart As Metaphor For Life".

Given a function and an approximation of it, following the approximation's gradient is good enough in Mediocristan, but at the extremes the two become highly dissimilar.
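To make that concrete, here is a minimal toy sketch. The functions `true_reward` and `proxy_reward` are made up for illustration and are not taken from the post: the proxy is just the linear part of the true reward, so it matches well near the origin and badly far from it.

```python
# Toy illustration only: both functions are hypothetical, chosen so that
# the proxy agrees with the true reward near x = 0 and diverges far away.

def true_reward(x):
    # Peaks at x = 5, then falls off.
    return x - 0.1 * x * x

def proxy_reward(x):
    # Linear approximation of true_reward around x = 0.
    return x

def proxy_gradient(x):
    # Gradient of the proxy is constant: it always says "go further right".
    return 1.0

def ascend_proxy(x0=0.0, step=0.5, n_steps=40):
    """Follow the proxy's gradient and record every visited point."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + step * proxy_gradient(xs[-1]))
    return xs

xs = ascend_proxy()
true_values = [true_reward(x) for x in xs]

# Step at which the true reward was highest along the proxy's path.
best_step = max(range(len(xs)), key=lambda i: true_values[i])

print("true reward peaks at step", best_step, "x =", xs[best_step])
print("final x =", xs[-1], "final true reward =", true_values[-1])
```

In this toy setup the true reward improves for the first few steps, peaks partway along, and ends up well below where it started, while the proxy keeps rising the whole time.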

I wonder what impact composite reward functions have. If you add together a pair of approximate rewards, could their errors cancel each other out and pull the system closer to the real target?
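A rough way to poke at this question is a toy where the error terms are chosen by hand. Everything here (`true_reward`, the `+2x` and `-2x` error terms, the grid search) is an assumption made for illustration. It only shows that cancellation is possible when the errors point in opposite directions, not that it happens in general.

```python
# Toy illustration only: the reward and the error terms are hypothetical.

def true_reward(x):
    # The real target: maximised at x = 3.
    return -(x - 3.0) ** 2

def proxy_a(x):
    # Approximation with an error that pushes the optimum upward.
    return true_reward(x) + 2.0 * x

def proxy_b(x):
    # Approximation with the opposite error, pushing the optimum downward.
    return true_reward(x) - 2.0 * x

def argmax_on_grid(reward, lo=0.0, hi=6.0, n=601):
    """Return the grid point in [lo, hi] where `reward` is largest."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=reward)

# Opposite-signed errors: the sum is 2 * true_reward, so the bias cancels.
opposite_sum = argmax_on_grid(lambda x: proxy_a(x) + proxy_b(x))

# Same-signed errors: the bias adds up instead of cancelling.
same_sum = argmax_on_grid(lambda x: proxy_a(x) + proxy_a(x))

print("true optimum:        ", argmax_on_grid(true_reward))
print("proxy_a alone:       ", argmax_on_grid(proxy_a))
print("proxy_b alone:       ", argmax_on_grid(proxy_b))
print("a + b (opposite err):", opposite_sum)
print("a + a (same err):    ", same_sum)
```

In this construction the sum recovers the true optimum only because the two errors are exact mirror images. When both proxies share the same bias, adding them leaves the optimum just as far off.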
