Reward Functions

Posts tagged Reward Functions
Reward is not the optimization target Ω
TurnTrout · 1y · 292 karma · 109 comments · tag relevance 5

Draft papers for REALab and Decoupled Approval on tampering Ω
Jonathan Uesato, Ramana Kumar · 3y · 47 karma · 2 comments · tag relevance 5

Scaling Laws for Reward Model Overoptimization Ω
leogao, John Schulman, Jacob_Hilton · 8mo · 97 karma · 11 comments · tag relevance 2

Seriously, what goes wrong with "reward the agent when it makes you smile"? QΩ
TurnTrout, johnswentworth · 10mo · 83 karma · 42 comments · tag relevance 2

Four usages of "loss" in AI Ω
TurnTrout · 8mo · 42 karma · 18 comments · tag relevance 2

Language Agents Reduce the Risk of Existential Catastrophe Ω
cdkg, Simon Goldstein · 18d · 30 karma · 13 comments · tag relevance 2

$100/$50 rewards for good references Ω
Stuart_Armstrong · 2y · 20 karma · 5 comments · tag relevance 2

Why we want unbiased learning processes
Stuart_Armstrong · 5y · 13 karma · 3 comments · tag relevance 2

Learning societal values from law as part of an AGI alignment strategy
John Nay · 8mo · 5 karma · 18 comments · tag relevance 2

Shutdown-Seeking AI Ω
Simon Goldstein · 15d · 46 karma · 31 comments · tag relevance 1

A Short Dialogue on the Meaning of Reward Functions Ω
Leon Lang, Quintin Pope, peligrietzer · 7mo · 44 karma · 0 comments · tag relevance 1

Thoughts on reward engineering Ω
paulfchristiano · 4y · 30 karma · 30 comments · tag relevance 1

The reward engineering problem Ω
paulfchristiano · 4y · 26 karma · 3 comments · tag relevance 1

Reward model hacking as a challenge for reward learning Ω
Erik Jenner · 1y · 25 karma · 1 comment · tag relevance 1

Utility versus Reward function: partial equivalence
Stuart_Armstrong · 5y · 18 karma · 5 comments · tag relevance 1