
Reward Functions

Posts tagged Reward Functions
Most relevant first:

Reward is not the optimization target Ω, by TurnTrout, 8mo (karma 279, comments 103, tag relevance 5)
Draft papers for REALab and Decoupled Approval on tampering Ω, by Jonathan Uesato, Ramana Kumar, 2y (karma 47, comments 2, tag relevance 5)
Scaling Laws for Reward Model Overoptimization Ω, by leogao, John Schulman, Jacob_Hilton, 5mo (karma 86, comments 11, tag relevance 2)
Seriously, what goes wrong with "reward the agent when it makes you smile"? QΩ, by TurnTrout, johnswentworth, 7mo (karma 79, comments 41, tag relevance 2)
Four usages of "loss" in AI Ω, by TurnTrout, 6mo (karma 42, comments 18, tag relevance 2)
$100/$50 rewards for good references Ω, by Stuart_Armstrong, 1y (karma 20, comments 5, tag relevance 2)
Why we want unbiased learning processes, by Stuart_Armstrong, 5y (karma 13, comments 3, tag relevance 2)
Learning societal values from law as part of an AGI alignment strategy, by John Nay, 5mo (karma 3, comments 18, tag relevance 2)
A Short Dialogue on the Meaning of Reward Functions Ω, by Leon Lang, Quintin Pope, peligrietzer, 4mo (karma 42, comments 0, tag relevance 1)
Thoughts on reward engineering Ω, by paulfchristiano, 4y (karma 30, comments 30, tag relevance 1)
The reward engineering problem Ω, by paulfchristiano, 4y (karma 26, comments 3, tag relevance 1)
Reward model hacking as a challenge for reward learning Ω, by Erik Jenner, 1y (karma 25, comments 1, tag relevance 1)
Utility versus Reward function: partial equivalence, by Stuart_Armstrong, 5y (karma 17, comments 5, tag relevance 1)
Reward functions and updating assumptions can hide a multitude of sins Ω, by Stuart_Armstrong, 3y (karma 16, comments 2, tag relevance 1)
An investigation into when agents may be incentivized to manipulate our beliefs., by Felix Hofstätter, 6mo (karma 15, comments 0, tag relevance 1)