LESSWRONG
Reward Functions
Posts tagged Reward Functions (sorted by most relevant)
Reward is not the optimization target (TurnTrout, 1y; 292 karma, 109 comments)
Draft papers for REALab and Decoupled Approval on tampering (Jonathan Uesato, Ramana Kumar, 3y; 47 karma, 2 comments)
Scaling Laws for Reward Model Overoptimization (leogao, John Schulman, Jacob_Hilton, 8mo; 97 karma, 11 comments)
Seriously, what goes wrong with "reward the agent when it makes you smile"? (TurnTrout, johnswentworth, 10mo; 83 karma, 42 comments)
Four usages of "loss" in AI (TurnTrout, 8mo; 42 karma, 18 comments)
Language Agents Reduce the Risk of Existential Catastrophe (cdkg, Simon Goldstein, 18d; 30 karma, 13 comments)
$100/$50 rewards for good references (Stuart_Armstrong, 2y; 20 karma, 5 comments)
Why we want unbiased learning processes (Stuart_Armstrong, 5y; 13 karma, 3 comments)
Learning societal values from law as part of an AGI alignment strategy (John Nay, 8mo; 5 karma, 18 comments)
Shutdown-Seeking AI (Simon Goldstein, 15d; 46 karma, 31 comments)
A Short Dialogue on the Meaning of Reward Functions (Leon Lang, Quintin Pope, peligrietzer, 7mo; 44 karma, 0 comments)
Thoughts on reward engineering (paulfchristiano, 4y; 30 karma, 30 comments)
The reward engineering problem (paulfchristiano, 4y; 26 karma, 3 comments)
Reward model hacking as a challenge for reward learning (Erik Jenner, 1y; 25 karma, 1 comment)
Utility versus Reward function: partial equivalence (Stuart_Armstrong, 5y; 18 karma, 5 comments)