Reward Functions
Posts tagged Reward Functions
Most Relevant
5 · 317 · Reward is not the optimization target · Ω · TurnTrout · 1y · 115
5 · 47 · Draft papers for REALab and Decoupled Approval on tampering · Ω · Jonathan Uesato, Ramana Kumar · 3y · 2
2 · 102 · Scaling Laws for Reward Model Overoptimization · Ω · leogao, John Schulman, Jacob_Hilton · 1y · 11
2 · 84 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Q · Ω · TurnTrout, johnswentworth · 1y · 42
2 · 42 · Four usages of "loss" in AI · Ω · TurnTrout · 1y · 18
2 · 30 · Language Agents Reduce the Risk of Existential Catastrophe · Ω · cdkg, Simon Goldstein · 4mo · 13
2 · 20 · $100/$50 rewards for good references · Ω · Stuart_Armstrong · 2y · 5
2 · 13 · Why we want unbiased learning processes · Stuart_Armstrong · 6y · 3
2 · 5 · Learning societal values from law as part of an AGI alignment strategy · John Nay · 1y · 18
1 · 48 · Shutdown-Seeking AI · Ω · Simon Goldstein · 4mo · 31
1 · 44 · A Short Dialogue on the Meaning of Reward Functions · Ω · Leon Lang, Quintin Pope, peligrietzer · 10mo · 0
1 · 30 · Thoughts on reward engineering · Ω · paulfchristiano · 5y · 30
1 · 26 · The reward engineering problem · Ω · paulfchristiano · 5y · 3
1 · 25 · Reward model hacking as a challenge for reward learning · Ω · Erik Jenner · 1y · 1
1 · 18 · Utility versus Reward function: partial equivalence · Stuart_Armstrong · 5y · 5