LESSWRONG
Reward Functions
Posts tagged Reward Functions (most relevant first)
Reward is not the optimization target [Ω], by TurnTrout (2y): relevance 5, karma 347, comments 122
Draft papers for REALab and Decoupled Approval on tampering [Ω], by Jonathan Uesato, Ramana Kumar (3y): relevance 5, karma 47, comments 2
Scaling Laws for Reward Model Overoptimization [Ω], by leogao, John Schulman, Jacob_Hilton (1y): relevance 2, karma 102, comments 13
Seriously, what goes wrong with "reward the agent when it makes you smile"? [Q, Ω], by TurnTrout, johnswentworth (2y): relevance 2, karma 86, comments 42
Four usages of "loss" in AI [Ω], by TurnTrout (1y): relevance 2, karma 43, comments 18
Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI, by jsteinhardt (5mo): relevance 2, karma 40, comments 0
Language Agents Reduce the Risk of Existential Catastrophe [Ω], by cdkg, Simon Goldstein (10mo): relevance 2, karma 30, comments 14
$100/$50 rewards for good references [Ω], by Stuart_Armstrong (2y): relevance 2, karma 20, comments 5
Why we want unbiased learning processes, by Stuart_Armstrong (6y): relevance 2, karma 13, comments 3
Learning societal values from law as part of an AGI alignment strategy, by John Nay (1y): relevance 2, karma 5, comments 18
Utility ≠ Reward [Ω], by Vlad Mikulik (5y): relevance 1, karma 121, comments 24
Shutdown-Seeking AI [Ω], by Simon Goldstein (10mo): relevance 1, karma 48, comments 31
A Short Dialogue on the Meaning of Reward Functions [Ω], by Leon Lang, Quintin Pope, peligrietzer (1y): relevance 1, karma 45, comments 0
Thoughts on reward engineering [Ω], by paulfchristiano (5y): relevance 1, karma 30, comments 30
The reward engineering problem [Ω], by paulfchristiano (5y): relevance 1, karma 26, comments 3