Reward Functions
Posts tagged Reward Functions (sorted by Most Relevant)

Reward is not the optimization target [Ω]. TurnTrout, 8mo. Relevance 5, karma 279, 103 comments.
Draft papers for REALab and Decoupled Approval on tampering [Ω]. Jonathan Uesato, Ramana Kumar, 2y. Relevance 5, karma 47, 2 comments.
Scaling Laws for Reward Model Overoptimization [Ω]. leogao, John Schulman, Jacob_Hilton, 5mo. Relevance 2, karma 86, 11 comments.
Seriously, what goes wrong with "reward the agent when it makes you smile"? [Q] [Ω]. TurnTrout, johnswentworth, 7mo. Relevance 2, karma 79, 41 comments.
Four usages of "loss" in AI [Ω]. TurnTrout, 6mo. Relevance 2, karma 42, 18 comments.
$100/$50 rewards for good references [Ω]. Stuart_Armstrong, 1y. Relevance 2, karma 20, 5 comments.
Why we want unbiased learning processes. Stuart_Armstrong, 5y. Relevance 2, karma 13, 3 comments.
Learning societal values from law as part of an AGI alignment strategy. John Nay, 5mo. Relevance 2, karma 3, 18 comments.
A Short Dialogue on the Meaning of Reward Functions [Ω]. Leon Lang, Quintin Pope, peligrietzer, 4mo. Relevance 1, karma 42, 0 comments.
Thoughts on reward engineering [Ω]. paulfchristiano, 4y. Relevance 1, karma 30, 30 comments.
The reward engineering problem [Ω]. paulfchristiano, 4y. Relevance 1, karma 26, 3 comments.
Reward model hacking as a challenge for reward learning [Ω]. Erik Jenner, 1y. Relevance 1, karma 25, 1 comment.
Utility versus Reward function: partial equivalence. Stuart_Armstrong, 5y. Relevance 1, karma 17, 5 comments.
Reward functions and updating assumptions can hide a multitude of sins [Ω]. Stuart_Armstrong, 3y. Relevance 1, karma 16, 2 comments.
An investigation into when agents may be incentivized to manipulate our beliefs. Felix Hofstätter, 6mo. Relevance 1, karma 15, 0 comments.