Reward Functions
- Applied to "self-improvement-executors are not goal-maximizers" by bhauth 15d ago
- Applied to "Shutdown-Seeking AI" by Simon Goldstein 16d ago
- Applied to "Language Agents Reduce the Risk of Existential Catastrophe" by cdkg 19d ago
- Applied to "A Short Dialogue on the Meaning of Reward Functions" by Leon Lang 7mo ago
- Applied to "Learning societal values from law as part of an AGI alignment strategy" by John Nay 8mo ago
- Applied to "Scaling Laws for Reward Model Overoptimization" by David Gross 8mo ago
- Applied to "Four usages of 'loss' in AI" by TurnTrout 9mo ago
- Applied to "Reward IS the Optimization Target" by RobertM 9mo ago
- Applied to "Leveraging Legal Informatics to Align AI" by John Nay 9mo ago
- Applied to "An investigation into when agents may be incentivized to manipulate our beliefs." by RobertM 9mo ago
- Applied to "Seriously, what goes wrong with 'reward the agent when it makes you smile'?" by TurnTrout 10mo ago
- Applied to "Reward is not the optimization target" by TurnTrout 1y ago
- Applied to "Reward model hacking as a challenge for reward learning" by Erik Jenner 1y ago
- Applied to "Demanding and Designing Aligned Cognitive Architectures" by Koen.Holtman 1y ago
- Applied to "$100/$50 rewards for good references" by Ruby 2y ago
- Applied to "Draft papers for REALab and Decoupled Approval on tampering" by adamShimi 3y ago
- Applied to "The reward engineering problem" by Gyrodiot 3y ago
- Applied to "Probabilities, weights, sums: pretty much the same for reward functions" by Gyrodiot 3y ago