Reward Functions
• Applied to Speedrun ruiner research idea by lukehmiles 11d ago
• Applied to Utility ≠ Reward by Oliver Sourbut 4mo ago
• Applied to Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI by jacobjacob 6mo ago
• Applied to VLM-RM: Specifying Rewards with Natural Language by ChengCheng 6mo ago
• Applied to Some alignment ideas by SelonNerias 9mo ago
• Applied to self-improvement-executors are not goal-maximizers by bhauth 11mo ago
• Applied to Shutdown-Seeking AI by Simon Goldstein 11mo ago
• Applied to Language Agents Reduce the Risk of Existential Catastrophe by cdkg 11mo ago
• Applied to A Short Dialogue on the Meaning of Reward Functions by Leon Lang 1y ago
• Applied to Learning societal values from law as part of an AGI alignment strategy by John Nay 2y ago
• Applied to Scaling Laws for Reward Model Overoptimization by David Gross 2y ago
• Applied to Four usages of "loss" in AI by TurnTrout 2y ago
• Applied to Reward IS the Optimization Target by RobertM 2y ago
• Applied to Leveraging Legal Informatics to Align AI by John Nay 2y ago
• Applied to An investigation into when agents may be incentivized to manipulate our beliefs. by RobertM 2y ago
• Applied to Seriously, what goes wrong with "reward the agent when it makes you smile"? by TurnTrout 2y ago
• Applied to Reward is not the optimization target by TurnTrout 2y ago
• Applied to Reward model hacking as a challenge for reward learning by Erik Jenner 2y ago