Reward Functions

Edited by Dakara; last updated 30th Dec 2024

A reward function is a mathematical function in reinforcement learning that defines which actions or outcomes are desirable for an AI system by assigning numerical values (rewards) to states or state-action pairs. It encodes the goals and preferences we want the AI to optimize for. Specifying a reward function that actually captures those goals, without creating incentives for unintended behavior, is a significant challenge in AI development.
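To make the definition concrete, here is a minimal sketch in Python of a reward function for a hypothetical 4×4 gridworld (the grid, goal cell, and reward values are illustrative assumptions, not taken from any of the posts below):

```python
from typing import Tuple

# A reward function maps states (or state-action pairs) to scalar rewards.
# Here the intended goal is "reach the goal cell quickly": +1 for arriving
# at the goal, and a small per-step penalty otherwise.

State = Tuple[int, int]  # (row, col) on a 4x4 grid
Action = str             # "up", "down", "left", or "right"

GOAL: State = (3, 3)

def reward(state: State, action: Action, next_state: State) -> float:
    """Assign a scalar reward to a state-action transition."""
    if next_state == GOAL:
        return 1.0    # the desirable outcome
    return -0.01      # step penalty, so shorter paths score higher

# Example transitions:
print(reward((3, 2), "right", (3, 3)))  # 1.0   (reached the goal)
print(reward((0, 0), "up", (0, 0)))     # -0.01 (bumped into a wall)
```

Even this toy case illustrates the specification problem: if we instead rewarded a proxy such as proximity to the goal, an agent could score highly by loitering next to the goal cell without ever reaching it. Misspecifications of this kind are the subject of several of the reward-hacking posts listed below.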

Posts tagged Reward Functions (15 of 43 shown; Ω = crossposted to the AI Alignment Forum, Q = question)

- 380 · Reward is not the optimization target [Ω] by TurnTrout (3y, 127 comments)
- 47 · Draft papers for REALab and Decoupled Approval on tampering [Ω] by Jonathan Uesato, Ramana Kumar (5y, 2 comments)
- 81 · Reward hacking behavior can generalize across tasks [Ω] by Kei Nishimura-Gasparian, Isaac Dunn, Henry Sleight, Miles Turpin, evhub, Carson Denison, Ethan Perez (1y, 5 comments)
- 103 · Scaling Laws for Reward Model Overoptimization [Ω] by leogao, John Schulman, Jacob_Hilton (3y, 13 comments)
- 87 · Seriously, what goes wrong with "reward the agent when it makes you smile"? [QΩ] by TurnTrout, johnswentworth (3y, 43 comments)
- 75 · Interpreting Preference Models w/ Sparse Autoencoders [Ω] by Logan Riggs, Jannik Brinkmann (1y, 12 comments)
- 46 · Four usages of "loss" in AI [Ω] by TurnTrout (3y, 18 comments)
- 43 · A quick list of reward hacking interventions [Ω] by Alex Mallen (5mo, 5 comments)
- 40 · Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI by jsteinhardt (2y, 0 comments)
- 39 · Language Agents Reduce the Risk of Existential Catastrophe [Ω] by cdkg, Simon Goldstein (2y, 14 comments)
- 37 · When is reward ever the optimization target? [Q] by Noosphere89, gwern (1y, 17 comments)
- 27 · Security Mindset: Hacking Pinball High Scores by gwern (6mo, 3 comments)
- 20 · $100/$50 rewards for good references [Ω] by Stuart_Armstrong (4y, 5 comments)
- 13 · Why we want unbiased learning processes by Stuart_Armstrong (8y, 3 comments)
- 5 · Learning societal values from law as part of an AGI alignment strategy by John Nay (3y, 18 comments)