The Theoretical Foundations of Reward Learning

Feb 28, 2025 by Joar Skalse

In this sequence, I provide an overview of the theoretical reward learning research agenda, including its motivating assumptions, several core results, and some starting points for contributing to it further.

The Theoretical Reward Learning Research Agenda: Introduction and Motivation
Partial Identifiability in Reward Learning
Misspecification in Inverse Reinforcement Learning
STARC: A General Framework For Quantifying Differences Between Reward Functions
Misspecification in Inverse Reinforcement Learning - Part II
Defining and Characterising Reward Hacking
Other Papers About the Theory of Reward Learning
How to Contribute to Theoretical Reward Learning Research