Value Learning

Oct 29, 2018

by rohinmshah

This is a sequence investigating the feasibility of one approach to AI alignment: value learning.

Preface to the sequence on value learning

Ambitious Value Learning

What is ambitious value learning?

The easy goal inference problem is still hard

Humans can be assigned any values whatsoever…

Latent Variables and Model Mis-Specification

Future directions for ambitious value learning

Goals vs Utility Functions

Ambitious value learning aims to avoid catastrophe by giving the AI the correct utility function. Given the difficulty of that task, we revisit the arguments for building AI around utility functions in the first place.

Intuitions about goal-directed behavior

Coherence arguments do not imply goal-directed behavior

Will humans build goal-directed agents?

AI safety without goal-directed behavior

Narrow Value Learning

What is narrow value learning?

Ambitious vs. narrow value learning

Human-AI Interaction

Reward uncertainty

The human side of interaction

Following human norms

Future directions for narrow value learning

Conclusion to the sequence on value learning
