LESSWRONG

Concept Extrapolation

Apr 16, 2022 by Stuart_Armstrong

This sequence collects the key posts on concept extrapolation. They need not be read in this order; different readers will find different posts useful.

Concept extrapolation is the skill of taking a concept, a feature, or a goal that is defined in a narrow training situation... and extrapolating it safely to a more general situation. This more general situation might be very extreme, and the original concept might not make much sense there (e.g. defining "human beings" in terms of quantum fields).

Nevertheless, since training data is always insufficient, key concepts must be extrapolated. Doing so successfully is a skill that humans possess to a certain degree, and that an aligned AI would need to possess to a greater extent.
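To make the problem concrete, here is a toy sketch (not from the sequence itself; the husky-on-snow setup is an illustrative assumption): two candidate "concepts" fit a narrow training set equally well, yet extrapolate differently once the situation generalises.

```python
# Toy illustration: two hypotheses agree on all narrow training data
# but disagree off-distribution, so "the concept" is underdetermined.

# Training situation: every "husky" photo happens to be on snow.
# Each example is ((has_pointy_ears, on_snow), is_husky).
train = [((1, 1), True), ((0, 0), False), ((1, 1), True), ((0, 0), False)]

# Two concepts that both fit the training data perfectly:
concept_a = lambda x: x[0] == 1   # "husky" = pointy ears
concept_b = lambda x: x[1] == 1   # "husky" = snowy background

assert all(concept_a(x) == label for x, label in train)
assert all(concept_b(x) == label for x, label in train)

# A more general situation: a husky photographed on grass.
novel = (1, 0)
print(concept_a(novel), concept_b(novel))  # prints: True False
```

The training data alone cannot distinguish the two hypotheses; choosing the extrapolation that remains sensible in the general situation is the skill the sequence is about.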

Posts in this sequence (all by Stuart_Armstrong):

1. Concept extrapolation: key posts
2. Different perspectives on concept extrapolation
3. Model splintering: moving from one imperfect model to another
4. General alignment plus human values, or alignment via human values?
5. Value extrapolation, concept extrapolation, model splintering