Concept extrapolation: key posts

by Stuart_Armstrong · 19th Apr 2022 · AI Alignment Forum · 1 min read

Concept extrapolation is the skill of taking a concept, a feature, or a goal that is defined in a narrow training situation... and extrapolating it safely to a more general situation. This more general situation might be very extreme, and the original concept might not make much sense there (eg defining "human beings" in terms of quantum fields).

Nevertheless, since training data is always insufficient, key concepts must be extrapolated. And doing so successfully is a skill that humans have to a certain degree, and that an aligned AI would need to possess to a much greater extent.
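
To make the failure mode concrete, here is a minimal toy sketch (my own illustration, not from the sequence): two binary features always coincide in the training data, so the labels cannot determine which feature the concept actually tracks; once the features come apart at deployment, the two equally good fits extrapolate differently. All names in it (make_data, h_a, h_b) are invented for the example.

    # Toy illustration (not from the original post): training labels
    # underdetermine which of two coinciding features "is" the concept.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, correlated):
        # feature A defines the "true" concept; feature B is a proxy that
        # matches it exactly in training but is independent at deployment
        a = rng.integers(0, 2, n)
        b = a.copy() if correlated else rng.integers(0, 2, n)
        x = np.stack([a, b], axis=1)
        y = a
        return x, y

    x_train, y_train = make_data(1000, correlated=True)
    x_deploy, _ = make_data(1000, correlated=False)

    # two hypotheses that fit the narrow training data equally well
    h_a = lambda x: x[:, 0]   # "the concept is feature A"
    h_b = lambda x: x[:, 1]   # "the concept is feature B"

    for name, h in [("feature A", h_a), ("feature B", h_b)]:
        acc = (h(x_train) == y_train).mean()
        print(f"hypothesis '{name}': training accuracy = {acc:.2f}")

    # indistinguishable in training; they split once the features decorrelate
    split = (h_a(x_deploy) != h_b(x_deploy)).mean()
    print(f"deployment disagreement between the two extrapolations: {split:.2f}")

Both hypotheses score perfectly on the training distribution, so nothing in the data picks between them; choosing among, or managing the disagreement between, such extrapolations is the problem the posts below address.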

This sequence collects the key posts on concept extrapolation. They are not necessarily to be read in this order; different people will find different posts useful.

  • Different perspectives on concept extrapolation gathers many different analogies and models of concept extrapolation, aimed at different audiences.

  • Model splintering: moving from one imperfect model to another is the original post on "model splintering" - what happens when features no longer make sense because the world-model has changed. It is a long post with a lot of overview and motivation, showing that model splintering is a problem for almost all alignment methods.

  • General alignment plus human values, or alignment via human values? shows that concept extrapolation is necessary and almost sufficient for successfully aligning AIs.

  • Value extrapolation, concept extrapolation, model splintering defines and disambiguates the key terms: model splintering, value extrapolation, and concept extrapolation.

Mentioned in:
Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

Comments (2)

Quintin Pope
I have a comment here that argues many patterns in human values and our generalizations of values emerge from an inner alignment failure in the brain. I’d be interested in hearing your perspective on it and whether it tracks with your own thinking on concept extrapolation.

Stuart_Armstrong
Thanks for that link. It does seem to correspond intuitively to a lot of the human condition. Though it doesn't really explain value extrapolation, more the starting point from which humans can extrapolate values. Still a fascinating read, thanks!