Value Drift

Edited by Dakara, last updated 20th Nov 2024

Value drift refers to the idea that over time, the values or goals of a person or an AI system can change, often in ways that weren’t originally intended.

For humans, this might happen as life experiences, personal growth, or external influences cause someone's beliefs to evolve.

For AI, it could occur if the system starts to interpret its goals differently as it learns and interacts with the world.

Posts tagged Value Drift
Schelling fences on slippery slopes (Scott Alexander, 14y; 632 points, 251 comments)
Understanding and avoiding value drift (TurnTrout, 3y; 48 points, 14 comments)
Straight-edge Warning Against Physical Intimacy (Raphaëll, 5y; 18 points, 42 comments)
Would I think for ten thousand years? (Stuart_Armstrong, 7y; 28 points, 13 comments)
Let Values Drift (Gordon Seidoh Worley, 6y; 4 points, 19 comments)
Predicting Parental Emotional Changes? (jefftk, 3y; 39 points, 11 comments)
Gandhi, murder pills, and mental illness (erratio, 15y; 37 points, 16 comments)
Mahatma Armstrong: CEVed to death. (Stuart_Armstrong, 12y; 33 points, 62 comments)
Upcoming stability of values (Stuart_Armstrong, 8y; 15 points, 15 comments)
Is value drift net-positive, net-negative, or neither? (MarisaJurczyk, 6y; 5 points, 3 comments)
New Hackathon: Robustness to distribution changes and ambiguity (Charbel-Raphaël, 3y; 12 points, 3 comments)