Upcoming stability of values

by Stuart_Armstrong, 15th Mar 2018


What would you say to someone old who hadn't changed their values since they were five years old?

What would you say to anyone old who hadn't changed their values since they were eighteen years old?

You'd probably have cause to pity the second and seriously worry about the first. The processes of learning and ageing inevitably reshape our values and preferences, and we have well-worn narratives about how life circumstances change people.

But we may be entering a whole new era. Human values are malleable, and super-powered AIs may become adept at manipulating them, possibly at the behest of other humans.

Conversely, once we become able to fine-tune our own values, people will start to stabilise them, preventing value drift. Especially if human lifespan increases, there will be a strong case for keeping your values close, rather than letting them random-walk until they hit an attractor. The more we can self-modify, the more the argument about convergent instrumental goals will apply to us, including the instrumental goal of keeping our terminal goals stable.

So, assuming human survival, I expect that we can look forward to much greater stability of values in the future, with humans fixing their values in place, if only to protect themselves against manipulation.

Possible Consequences

In such a world, the whole narrative of human development will change, with "stages of life" marked more by information, wealth, or position than by changes in values. Nick Bostrom once discussed "super-babies": entities that preserved the values of babies but had the intelligence of adults. Indeed, many pre-adolescents would object to going through adolescence, and it is unlikely to be forced on all of them. So we may end up with perpetual pre-adolescents, babies created with adult values, or a completely different maturation process, one that doesn't involve value changes.

Thus, unlike today, creators/parents will be able to fix the values of their offspring with little risk that these values will change. There are many ways this could go wrong, the most obvious being the eternal conservation of pernicious values and the potential splintering of humanity into incompatible factions, or else strict regulations on the creation of new entities to prevent that from happening.

In contrast, interactions between different groups may become more relaxed than they are today. Changing values through argument or social pressure would no longer be an option (and I presume these future humans would be able to cure themselves of patterns of interaction they dislike, such as feeling the need to respond to incendiary comments). So interactions would be between beings that know for a fact they could never convince each other of moral facts, thus removing any need to convince, shame, or proselytise in conversation.

It's also possible that divergent stable values may have smaller consequences than we think. This is partly because there are instrumental reasons for people to compromise on values when interacting with each other, and partly because it's not clear how much of human value divergence is actually divergence in factual understanding, or mere tribalism. Factual divergences are much harder to sustain artificially, and tribalism is likely to transform into various long-term contracts.

On a personal note, if I had full design capabilities over my own values, I'd want to allow for some slack and moral progress, but constrain my values so that they couldn't wander too far from their point of origin.