Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I have two strong intuitions about human values that had seemed utterly irreconcilable to me, up until recently.

  • On the one hand, human values are clearly inchoate, unstable messes of niche heuristics and preferences that often contradict each other and can change on a dime. A twenty-year-old doesn't value all the same things their fifty-year-old self will value, modern Europeans don't value the very same things as ancient Egyptians, etc. And how can it be otherwise? Evolution-built bodies are messy hacks. Why would evolution-built minds be any different?
  • On the other hand, my models of human psyche tell me that humans are clearly approximate utility-maximizers for some very specific utility function. This utility function is stable across time and lifetimes and cultures, despite the fact that the object-level behaviors humans engage in and their stated preferences can change arbitrarily. This utility function has something to do with human "flourishing", whatever that is, and with things that feel a very special kind of "right" according to a human's judgement. There is a strong sense in which we all want the same thing, on some deeply abstract level.

I believe I see a way to unify the two. The crucial insights have been supplied by the Shard Theory, and the final speculative picture is broadly supported by it. This result mostly dissolved all of my high-level confusions about human values and goal-directed behavior, in addition to satisfying a lot of other desiderata.


1. The Shard Theory of Human Value: A Recap

Disclaimer: This summary does not represent the views of Team Shard, but only my subjective understanding. For the official summary, see this.

According to the Shard Theory, in the course of brain development, humans jointly learn two things:

  • A world-model.
  • Heuristics attached to that world-model, reinforced by the credit-assignment algorithm because their execution historically led to reward.

The latter are "shards". In their most primitive form, they're just activation patterns. You see a lollipop enter your field of vision, you grab it. You see a flashlight pointed at your face, you close your eyes.

As the world-model grows more advanced, the shards can grow more sophisticated as well. Instead of only attaching to observations, they can attach to things in the world-model. If you're modelling the world as containing a lollipop the next room over, your lollipop-shard will bid for a plan to go grab it. If your far-future model says that becoming a salaried professional will give you enough income to buy a lot of lollipops, your lollipop-shard will bid for it.

A lot of other values and habits are implemented the same way. The desire to do nice things for people you like, the avoidance of life-threatening situations, the considerations that go into the choice of career — all of those are just shard-implemented reaction patterns, which react to things in your world-model and bid for particular responses to them. If you expect someone you like to be unhappy, a shard activates, bidding for an action-sequence that changes that prediction. If you expect to be in a life-threatening situation, a whole bunch of shards rebel against that vision. If you're considering career choices, you're choosing between different models of the future, and whichever wins the "popularity contest" among the shards is what ends up implemented.
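The "popularity contest" above can be caricatured in code. This is purely my own toy sketch, not part of the theory: shards are modeled as simple functions that score predicted world-states, and the winning plan is just the one with the highest total bid.

```python
# Toy sketch (my own illustration): shards as simple scoring functions
# over predicted world-states. Plans compete in a "popularity contest",
# i.e. a sum of shard bids over each candidate future.

def lollipop_shard(world):
    # Bids for futures that contain reachable lollipops.
    return 1.0 if world.get("lollipops", 0) > 0 else 0.0

def health_shard(world):
    # Bids against futures with high sugar intake.
    return -0.5 * world.get("sugar", 0)

shards = [lollipop_shard, health_shard]

def popularity_contest(candidate_futures):
    """Pick the predicted future the shard economy approves of most."""
    return max(candidate_futures,
               key=lambda world: sum(shard(world) for shard in shards))

futures = [
    {"lollipops": 3, "sugar": 3},   # go grab the lollipops
    {"lollipops": 0, "sugar": 0},   # abstain
]
best = popularity_contest(futures)  # here, the health-shard's bid wins out
```

Note that the shards themselves stay trivial; all the interesting structure lives in the world-model that produces the candidate futures.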

Shards can conflict. Some values are mutually contradictory; the preference for lollipops might conflict with preferences for health and being attractive and avoiding dentists, so plans a lollipop-shard bids for may be overruled by other shards. If the lollipop-shard is suppressed too many times, it'll atrophy and die out.

Shards have a self-preservation instinct. Some indirect — they see that certain changes to personality will decrease the amount of things they value in the future, and will bid against such value-drift plans (you don't want to self-modify to hate your loved ones, because that will make you do things that will make them unhappy). Some direct — these shards can identify themselves in the world-model, and directly bid against plans that eliminate them. (You might inherently like some aspects of your personality, and protest against changes to them — not because of outside-world outcomes, but because that's who you like to be. Conversely, imagine a non-reflectively-stable shard, like a crippling fear of spiders or drug addiction. You don't value valuing this, so you can implement plans that eliminate the corresponding shards via e.g. therapeutic interventions.)

Altogether, a mind like this would resemble humans pretty well. In particular, it crisply defines what "human flourishing" is: the state of the world which minimizes the constituent shards' resistance to it; a picture of the world that the maximum number of shards approve of. And in addition to satisfying our values on the object level, it'll also need to satisfy shards' preferences for self-perpetuation.

Hence our preference for diverse, dynamic futures in which we remain ourselves.

Sidebar: Note an important thing here: most of the complexity in a mind like this comes from the world-model. Shards can be very simple if-then functions, but the mere fact that they're implemented over a very sophisticated cross-temporal world model can give rise to some very complex behaviors. This, in part, is why I think the Shard Theory is compelling — it fits very well with various stories of incremental development of goals.


2. The Gaps in the Picture

But. That's clearly not a complete story of how humans work, is it?

The shard economy as presented in Part 1 is too rigid. According to it, a human's policy is a relatively shallow function of that human's constituent shards, and significant changes to it imply correspondingly significant changes in the shard economy. And such changes would be rare: ancient, deeply-established shards would have a lot of sway, and their turnover would be low.

But that's not what we often observe. Humans can change their action patterns on a dime, inspired by philosophical arguments, convinced by logic, indoctrinated by political or religious rhetoric, or plainly because they're forced to.

Suppose a human has a bunch of deeply ingrained values, like a) "donate to the local community" or b) "eat pork" or c) "don't kill".

  1. Introducing that human to utilitarianism may lead to them suppressing (a), despite the fact that any hypothetical "utilitarianism" shard should be too newborn to win against a shard that might've been around since childhood.
  2. Showing this human a convincing proof that pigs are sentient would lead to them suppressing (b), which would suddenly be in conflict with (c).
  3. More broadly, the whole "rationality" thing. Strong rationalists can get rid of whole swathes of old yet inefficient heuristics, as the result of noticing that they're logically incoherent.
  4. And a lot of shards can be suppressed if the human, e.g., somehow found themselves trapped in a totalitarian surveillance state with an ideological bent. If the human values their life above all, they would figure out what values the authorities want them to pretend to have, then somehow display these values and only these values, overriding their natural responses.
  5. Affective Death Spirals are another example — when a human becomes so convinced of an ideology that their actions start to be dictated by it more than by their previous beliefs and values.

None of these are knock-out rebuttals. Indeed, even in the last two examples, the new action patterns are not unshakeable. A sufficiently strong trigger/shard — like a deep trauma, or a very strong value like the love for a child — can break past the life-preservation act in (4) or the ideological takeover in (5).

But this doesn't fully gel with the basic shard-centred picture either. It implies circumstances in which a human's behavior is mainly explained and controlled by some isolated deliberative process, not their entire set of ingrained values. Some part of the human logically reasons out a new policy and then implements it; not as the result of stochastic shard negotiation, but in circumvention of it.

Another issue is the sheer generalizability of human behavior this implies. I can imagine responding to any event my world-model can model in any way I can model. I don't need a special shard for every case — my collection of shards is already somehow fully generalizable. And if I were trapped in a dystopia, I'd be able to spoof the existence of whatever shards my captors want me to have, regardless of my actual shard makeup.

So what's up with that?


3. An Attempt At Reconciliation

We clearly need to introduce some mechanism of planning/search. The exact implementation is a source of some disagreement:

  1. It might be a mechanism wholly separate from the shards, like the world-model. This "planner" might be trained by self-supervised learning: it generates thoughts/plan steps, the constituent shards vote for/against every step based on the vision of the future conditioned on that step's execution, and if a step is rejected, the planner is updated to be less likely to generate that step in the future. Eventually, it converges towards generating optimal-according-to-the-shard-economy plans out of the gate, skipping the lengthy negotiation process.
  2. It might be a particular "voting bloc" of advanced shards specialized in plan-making. Their "business model" would be: look at the segment of the world-model describing the human's self-model, analyse the inner shard economy, then generate a plan of actions that would satisfy as many shards as possible.
  3. It might be something in-between these extremes, like the capability of the most advanced shards to agree on a common policy they all commit to follow when they're "at the wheel".

Regardless of the specifics, however, what we get is: an advanced, consequentialist plan-making mechanism whose goal is to come up with plans that satisfy the weighted sum of the preferences of the shard economy.
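Option 1 in the list above can be caricatured as a tiny training loop. This is my own toy sketch under invented assumptions: the plan steps are made-up strings, and the whole shard economy is collapsed into a single approval function.

```python
import random

# Toy sketch (assumptions mine): a "planner" that learns from shard vetoes
# which plan steps to stop proposing, converging towards generating
# shard-approved plans out of the gate.

def shard_economy_approves(step):
    # Stand-in for the weighted shard vote on a step's predicted outcome.
    return step != "eat_ten_lollipops"

class Planner:
    def __init__(self, steps):
        self.weights = {s: 1.0 for s in steps}

    def propose(self, rng):
        steps, w = zip(*self.weights.items())
        return rng.choices(steps, weights=w)[0]

    def update(self, step, approved):
        # Rejected steps become less likely to be generated again.
        if not approved:
            self.weights[step] *= 0.1

rng = random.Random(0)
planner = Planner(["eat_ten_lollipops", "go_for_a_run", "call_a_friend"])
for _ in range(200):
    step = planner.propose(rng)
    planner.update(step, shard_economy_approves(step))

# After training, vetoed steps are almost never proposed: the lengthy
# negotiation process has been amortized into the planner's weights.
```

The design choice worth noticing: the planner never sees reward directly, only the shards' votes — which is exactly the reversal discussed later, where the planner ends up optimizing for the shards rather than for reward.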

This, I argue, is what we are: that planner mechanism, a fairly explicit mesa-optimizer algorithm running on our brains. And our terminal value is to satisfy our shards' preferences.

Which is... a pretty difficult proposition, actually. Because many of these shards do not actually codify preferences, and certainly not universal ones. Some of our goals might be defined over specific environments/segments of the world-model, in ways that are difficult to translate/generalize to other environments. Some others might not be "goals" at all, just if-then activation patterns. To do our job, we essentially have to compile our own values, routing around various type errors.

To illustrate what I mean, a few examples:

  1. Consider a human with a strong preference for "winning". Suppose they're playing chess. The planner's job is to consult the environment-independent internal description of the "winning" value, and "adapt" or "translate" or "interpret" it for chess, outputting a chess-specific objective: "checkmate the opponent's king".
  2. Consider a human who responds to seeing a spider with intense fear. They may interpret it as an instinctive response, perhaps an unwanted one, and seek to remove that fear. Alternatively, they may interpret it as a value, and generalize it: "I dislike spiders".
  3. Consider a human who grew up taught that certain actions/behaviours are good and moral, and others are immoral, and developed corresponding habits. They may interpret these habits as values, becoming a deontologist. Or they may view them as instrumentally-useful heuristics optimized for the objective of "making people happy", and become a consequentialist utilitarian.
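The "compilation" step in example 1 can be sketched as a lookup from an environment-independent value to an environment-specific objective. A toy sketch of my own; the environments, translations, and function names are all invented stand-ins for what is really a deliberative, fallible process.

```python
# Toy sketch (names and mappings are my own invention): "compiling" an
# environment-independent value into an environment-specific objective.
# The same abstract value compiles to different concrete goals in
# different environments, and the compilation can fail or be contested.

def compile_value(value, environment):
    # In reality this is deliberative interpretation; here it is a table.
    translations = {
        ("winning", "chess"): "checkmate the opponent's king",
        ("winning", "footrace"): "cross the finish line first",
    }
    try:
        return translations[(value, environment)]
    except KeyError:
        # The type error the post describes: no known way to generalize
        # this shard to this environment.
        raise ValueError(f"no way to compile {value!r} for {environment!r}")

objective = compile_value("winning", "chess")
```

The multiplicity of "valid" table entries for a single shard is the point: nothing in the shard itself forces one compilation over another.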

Hence all of our problems with value reflection: there are often multiple "valid" ways to bootstrap any specific shard to value status.

Hence the various pitfalls we could fall into. These processes of interpretation or generalization are conducted by a deliberative and logical process. And that process can be mistaken, can be fooled by logical or logical-sounding arguments. Hence our propensity to adopt flawed-but-neat ideologies, or become mistaken about what we really want.

Hence our ability to self-modify. The planner can become convinced (either rightly or not) that certain shards need to be created or destroyed for the good of the whole shard economy, then implement plans that do so (build/destroy good/bad habits, remove values that contradict others). At the same time, we also have preferences for retaining our ability to self-modify — both because we're not sure our current model of our desires is accurate, and maybe because we have a shard-implemented preference for mutability.

Thus: We are approximations of idealized utility-maximizers over an inchoate mess of a thousand shards of desire.

Of note: Consider the reversal happening here. Shards began as heuristics optimized by the credit-assignment mechanism to collect a lot of reward. Up to a point, the human's cognitive capabilities were implemented as shards; shards were the optimization process. At that stage, the human wasn't a proper optimizer. In particular, they weren't retargetable.

Over time, however, some components of that system — be that an external planner algorithm or a coalition of planner-shards — developed universal problem-solving capacity. That made the whole shard economy obsolete. But because of the developmental path the human mind took to get there, that mechanism didn't end up optimizing reward. Instead, it was developed to assist shards, and so it re-interpreted shards as its mesa-objectives, in all their messiness.

And it seems very plausible that AIs would follow a similar developmental path.


4. Nice Things About This Framework

  • It goes towards explaining our apparent robustness to ontology shifts. Namely: figuring out how to adapt our preferences to something they don't apply to is business as usual for us. We're in a continuous process of re-inventing our own values. The fact of robustness is thus unsurprising, even if the exact mechanisms of it are somewhat opaque.
  • At the same time, our true core terminal objective — the satisfaction of our constituent shards — cannot be damaged without damaging the actual structure of our mind. As long as there's a world-model and shards attached to it[1], it'll keep working. We can be approximately as sure about it as about cogito ergo sum.
    • Take adopting or rejecting religion as an example. People often use this as an example of very strong value shifts, and certainly a lot of shards become irrelevant (those whose activation conditions were attached to the "God" node in the world-model).
    • But the planner's actual terminal value of satisfying the shard economy's weighted preferences wouldn't change — an apostate and a born-again Christian would still be trying to increase their life satisfaction, in whichever ways seem proper for them.
    • As part of that, they may re-interpret some of their shards. E.g., an apostate choosing to seek spiritual fulfillment in other pursuits.
  • It concretizes the System 1 vs. System 2 conflict. There are literally pieces of the self that correspond to them: System 1 is the raw shard economy, System 2 is the planner. Sometimes we use raw System 1 dynamics to navigate internal conflicts (figuring out what we really want/prefer more), sometimes we logically reason it out.
    • Notably, humans are not wrapper-minds even for an esoteric wrapper like the planner. The planner doesn't run everything all the time; sometimes it's overruled or just not engaged, sometimes we run on autopilot.
    • (In the model where the planner is a shard itself, perhaps that's what "willpower" is? The amount of resources the planner-shard has that it can burn on overruling other shards?)
  • It explains identity/self-image/the story of the self. It's the planner's model of the shard economy. It can be arbitrarily accurate (if you often consult your desires), arbitrarily inaccurate (if you're delusional/in denial about them), deliberately inaccurate (if you're rejecting certain parts of yourself in an attempt to self-modify), etc.
  • It's compatible with future-proof ethics. In a way, "maximize the preferences of every shard of every human" is humanity's convergent goal, and that's similar to thin utilitarianism. (Though there are some finer points to work out — e.g., humans should probably not be disassembled into their constituent shards. Although that may be implicit in the planner's implementation?)

5. Closing Thoughts

This framework, in conjunction with my previous toy model, essentially dissolves my main confusions about goal-directedness, human values, and development thereof.

The question to tackle, now, seems to be goal translation/value compilation. How do we adapt the values/goals defined over one environment for another? How do we bootstrap things that do not have the type "value" to the status of a value? What algorithms, in general, exist for doing this? How many possible "solutions" (final value distributions) do such procedures tend to have, and how can the space of solutions be constrained?

In a way, this is just a reformulation of the ontology-shift problem, but this framing seems to make it easier to reason about. And easier to investigate.


Acknowledgements

Thanks to TurnTrout, Charles Foster, and Quintin Pope for productive discussions and critique.

[1] Or, if we've experienced an ontology break so serious as to invalidate all of our constituent shards: as long as there's the potential for new shards to be formed, adapted to the new world-model.

6 comments

A twenty-year-old doesn't value all the same things their fifty-year-old self will value

I have heard this often, but is it really so? When I think about my past self, it seems to me that I am actually more coherent than society keeps telling me. But of course, maybe I am just lying to myself, rewriting my memory to believe that my past self shared my current values.

I am really curious whether my self from 20 or 30 years ago would be okay with my current values and behavior, perhaps after hearing about experiences they hadn't had yet.

How could we test this experimentally, without a time machine? By giving young people a values questionnaire, including a lot of hypotheticals, such as "if it turned out after repeated attempts that X does not work, would it be okay to give up on X?", then calling them 10 years later and comparing the answers?

This is so great; I find that this satisfyingly ties together a bunch of piecemeal understandings in my head. Maybe it's not worth getting into, because it's more about understanding humans than the general case of shard-based agents, but... human brains have a lot of weird bugs that can lead to accidental shard creations/shifts and other strangeness. Think of optical illusions, or of certain drugs being more addictive than you'd predict from the subjective pleasure they deliver, due to idiosyncrasies in how they activate the reward systems. Or consider how the local plasticity of the cortex, which allows modules to learn and lets local learning reallocate module territory along the borders between modules, can sometimes produce accidentally reinforced information leaks between modules: sensory leaks between skin areas that aren't physically co-located on the body but whose receptive fields in the brain are adjacent and compete for territory. That's an example of something I wouldn't attempt to reproduce if I were trying to make a shard-based/brain-like agent.

But the planner's actual terminal value of satisfying the shard economy's weighted preferences...

Suppose I have a heuristic that fires strongly when I'm eating cupcakes or thinking thoughts that will lead to eating cupcakes, and then a control algorithm that makes my muscles fire to make things happen that correspond to thoughts that a heuristic rates highly, and this control algorithm is hooked up to the cupcake heuristic.

Saying that the "actual terminal value" of the control algorithm is to satisfy whatever heuristic is in its input slot (and so changing the heuristic to one that fires for strawberries wouldn't be a big deal) is kinda wrong. Saying that the "actual terminal value" of the control algorithm is only cupcakes and nothing else is also kinda wrong. They're both kinda wrong because trying to declare one thing the "actual terminal value" is the wrong exercise to be engaging in in the first place!

This is related to my other warning about the word "actual": this idea that you're "actually" the control algorithm and not the cupcake heuristic. There are multiple ways to think about you that work better or worse in different contexts (Since I just finished editing a sequence about this, I will shamelessly link it). I am so large I don't just contain multitudes, I contain multitudes of ways of parceling myself up into multitudes.

They're both kinda wrong because trying to declare one thing the "actual terminal value" is the wrong exercise to be engaging in in the first place!

I disagree. I'm not talking about the intentional stance or such "external" descriptions. I'm claiming that if you took the explicit algorithmic implementation of the human mind and looked over it, you would find some kind of distinct "planner" part, and that part would be something like an idealized utility-maximizer with a pointer to the shard economy in place of its utility function.

It's not a frame that can be kinda wrong/awkward to use. It's a specific mechanistic prediction that's either flat-out right or flat-out wrong.

This is related to my other warning about the word "actual": this idea that you're "actually" the control algorithm and not the cupcake heuristic

Mm, I'm more willing to relax this assumption. It ties into my model of self-awareness — I suspect it might be the case that the planner is the thing that's being fed summaries of the brain's state, making it literally the thing that's having qualia. But I haven't fully worked out my model of that.

I suspect that much of the appeal of shard theory is working through detailed explanations of model-free RL with general value function approximation for people who mostly think of AI in terms of planning/search/consequentialism.

But if you already come from a model-free RL value approx perspective, shard theory seems more natural.

Moment to moment decisions are made based on value-function bids, with little to no direct connection to reward or terminal values. The 'shards' are just what learned value-function approximating subcircuits look like in gory detail.

The brain may have a prior towards planning subcircuitry, but even without a strong prior planning submodules will eventually emerge naturally in a model-free RL learning machine of sufficient scale (there is no fundamental difference between model-free and model-based for universal learners). TD like updates ensure that the value function extends over longer timescales as training progresses. (and in general humans seem to plan on timescales which scale with their lifespan, as you'd expect)
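The TD-style value propagation this comment invokes can be shown minimally. A toy sketch of my own (invented states, rewards, and learning rates): repeated TD(0) updates with a discount factor near 1 make the learned value function reflect rewards further and further in the future.

```python
# Toy TD(0) sketch (my own illustration of the comment's point): value
# estimates propagate backwards from the reward, so the value function
# extends over longer timescales as training progresses.

states = ["start", "middle", "goal"]
V = {s: 0.0 for s in states}          # learned value function
reward = {"goal": 1.0}                # reward on entering "goal"
alpha, gamma = 0.5, 0.9               # learning rate, discount factor

for _ in range(50):
    # One trajectory per pass: start -> middle -> goal.
    for s, s_next in [("start", "middle"), ("middle", "goal")]:
        r = reward.get(s_next, 0.0)
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

# V["middle"] converges to ~1.0, and V["start"] to ~0.9: the starting
# state now "anticipates" a reward two steps ahead.
```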

Humans can change their action patterns on a dime, inspired by philosophical arguments, convinced by logic, indoctrinated by political or religious rhetoric, or plainly because they're forced to.

I'd add that action patterns can change for reasons other than logical/deliberative ones. For example, adapting to a new culture means you might adopt and have new reactions to objects, gestures, etc that are considered symbolic in that culture.