By default, avoid ambiguous distant situations — LessWrong

by Stuart_Armstrong

21st May 2019

AI Alignment Forum

Ethics & Morality, World Modeling

17 comments

[-]Gurkenglas7y*100

A seems like more of a theorem than an axiom: We are trying to maximize our estimate of human preferences, so upward errors are going to have more effect on the outcome than downward errors. Therefore, in case of doubt, guess low. And if we keep track of a probability distribution over candidate estimates rather than a point estimate, A dissolves like the Fermi paradox.

[-]Stuart_Armstrong7y40

Interesting take; I'll consider it more...

[-]Gurkenglas2y*40

4 years later: Do you agree now?

[-]Stuart_Armstrong2y40

I believe I do.

[-]mako yass7y10

so upward errors are going to have more effect on the outcome than downward errors

Could you explain this step?

[-]Gurkenglas7y80

If we are good at maximizing, we will bring about whatever situation we judged best. If somewhere across the many possible situations to judge we make an extremely upward error, we will bring about that situation and end up with that random outcome. If the error is downward, we will only fail to bring about that situation, which is an opportunity cost if that was one of the better outcomes but not a disaster.
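The asymmetry described above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the thread: true values and estimation errors are Gaussian, situations split into "familiar" ones (small error) and "distant" ones (large error), and all names and noise levels are hypothetical. It compares a maximiser that trusts its point estimates with one that shrinks noisy estimates towards the prior, which is one way of tracking a distribution rather than a point estimate.

```python
import random


def run_trial(rng, n_familiar=200, n_distant=200,
              familiar_noise=0.2, distant_noise=2.0):
    """Simulate one world; return (naive_estimate, naive_true, cautious_true)."""
    noises = [familiar_noise] * n_familiar + [distant_noise] * n_distant
    # Assumption: true values are drawn from a standard normal prior.
    true_values = [rng.gauss(0.0, 1.0) for _ in noises]
    # Each estimate is the true value plus an error of known size.
    estimates = [v + rng.gauss(0.0, s) for v, s in zip(true_values, noises)]

    # Naive maximiser: bring about the situation with the highest estimate.
    naive = max(range(len(noises)), key=lambda i: estimates[i])

    # Cautious maximiser: rank by the posterior mean under the N(0, 1) prior,
    # which shrinks noisy estimates towards zero ("in case of doubt, guess low").
    def posterior_mean(i):
        return estimates[i] / (1.0 + noises[i] ** 2)

    cautious = max(range(len(noises)), key=posterior_mean)
    return estimates[naive], true_values[naive], true_values[cautious]


def simulate(trials=300, seed=0):
    """Average the three quantities from run_trial over many simulated worlds."""
    rng = random.Random(seed)
    sums = [0.0, 0.0, 0.0]
    for _ in range(trials):
        for k, x in enumerate(run_trial(rng)):
            sums[k] += x
    return tuple(s / trials for s in sums)


naive_estimate, naive_true, cautious_true = simulate()
print(f"naive maximiser: estimated {naive_estimate:.2f}, actually got {naive_true:.2f}")
print(f"cautious maximiser actually got {cautious_true:.2f}")
```

In this toy setup the naive maximiser mostly selects distant situations whose estimates happen to err upward, so what it gets falls well short of what it expected. The cautious maximiser avoids those situations by default, without needing a separate axiom.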

[-]Davidmanheim7y*30

I'm going to try thinking about this by applying the reversal heuristic.

If a smarter and/or less evil person had magicked house elves into existence, so that they were mentally incapable of understanding what freedom would entail, instead of just enjoying servitude, should we change them? Equivalently, if we have a world where everyone is happier than this one because their desires are eusocial and fully compatible with each other, but liberty and prestige are literally impossible to conceive of, should we change back? If that world existed, or we found those aliens, should they be "freed" to make them appreciate liberty, when the concept never occurred to them?

OK, now we can ask the question - should we change from our world to one where people are not culturally molded to appreciate any of our current values? Let's say cultural pressures didn't exist, and values emerged from allowing people, starting from when they are babies, to have whatever they want. This is accomplished by non-sentient robots that can read brainwaves and fulfill desires immediately. Is that better, or should we move towards a future where we continue to culturally engineer our children to have a specific set of desires, those we care about - for pathos, prestige, freedom, etc?

Or should we change the world from our current one to one where people's values are eusocial by design? Where being sacrificed for the greater good was pleasant, and the idea of selfishness was impossible?

At the end of this, I'm left with a feeling that yes, I agree that these are actually ambiguous, and "an explicit level of status quo bias to our preferences" is in fact justified.

[-]Stuart_Armstrong7y70

had magicked house elves into existence [...] should we change them?

I'm explicitly arguing that even though we might not want to change them, we could still prefer they not exist in the first place.

should we change from our world to one where people are not culturally molded to appreciate any of our current values?

I'm trying to synthesise actual human values, not hypothetical other values that other beings might have. So in this process, our current values (or our current meta-preferences for our future values) get a special place. If we had different values currently, the synthesis would be different. So that would-be change is, from our perspective, a loss.

[-]Davidmanheim7y30

Agreed. I'm just trying to think through why we should / should not privilege the status quo. I notice I'm confused about this, since the reversal heuristic implies we shouldn't. If we take this approach to an extreme, aren't we locking in the status quo as a base for allowing only Pareto improvements, rather than overall utilitarian gains?

(I'll note that Eric Drexler's Pareto-topia argument explicitly allows for this condition - I'm just wondering whether it is ideal, or a necessary compromise.)

[-]Stuart_Armstrong7y30

It's locking in the moral/preference status quo; once that's done, non-Pareto overall gains are fine.

Even when locking in that status quo, it explicitly trades off certain values against others, so there is no "only Pareto" restriction.

I have a research agenda to be published soon that will look into these issues in more detail.

I'm trying to synthesise actual human values, not hypothetical other values that other beings might have.

To be clear, when you say "actual human values", do you mean anything different than just "the values of the humans alive today, in the year 2019"? You mention "other beings" - is this meant to include other humans in the past who might have held different values?

[-]Stuart_Armstrong7y20

The aim is to be even more specific - the values of a specific human at a specific time. Then what we do with these syntheses is another point, how much change to allow, etc... Including other humans in the past is a choice that we then need to make, or not.

I see, thank you. So then, would you say this doesn't & isn't intended to answer any question like "whose perspective should be taken into account?", but that it instead assumes some answer to that question has already been specified, & is meant to address what to do given this chosen perspective?

[-]Stuart_Armstrong7y20

It doesn't intend to answer those questions; but those questions become a lot easier to answer once this issue is solved.

Consider a similar situation without creating a race: some wizard brainwashes an existing person into becoming a willing slave. Is it moral to thwart the preferences of the brainwashed person by not enslaving him, or by forcibly modifying his brain to desire freedom again? Most people would say yes.

You might argue that there is a difference (for instance, you might think that forcibly changing preferences is different from creating a being with unusual preferences) but it may be useful to spell out those differences and distinguish between objections that are affected by those differences and objections which are not.

[-]Stuart_Armstrong7y90

Main difference between brainwashing and creating: the pre-brainwashed person had preferences about their future selves that are partially satisfied by reverting back.

the pre-brainwashed person had preferences about their future selves

That would qualify as

for instance, you might think that forcibly changing preferences is different from creating a being with unusual preferences

Also, it's possible for people to have preferences about either their descendants, or about other sentient beings, just like they have preferences about their future selves. In fact, I would suggest that pretty much all the opposition to the idea is because people have preferences about their descendants or about other sentient beings. Again, it may be useful to spell out why you think those preferences merit less respect than preferences about one's future self.

(Note that some answers to this require making assumptions about how to aggregate preferences, which are also serious points of disagreement. For instance, you might say that if you create a lot of slaves, the preferences of that large number should have a large weight. Such assumptions can also be questioned, and most people would question them.)

"The Ood. [...] They're born for it. Basic slave race." Mr. Jefferson, The Impossible Planet.

"And although he may be poor, he shall never be a slave", from the Battle Cry of Freedom.

I've talked about morally distant situations: situations far removed from our own. This distance is in the moral sense, not the actual sense: ancient China is further from us than Star Wars is (even for modern Chinese). There are possible worlds out there far more alien than anything in most of our fiction.

In these distant situations, our usual web of connotations falls apart, and closely related terms start to mean different things. As shown in this post, in extreme cases, our preferences can become nonsensical.

Note that not all our preferences need become nonsensical, just some of them. Consider the case of a slave race - a willing slave race. In that situation, our connotations about slavery may come apart: willing and slave don't fit well together. However, we would still be clear ourselves that we didn't want to be slaves. So though some preferences lose power, some do not.

But let's return to that willing slave race situation. Eliezer's Harry says:

Whoever had created house elves in the first place had been unspeakably evil, obviously; but that didn't mean Hermione was doing the right thing now by denying sentient beings the drudgery they had been shaped to enjoy.

Or consider Douglas Adams's cow variant, bred to be willingly eaten:

"I just don't want to eat an animal that's standing here inviting me to," said Arthur, "it's heartless."

"Better than eating an animal that doesn't want to be eaten," said Zaphod. [...]

"May I urge you to consider my liver?" asked the animal, "it must be very rich and tender by now, I've been force-feeding myself for months."

At X, the decision is clear. Should we go to X in the first place?

In those situations, some immediate actions are pretty clear. There is no point in freeing a willing slave race; there is no advantage to eating an animal that doesn't want to be eaten, rather than one that does.

The longer-term actions are more ambiguous, especially as they conflict with other of our values: for example, should we forcibly change the preferences of the slave race/edible race so that they don't have those odd preferences any more? Does it make a difference if there are more manipulative paths that achieve the same results, without directly forcing them? We may not want to allow manipulative paths to count as acceptable in general.

But, laying that aside, it seems there is a prima facie case that we shouldn't enter those kinds of situations: that non-conscious robots are better than conscious willing slaves, and that vat-grown meat is better than conscious willing livestock.

So there seems to be a good rule of thumb: don't go there. Add an axiom A:

A: Situations in which the web of connotations of a strong preference falls apart should get an automatic penalty. Initially at least, they should be treated as bad situations worth avoiding.

Default weights in distant situations

When a web of connotation unravels, the preferences normally end up weaker than initially, because some of the connotations of those preferences are lost or even opposite. So, normally, preferences in these distant situations are quite weak.

But here I'm suggesting adding an explicit meta-preference to these situations, one that the human subject might not have themselves. This doesn't fit in the formalism of this post. In the language of the forthcoming research agenda, this is one of the "Global meta-preferences about the outcome of the synthesis process".

Isn't this an overriding of the person's preference? It is, to some extent. But note the "Initially at least" clause in A. If we don't have other preferences about the distant situation, it should be avoided. But this penalty can be overcome by other considerations.

For example, the previous standard web of connotations for sexuality has fallen apart, while the gender one is unravelling; it's perfectly possible to have meta-preferences that would have told us to respect our reflection on issues like that, and our reflection might be fine with these new situations. Similarly, some (but not all) future changes to the human condition are things that would worry me initially but that I'd be ok with upon reflection; I myself have strong meta-preferences that these should be acceptable.

But for situations where our other preferences and meta-preferences don't weigh in, A would downgrade these distant worlds as a default (dis)preference. This adds an explicit level of status quo bias to our preferences, which I feel is justified: better to be prudent rather than reckless where our preferences and values are concerned. The time for (potential) recklessness is in the implementation of these values, not their definition.
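As a rough illustration of how A acts as a default rather than a veto, here is a toy scoring sketch. Everything in it is hypothetical: the `Situation` fields, the size of `DEFAULT_PENALTY` and the example numbers are assumptions made for illustration, not part of the synthesis formalism.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    name: str
    base_value: float                  # weight from the ordinary (often weakened) preferences
    web_fell_apart: bool               # True if a strong preference's connotations come apart here
    other_considerations: float = 0.0  # other preferences and meta-preferences that weigh in


DEFAULT_PENALTY = 1.0  # hypothetical size of the automatic penalty from A


def score(s: Situation, penalty: float = DEFAULT_PENALTY) -> float:
    """Total weight of a situation, with A applied as a default penalty."""
    total = s.base_value + s.other_considerations
    if s.web_fell_apart:
        # A: penalised initially, but other considerations can outweigh this.
        total -= penalty
    return total


robots = Situation("non-conscious robots", base_value=0.0, web_fell_apart=False)
willing_slaves = Situation("conscious willing slaves", base_value=0.2, web_fell_apart=True)
endorsed_change = Situation("change endorsed on reflection", base_value=0.2,
                            web_fell_apart=True, other_considerations=1.5)

ranked = sorted([robots, willing_slaves, endorsed_change], key=score, reverse=True)
print([s.name for s in ranked])
```

With these made-up numbers, the willing slave situation ranks below the robot alternative because nothing else weighs in on it, while a distant situation that reflection endorses overcomes the penalty.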