Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a special post for quick takes by Stuart_Armstrong. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Stuart_Armstrong's Shortform

Lexicographical preference orderings seem to come naturally to humans. Sentiments like "no amount of money is worth one human life" are commonly expressed.

Now, that particular sentiment is wrong because money can be used to purchase human lives.

The other problem comes from using probability and expected utility, which makes anything lexicographically second completely worthless in all realistic cases. It's one thing to say that you prefer apples to pears lexicographically when there are ten of each lying around and everything is deterministic (just take the ten apples first then the ten pears afterwards). But does it make sense to say that you'd prefer one chance in a trillion of extending someone's life by a microsecond, over a billion euros of free consumption?

So this short post will propose a more sensible, smoothed version of lexicographical ordering, suitable to capture the basic intuition, but usable with expected utility.

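If the utility $U$ has lexicographical priority and $V$ is subordinate to it, then choose a value $a > 0$ and maximise a combined utility $W$ equal to $U$ plus a bounded, strictly increasing function of $V$ whose total range is at most $a$.

In that case, increases in expected $V$ always cause non-trivial increases in expected $W$, but an increase in $U$ of $a$ will always be more important than any possible increase in $V$.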
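A minimal sketch of such a smoothed ordering, assuming the combined utility takes the form $W = U + \frac{a}{2}\tanh(V)$. The choice of `tanh` as the squashing function, the function name `smoothed_lexicographic`, and all numeric values are assumptions made for illustration; any bounded, strictly increasing function with total range at most $a$ would serve the same role.

```python
import math


def smoothed_lexicographic(u: float, v: float, a: float) -> float:
    """Combined utility W = U + (a/2) * tanh(V).

    tanh is an assumed squashing function: it is strictly increasing and
    its total range has width 2, so the V term spans less than a in total.
    """
    return u + 0.5 * a * math.tanh(v)


a = 1.0

# Holding U fixed, more V is always better.
more_v_helps = smoothed_lexicographic(0.0, 2.0, a) > smoothed_lexicographic(0.0, 1.0, a)

# Gaining a in U outweighs moving V from very low to very high.
u_dominates = smoothed_lexicographic(a, -5.0, a) > smoothed_lexicographic(0.0, 5.0, a)

# Hypothetical numbers for the "one chance in a trillion" case:
# a 1e-12 chance of a 1e-6 gain in U versus a certain, large gain in V.
expected_gain_tiny_u = 1e-12 * 1e-6
gain_large_v = smoothed_lexicographic(0.0, 10.0, a) - smoothed_lexicographic(0.0, 0.0, a)
```

Under a strict lexicographic ordering the tiny chance of a gain in $U$ would win; under this smoothed version the certain gain in $V$ wins, while a gain of $a$ in $U$ still dominates anything $V$ can offer.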

This seems related to scope insensitivity and availability bias. No amount of money (that I have direct control of) is worth one human life (in my Dunbar group). No money (which my mind exemplifies as $100k or whatever) is worth the life of my example human, a coworker. Even then, it's false, but it's understandable.

More importantly, categorizations of resources (and of people, probably) are map, not territory. The only rational preference ranking is over reachable states of the universe. Or, if you lean a bit far towards skepticism/solipsism, over sums of future experiences.

Preferences exist in the map, in human brains, and we want to port them to the territory with the minimum of distortion.

Oh, wait. I've been treating preferences as territory, though always expressed in map terms (because communication and conscious analysis are map-only). I'll have to think about what it would mean if they were purely map artifacts.

This is a link to "An Increasingly Manipulative Newsfeed" about potential social media manipulation incentives (e.g. Facebook).

I'm putting the link here because I keep losing the original post (since it wasn't published by me, but I co-wrote it).

Bayesian agents that knowingly disagree

A minor stub, caveating Aumann's agreement theorem; put here to reference in future posts, if needed.

Aumann's agreement theorem states that rational agents with common knowledge of each other's beliefs cannot agree to disagree. If they exchange their estimates, they will swiftly come to an agreement.

However, that doesn't mean that agents cannot disagree; indeed, they can disagree and know that they disagree. For example, suppose that there are a thousand doors; behind 999 of these there are goats, and behind one there is a flying aircraft carrier. The two agents are in separate rooms, and a host will go into each room and execute the following algorithm: they will choose a door at random among the 999 that contain a goat. Then, with probability $99/100$, they will tell that door number to the agent; with probability $1/100$, they will tell the door number with the aircraft carrier.

Then each agent will have probability $1/100$ of the named door being the aircraft carrier door, and probability $\frac{99}{100} \cdot \frac{1}{999} = \frac{11}{11100} \approx 0.001$ on each of the other doors; so the most likely door is the one named by the host.
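The posterior can be checked with exact arithmetic. The sketch below assumes a uniform prior over the thousand doors and a host who names the true door with probability $1/100$ (the value implied by the D100 variant of the protocol); the function name `posterior_after_naming` is hypothetical.

```python
from fractions import Fraction

N_DOORS = 1000
P_TRUTH = Fraction(1, 100)  # assumed probability that the host names the carrier door


def posterior_after_naming(n_doors: int = N_DOORS, p_truth: Fraction = P_TRUTH):
    """Return (P(carrier behind named door), P(carrier behind one specific other door))."""
    prior = Fraction(1, n_doors)
    # P(host names door n | carrier is behind door n)
    like_named = p_truth
    # P(host names door n | carrier is behind one specific other door):
    # the host lies, and picks n uniformly among the n_doors - 1 goat doors.
    like_other = (1 - p_truth) * Fraction(1, n_doors - 1)
    evidence = prior * like_named + (n_doors - 1) * prior * like_other
    return prior * like_named / evidence, prior * like_other / evidence


p_named, p_other = posterior_after_naming()
```

Here `p_named` is `Fraction(1, 100)` and `p_other` is `Fraction(11, 11100)`, so the named door is roughly ten times more likely than any other single door, even though it is very probably wrong.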

We can modify the protocol so that the host will never name the same door to each agent (roll a D100; if it comes up 1, tell the truth to the first agent and lie to the second; if it comes up 2, do the opposite; anything else means tell a different lie to each agent). In that case, each agent will have a best guess for the aircraft carrier, and the certainty that the other agent's best guess is different.
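A small simulation of the modified protocol, as a sketch: it assumes the carrier door is placed uniformly at random and that lies are drawn uniformly from the goat doors. The helper names `random_goat` and `host_names_doors` are hypothetical.

```python
import random


def random_goat(rng: random.Random, carrier: int, n_doors: int, exclude: int = -1) -> int:
    """Pick a goat door uniformly at random, optionally avoiding one other door."""
    while True:
        door = rng.randrange(n_doors)
        if door != carrier and door != exclude:
            return door


def host_names_doors(rng: random.Random, n_doors: int = 1000):
    """Return (carrier, door named to agent 1, door named to agent 2)."""
    carrier = rng.randrange(n_doors)
    roll = rng.randint(1, 100)  # the D100
    if roll == 1:  # truth to the first agent, lie to the second
        return carrier, carrier, random_goat(rng, carrier, n_doors)
    if roll == 2:  # lie to the first agent, truth to the second
        return carrier, random_goat(rng, carrier, n_doors), carrier
    # Otherwise tell each agent a different lie.
    first = random_goat(rng, carrier, n_doors)
    second = random_goat(rng, carrier, n_doors, exclude=first)
    return carrier, first, second


rng = random.Random(0)
trials = [host_names_doors(rng) for _ in range(20_000)]
always_differ = all(first != second for _, first, second in trials)
truth_rate_first = sum(first == carrier for carrier, first, _ in trials) / len(trials)
```

In every trial the two agents are given different doors, so each agent's best guess is guaranteed to differ from the other's, while each agent individually still hears the truth about one time in a hundred.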

If the agents exchanged information, they would swiftly converge on the same distribution; but until that happens, they disagree, and know that they disagree.

Partial probability distribution

A concept that's useful for some of my research: a partial probability distribution.

That's a $Q$ that defines $Q(A \mid B)$ for some but not all $A$ and $B$ (with $Q(A) = Q(A \mid \Omega)$ for $\Omega$ being the whole set of outcomes).

This $Q$ is a partial probability distribution iff there exists a probability distribution $P$ that is equal to $Q$ wherever $Q$ is defined. Call this $P$ a full extension of $Q$.

Suppose that $Q(C \mid D)$ is not defined. We can, however, say that $Q(C \mid D) = x$ is a logical implication of $Q$ if every full extension $P$ has $P(C \mid D) = x$.

Eg: , , will logically imply the value of .
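One concrete instance of a logical implication, chosen here for illustration and not necessarily the example intended above: if a partial distribution specifies the probabilities of $A$, $B$ and $A \cup B$, then inclusion-exclusion forces the probability of $A \cap B$ in every full extension. The function name `forced_atoms` and the numbers are hypothetical.

```python
from fractions import Fraction


def forced_atoms(q_a: Fraction, q_b: Fraction, q_union: Fraction) -> dict:
    """Probabilities of the four atoms generated by A and B.

    The three given values plus normalisation pin down every atom, so any
    full extension must agree on them (and hence on P(A and B)).
    """
    atoms = {
        "A and B": q_a + q_b - q_union,  # inclusion-exclusion
        "A only": q_union - q_b,
        "B only": q_union - q_a,
        "neither": 1 - q_union,
    }
    if any(p < 0 for p in atoms.values()):
        raise ValueError("no full extension exists: these values are inconsistent")
    return atoms


atoms = forced_atoms(Fraction(1, 2), Fraction(1, 3), Fraction(2, 3))
```

With these inputs the intersection is forced to have probability `Fraction(1, 6)`. Inconsistent inputs raise an error, which corresponds to the case where no full extension exists and the values do not form a partial probability distribution.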

This is a special case of a crisp infradistribution: a constraint $P(A \mid B) = r$ is equivalent to $P(A \cap B) = r \, P(B)$, a linear equation in $P$, so the set of all $P$'s satisfying it is convex and closed.

Sounds like a special case of crisp infradistributions (ie, all partial probability distributions have a unique associated crisp infradistribution)

Given some partial probability distribution $Q$, we can consider the (nonempty) set of probability distributions equal to $Q$ where $Q$ is defined. This set is convex (clearly, a mixture of two probability distributions which agree with $Q$ about the probability of an event will also agree with $Q$ about the probability of an event).
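The convexity claim also covers conditional probabilities, since a constraint of the form $P(A \mid B) = r$ is the linear equation $P(A \cap B) = r \, P(B)$. A small exact check, with a hypothetical four-outcome space and hypothetical events `A` and `B`:

```python
from fractions import Fraction as F

OUTCOMES = ("w1", "w2", "w3", "w4")
A = {"w1", "w2"}
B = {"w1", "w3"}


def prob(p: dict, event: set) -> F:
    """Probability of an event under distribution p."""
    return sum((p[w] for w in event), F(0))


def cond(p: dict, a: set, b: set) -> F:
    """Conditional probability P(a | b)."""
    return prob(p, a & b) / prob(p, b)


def mix(p1: dict, p2: dict, lam: F) -> dict:
    """Mixture lam * p1 + (1 - lam) * p2."""
    return {w: lam * p1[w] + (1 - lam) * p2[w] for w in OUTCOMES}


# Two different distributions that both satisfy P(A | B) = 1/4.
p1 = {"w1": F(1, 8), "w2": F(1, 4), "w3": F(3, 8), "w4": F(1, 4)}
p2 = {"w1": F(1, 20), "w2": F(2, 5), "w3": F(3, 20), "w4": F(2, 5)}

mixtures = [mix(p1, p2, F(k, 10)) for k in range(11)]
all_agree = all(cond(m, A, B) == F(1, 4) for m in mixtures)
```

Every mixture keeps $P(A \mid B) = 1/4$ even though `p1` and `p2` disagree on $P(B)$, which is what convexity of the set of full extensions requires.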

Convex (compact) sets of probability distributions = crisp infradistributions.

Here are a few examples of model splintering in the past:

- The concept of honour, which includes concepts such as "nobility of soul, magnanimity, and a scorn of meanness [...] personal integrity [...] reputation [...] fame [...] privileges of rank or birth [...] respect [...] consequence of power [...] chastity". That is a grab-bag of different concepts, but in various times and social situations, "honour" was seen as a single, clear concept.
- Gender. We're now in a period where people are questioning and redefining gender, but gender has been splintering for a long time. In middle class Victorian England, gender would define so much about a person (dress style, acceptable public attitudes, genitals, right to vote, right to own property if married, whether they would work or not, etc...). In other times (and in other classes of society, and other locations), gender is far less informative.
- Consider a Croat, communist, Yugoslav nationalist in the 1980s. They would be clear in their identity, which would be just one thing. Then the 1990s come along, and all these aspects come into conflict with each other.

Here are a few that might happen in the future; the first two could result from technological change, while the last could come from social change:

- A human subspecies created who want to be left alone without interactions with others, but who are lonely and unhappy when solitary. This splinters preferences and happiness (more than they are today), and changes the standard assumptions about personal freedom and
- A brain, or parts of a human brain, that loop forever through feelings of "I am happy" and "I want this moment to repeat forever". This splinters happiness-and-preferences from identity.
- We have various ages of consent and responsibility; but, by age 21, most people are taken to be free to make decisions, are held responsible for their actions, and are seen to have a certain level of understanding about their world. With personalised education, varying subcultures, and more precise psychological measurements, we might end up in a world where "maturity" splinters into lots of pieces, with people having different levels of autonomy, responsibility, and freedom in different domains - and these might not be particularly connected with their age.