Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Mentioned in

Toy model piece #1: Partial preferences revisited

5Hazard

2Rohin Shah

3DavidHolmes

2Rohin Shah

4Stuart_Armstrong

2Rohin Shah

4Hazard

2Rohin Shah

4Hazard

2Rohin Shah

4Hazard

2Rohin Shah

2Rohin Shah

2Stuart_Armstrong

1DavidHolmes

2Stuart_Armstrong

1DavidHolmes

3Stuart_Armstrong

New Comment

Just noting that my understanding is that this putting a formalism on what we mean when we say "All things being equal, I prefer more X than less X"

See also CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements (though note it doesn't turn into a utility function, which as I understand it is a key desideratum).

Thanks for pointing me to this updated version :-). This seems a really neat trick for writing down a utility function that is compatible with the given preorder. I thought a bit more about when/to what extent such a utility function will be unique, in particular if you are given not only the data of a preorder, but also some information on the strengths of the preferences. This ended up a bit too long for a comment, so I wrote a few things in outline here:

https://www.lesswrong.com/posts/7ncFy84ReMFW7TDG6/categorial-preferences-and-utility-functions

It may be quite irrelevant to what you're aiming for here, but I thought it was maybe worth writing down just in case.

I am very confused by the math in this post:

Why must a preorder decompose into disjoint ordered chains? If I have a partial preference and another partial preference how do those induce disjoint ordered chains where worlds between chains are incomparable? Perhaps you are asking us to *assume* that the preorder decomposes into disjoint ordered chains?

How do cycles vanish in ? Can you work through the example where the partial preference expressed by the human is ?

we can extend this to by setting U(w)=U(p(w)).

I think this is extending to ?

which has for all .

Should that be ?

Thanks, corrected a few typos.

Why must a preorder decompose into disjoint ordered chains?

They don't have to; I'm saying that sensible partial preferences (eg ) should do so. I then see how I'd deal with sensible preorders, then generalise to all preorders in the next section.

How do cycles vanish in ? Can you work through the example where the partial preference expressed by the human is ?

Note that what you've written is impossible as means but not . A preorder is transitive, so the best you can get is .

Then projecting down (via ) to will project all these down to the same element. That's why there are no cycles, because all cycles go to points.

Then we need to check some math. Define on by iff .

This is well defined (independently of which and we use to represent and ), because if , then , so, by transitivity, . The same argument works for .

We now want to show the is a partial order on . It's transitive, because if and , then , and the transitivity in implies and hence .

That shows it's a preorder. To show partial order, we need to show there are no cycles. So, if and , then and , hence, by definition of , . So it's a partial order.

For cycles, it looks like the projection to is akin to taking all the worlds that form a given cycle, and compressing them into a single world.

In your example, it's true and when . That's the condition for equivalence in the project, so you have that . If you're thinking about the ordering as a directed graph, you can collapse those worlds to a single point and not mess up the ordering.

Ah yes, that makes sense, thanks! I didn't realize what was the set of equivalence classes of

For the disjoint chain part, first imagine all the worlds in the from and split them into equivalence classes based on equality of . The preference can only compare two worlds that are in the same equivalence class, so each equivalence class will be totally ordered, but no class will be comparable to any other (hence decomposition into disjoint chains).

I thought the (n, v) thing was just an example. If all the worlds are meant to be represented via (n, v), then it seems like you are only ever allowed to give a partial preference based on whatever feature n represents, and you can never talk about any of the other features in v. This seems bad.

(I do agree that if you only give partial preferences on (n, v) worlds in the way you describe, then you get a decomposition into disjoint chains.)

I do believe the OP is talking about partial pref on (n,v) form worlds. Yeah, this seems bad in the "How do I act when looking at different v?" sense, but I get the sense that it's not supposed to answer that question. Or at least Stuart plans to build a lot from here before it will answer that sort of question.

Suppose I express a partial preference over "good worlds" and another one over "bad worlds", for example "when everyone's needs for food, water and shelter are met, then it is better for there to be more social connection" and "when I am living in extreme poverty, I prefer to be in a country with a good social safety net". These talk about mutually exclusive worlds, and so lead to two distinct ordered chains. Then, on average you assign the *same* utility to a good world and a bad world, which seems very bad. How do we avoid this issue?

By adding in a third preference, which explicitely says that extreme poverty is worse than having all needs met.

These are just pieces of the total utility, remember. Even if they are full preferences, they are not *all* our preferences.

This seems really neat, but it seems quite sensitive to how one defines the worlds under consideration, and whether one counts slightly different worlds as actually distinct. Let me try to illustrate this with an example.

Suppose we have a consisting of 7 worlds, , with preferences

and no other non-trivial preferences. Then (from the `sensible case'), I think we get the following utilities:

.

Suppose now that I create two new copies , of the world which each differ by the position of a single atom, so as to give me (extremely weak!) preferences , so all the non-trivial preferences in the new are now summarised as

Then the resulting utilities are (I think):

.

In particular, before adding in these 'trivial copies' we had , and now we get . Is this a problem? It depends on the situation, but to me it suggests that, if using this approach, one needs to be careful in how the worlds are specified, and the 'fine-grainedness' needs to be roughly the same everywhere.

Each partial preference is meant to represent a single mental model inside the human, with all preferences weighted the same (so there can't be "extremely weak" preferences, compared with other preference in the same partial preference). Things like "increased income is better", "more people smiling is better", "being embarrassed on stage is the worse".

We can imagine a partial preference with more internal structure, maybe internal weights, but I'd simply see that as two separate partial preferences. So we'd have the utilities you gave to through to for one partial preference (actually, my formula doubles the numbers you gave), and , , for the other partial preference - which has a very low weight by assumption. So the order of and is not affected.

**EDIT**: I'm pretty sure we can generalise my method for different weights of preferences, by changing the formula that sums the squares of utility difference.

(actually, my formula doubles the numbers you gave)

Are you sure? Suppose we take with , , then , so the values for should be as I gave them. And similarly for , giving values . Or else I have mis-understood your definition?

I'd simply see that as two separate partial preferences

Just to be clear, by "separate partial preference" you mean a separate preorder, on a set of objects which may or may not have some overlap with the objects we considered so far? Then somehow the work is just postponed to the point where we try to combine partial preferences?

EDIT (in reply to your edit): I guess e.g. keeping conditions 1,2,3 the same and instead minimising

where is proportion to the reciprocal of the strength of the preference? Of course there are lots of variants on this!

Yep, sorry, I saw -3, -2, -1, etc... and concluded you weren't doing the 2 jumps; my bad!

Then somehow the work is just postponed to the point where we try to combine partial preferences?

Yes. But unless we have other partial preferences or meta-preferences, then the only resonable way of combining them is just to add them, after weighting.

I like your reciprocal weighting formula. It seems to have good properties.

EDIT: This model is currently obsolete, see here for the most current version.I'm working towards a toy model that will illustrate all the steps in the research agenda. It will start with some algorithmic stand-in for the "human", and proceed to create the UH, following all the steps in that research agenda. So I'll be posting a series of "toy model pieces", that will then be ultimately combined in a full toy model. Along the way, I hope to get a better understanding of how to do the research agenda in practice, and maybe even modify that agenda based on insights making the toy model.

In this post, I'll revisit and re-formalise partial preferences, and then transform them into utility functions.

## The problem with the old definition

My previous model of partial preferences can't capture some very simple mental models, such as P="the more people smile, the better the world is".

This is because the partial preference decomposes the space of worlds locally as Y×Z, fixes two values y− and y+ in Y, and only compares worlds of type (y−,z) and (y+,z) for fixed z. This means that we can only compare worlds with the same z value, and only two of these worlds can be compared: so we can't say w1<w2<w3 for three distinct worlds. Thus we can't say that three people smiling is better than two, which is better than one. Not being able to capture preferences like P is a major flaw.

## New definition: preorder

So now model a partial preference as preorder. A preorder ≤ is a type of ordering that is transitive (if w1≤w2 and w2≤w3, then w1≤w3) and reflexive (w≤w for all worlds w).

The previous type of partial preference can be made into a preorder quite easily: w1<w2 implies w1≤w2, and add w≤w for all worlds w.

Now we can easily express P="the more people smile, the better the world is". Let w(n,v) be a world with n smiling people in it, with v representing all the other relevant variables describing the world. The P is described by the preorder:

For a general preorder ≤, define w<w′ to mean w≤w′ but it not being the case that w′≤w.

## Circular preferences and utility functions

Unfortunately, unlike the previous partial preferences, preorders can allow for circular preferences w1≤w2≤w3≤w1. In practice, most sensible partial preferences will not have circular preferences, and will instead resemble P: just a collection of orderings among separate sets of worlds.

But, it will might be possible to have circular partial preferences, maybe of the type "in Australia, the cities gets nicer as you go clockwise along the coast".

So you need a way of dealing with circular preferences, and with complicated sets of partial preferences that might include some circular preferences.

We also want a way to transform all of these preorders into a full preference, given as a utility function over all worlds. The research agenda calls for aggregating similar preferences before making them into full preferences, but we still need some way of doing this in the cases where we have a single partial preference. The rest of this section will show one way of doing this.

## The sensible case

In the simplest case, we'd have a partial preference such as "these ten worlds are good, in this order", and we'd map them to a utility function with equal spaces between each world. And we wouldn't say these ten worlds were all better or all worse than the other worlds not mentioned.

And we can do that, in the most likely and sensible case. Take P and its preorder ≤ (and <): under this partial preference, the worlds decompose into simple ordered chains. That means that if W is the set of worlds, then it decomposes as a disjoint union W=⋃i∈IWi (for some indexing set I). These sets are incomparable: if wi∈Wi and wj∈Wj, then neither wi≤wj nor wj≤wi.

Moreover, each of these Wi is totally ordered by <: so we can index any wi∈Wi by some natural number k, as wik, and say that wik<wil if and only if k<l. Let ni=||Wi||−1 be the size of Wi minus one, and order the elements of it from 0 upwards: so the set is ordered as wi0<wi1<…<wini.

Then here is how we construct a utility function U that extends the partial order:

This means that if w is incomparable with all other worlds (ie, is not relevant to the partial preference), the U(w)=0, and that all chains Wi have utilities U(wi0)=−ni, U(wi1)=−ni+2, …, U(wini)=ni. So they are symmetric around 0.

## The general case

Here I'll give a way of generalising the above procedure to the case of any preorder. Note that this situation should only come up very rarely, since most preorders derived from mental models will be more sensible. Still, for completeness, here is a process that extends the simple model:

First, to get rid of circular preferences, project the set of worlds W to ¯¯¯¯¯¯W, by using the equivalence relation w≅w′ means w≤w′ and w′≤w. Call this projection p. The preorder on W descends via p to a partial order. So we now work in ¯¯¯¯¯¯W, which has no circular preferences. Then if we assign a utility U to ¯¯¯¯¯¯W, we can extend this to W by setting U(w)=U(p(w)).

Now working in ¯¯¯¯¯¯W, define a link between two (equivalence class of) worlds w and w′. Write w←w to say that w<w′, and that there does not exist any world v∈¯¯¯¯¯¯W with w<v<w′.

Now, decompose ¯¯¯¯¯¯W as collection of disjoint sets ⋃i∈IWi (for some indexing set I). Two worlds w and w′ are in the set Wi if you can get from one to another following links; ie if there exists worlds wik with w=wi0 and w′=wil and for all k, either wik←wi(k+1) or wi(k+1)←wik.

Let ni be the number of links in Wi. We'll now assign utility to elements of Wi, as a constrained optimisation process; in the following, all worlds are assumed to be in Wi:

minimiseg(U)=∑w←w′(U(w′)−U(w))2, subject to the constraints that:As long as U is not constant on Wi, then conditions 2. and 3. just fix a scaling and a translation of U. Condition 1. means that condition 2. is equivalent to ∑w←w′|U(w′)−U(w)|=2ni, since U(w)≤U(w′) for all w<w′, which includes all w←w′.

This is a convex optimisation problem in the values of the U(w), since g(U) is convex in these values. Note that it is not strictly convex, since translations leave it unchanged: g(U)=g(U+c). Nevertheless, it can be seen

^{[1]}that this equation is strictly convex on the subspace defined by condition 3. Therefore there is a single solution to the minimisation problem above.## Extending the sensible case

It's not hard to see that this extends the sensible model above, which has U(w′)−U(w)=2 for all w←w′ (a little more work is required to show that having all those values equal minimises the sum ∑w←w′(U(w′)−U(w))2).

## The final version of partial preferences

Is this the final version of partial preferences? No, certainly not. But to get a better generalisation, we're going to have to have a look at how people actually model things inside their brains and thought processes. Hence the question of how best to model partial models will be an empirical one. But this very general definition will suffice for the moment.

Very rough argument: choose some ordering for the worlds in Wi, write xi for U(wi), and set ¯¯¯x=(x0,…xm). Then, since g is a quadratic with only quadratic terms, we can write it as its own Hessian: g(¯¯¯x)=¯¯¯xTH(g)2¯¯¯x.

Now assume that ¯¯¯xTH(g)2¯¯¯x=0, for ¯¯¯x=(r0,r1,…rm). However, g is the sum of non-negative terms, so this is only possible if all of the ∑w←w′(U(w′)−U(w))2 are zero. This is only possible if U(w′)=U(w) whenever w←w′; thus, since Wi is connected by links, U must be constant on Wi. In other words, only ¯¯¯x=(r,r,…r) allows ¯¯¯xTH(g)2¯¯¯x=0.

Thus H(g) has only one zero eigenvector, corresponding to translations. Since condition 3. precludes additional translations, H(g) is strictly positive definite on the subspace it defines. Hence g is strictly convex on this space. ↩︎