# Ω 5

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

EDIT: This model is currently obsolete, see here for the most current version.

I'm working towards a toy model that will illustrate all the steps in the research agenda. It will start with some algorithmic stand-in for the "human", and proceed to create the , following all the steps in that research agenda. So I'll be posting a series of "toy model pieces", that will then be ultimately combined in a full toy model. Along the way, I hope to get a better understanding of how to do the research agenda in practice, and maybe even modify that agenda based on insights making the toy model.

In this post, I'll revisit and re-formalise partial preferences, and then transform them into utility functions.

# The problem with the old definition

My previous model of partial preferences can't capture some very simple mental models, such as "the more people smile, the better the world is".

This is because the partial preference decomposes the space of worlds locally as , fixes two values and in , and only compares worlds of type and for fixed . This means that we can only compare worlds with the same value, and only two of these worlds can be compared: so we can't say for three distinct worlds. Thus we can't say that three people smiling is better than two, which is better than one. Not being able to capture preferences like is a major flaw.

# New definition: preorder

So now model a partial preference as preorder. A preorder is a type of ordering that is transitive (if and , then ) and reflexive ( for all worlds ).

The previous type of partial preference can be made into a preorder quite easily: implies , and add for all worlds .

Now we can easily express "the more people smile, the better the world is". Let be a world with smiling people in it, with representing all the other relevant variables describing the world. The is described by the preorder:

• if and only if and .

For a general preorder , define to mean but it not being the case that .

# Circular preferences and utility functions

Unfortunately, unlike the previous partial preferences, preorders can allow for circular preferences . In practice, most sensible partial preferences will not have circular preferences, and will instead resemble : just a collection of orderings among separate sets of worlds.

But, it will might be possible to have circular partial preferences, maybe of the type "in Australia, the cities gets nicer as you go clockwise along the coast".

So you need a way of dealing with circular preferences, and with complicated sets of partial preferences that might include some circular preferences.

We also want a way to transform all of these preorders into a full preference, given as a utility function over all worlds. The research agenda calls for aggregating similar preferences before making them into full preferences, but we still need some way of doing this in the cases where we have a single partial preference. The rest of this section will show one way of doing this.

## The sensible case

In the simplest case, we'd have a partial preference such as "these ten worlds are good, in this order", and we'd map them to a utility function with equal spaces between each world. And we wouldn't say these ten worlds were all better or all worse than the other worlds not mentioned.

And we can do that, in the most likely and sensible case. Take and its preorder (and ): under this partial preference, the worlds decompose into simple ordered chains. That means that if is the set of worlds, then it decomposes as a disjoint union (for some indexing set ). These sets are incomparable: if and , then neither nor .

Moreover, each of these is totally ordered by : so we can index any by some natural number , as , and say that if and only if . Let be the size of minus one, and order the elements of it from upwards: so the set is ordered as .

Then here is how we construct a utility function that extends the partial order:

• For all , , set to .

This means that if is incomparable with all other worlds (ie, is not relevant to the partial preference), the , and that all chains have utilities , , , . So they are symmetric around .

## The general case

Here I'll give a way of generalising the above procedure to the case of any preorder. Note that this situation should only come up very rarely, since most preorders derived from mental models will be more sensible. Still, for completeness, here is a process that extends the simple model:

First, to get rid of circular preferences, project the set of worlds to , by using the equivalence relation means and . Call this projection . The preorder on descends via to a partial order. So we now work in , which has no circular preferences. Then if we assign a utility to , we can extend this to by setting .

Now working in , define a link between two (equivalence class of) worlds and . Write to say that , and that there does not exist any world with .

Now, decompose as collection of disjoint sets (for some indexing set ). Two worlds and are in the set if you can get from one to another following links; ie if there exists worlds with and and for all , either or .

Let be the number of links in . We'll now assign utility to elements of , as a constrained optimisation process; in the following, all worlds are assumed to be in :

• minimise , subject to the constraints that:
1. if , then .
2. ,
3. .

As long as is not constant on , then conditions 2. and 3. just fix a scaling and a translation of . Condition 1. means that condition 2. is equivalent to , since for all , which includes all .

This is a convex optimisation problem in the values of the , since is convex in these values. Note that it is not strictly convex, since translations leave it unchanged: . Nevertheless, it can be seen[1] that this equation is strictly convex on the subspace defined by condition 3. Therefore there is a single solution to the minimisation problem above.

## Extending the sensible case

It's not hard to see that this extends the sensible model above, which has for all (a little more work is required to show that having all those values equal minimises the sum ).

# The final version of partial preferences

Is this the final version of partial preferences? No, certainly not. But to get a better generalisation, we're going to have to have a look at how people actually model things inside their brains and thought processes. Hence the question of how best to model partial models will be an empirical one. But this very general definition will suffice for the moment.

1. Very rough argument: choose some ordering for the worlds in , write for , and set . Then, since is a quadratic with only quadratic terms, we can write it as its own Hessian: .

Now assume that , for . However, is the sum of non-negative terms, so this is only possible if all of the are zero. This is only possible if whenever ; thus, since is connected by links, must be constant on . In other words, only allows .

Thus has only one zero eigenvector, corresponding to translations. Since condition 3. precludes additional translations, is strictly positive definite on the subspace it defines. Hence is strictly convex on this space. ↩︎

New Comment

Just noting that my understanding is that this putting a formalism on what we mean when we say "All things being equal, I prefer more X than less X"

See also CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements (though note it doesn't turn into a utility function, which as I understand it is a key desideratum).

Thanks for pointing me to this updated version :-). This seems a really neat trick for writing down a utility function that is compatible with the given preorder. I thought a bit more about when/to what extent such a utility function will be unique, in particular if you are given not only the data of a preorder, but also some information on the strengths of the preferences. This ended up a bit too long for a comment, so I wrote a few things in outline here:

https://www.lesswrong.com/posts/7ncFy84ReMFW7TDG6/categorial-preferences-and-utility-functions

It may be quite irrelevant to what you're aiming for here, but I thought it was maybe worth writing down just in case.

I am very confused by the math in this post:

Why must a preorder decompose into disjoint ordered chains? If I have a partial preference and another partial preference how do those induce disjoint ordered chains where worlds between chains are incomparable? Perhaps you are asking us to assume that the preorder decomposes into disjoint ordered chains?

How do cycles vanish in ? Can you work through the example where the partial preference expressed by the human is ?

we can extend this to by setting U(w)=U(p(w)).

I think this is extending to ?

which has for all .

Should that be ?

Thanks, corrected a few typos.

Why must a preorder decompose into disjoint ordered chains?

They don't have to; I'm saying that sensible partial preferences (eg ) should do so. I then see how I'd deal with sensible preorders, then generalise to all preorders in the next section.

How do cycles vanish in ? Can you work through the example where the partial preference expressed by the human is ?

Note that what you've written is impossible as means but not . A preorder is transitive, so the best you can get is .

Then projecting down (via ) to will project all these down to the same element. That's why there are no cycles, because all cycles go to points.

Then we need to check some math. Define on by iff .

This is well defined (independently of which and we use to represent and ), because if , then , so, by transitivity, . The same argument works for .

We now want to show the is a partial order on . It's transitive, because if and , then , and the transitivity in implies and hence .

That shows it's a preorder. To show partial order, we need to show there are no cycles. So, if and , then and , hence, by definition of , . So it's a partial order.

Thanks!

[-]HazardΩ140

For cycles, it looks like the projection to is akin to taking all the worlds that form a given cycle, and compressing them into a single world.

In your example, it's true and when . That's the condition for equivalence in the project, so you have that . If you're thinking about the ordering as a directed graph, you can collapse those worlds to a single point and not mess up the ordering.

Ah yes, that makes sense, thanks! I didn't realize what was the set of equivalence classes of

For the disjoint chain part, first imagine all the worlds in the from and split them into equivalence classes based on equality of . The preference can only compare two worlds that are in the same equivalence class, so each equivalence class will be totally ordered, but no class will be comparable to any other (hence decomposition into disjoint chains).

I thought the (n, v) thing was just an example. If all the worlds are meant to be represented via (n, v), then it seems like you are only ever allowed to give a partial preference based on whatever feature n represents, and you can never talk about any of the other features in v. This seems bad.

(I do agree that if you only give partial preferences on (n, v) worlds in the way you describe, then you get a decomposition into disjoint chains.)

I do believe the OP is talking about partial pref on (n,v) form worlds. Yeah, this seems bad in the "How do I act when looking at different v?" sense, but I get the sense that it's not supposed to answer that question. Or at least Stuart plans to build a lot from here before it will answer that sort of question.

That makes the math make sense, but I really object to calling this the "sensible" case.

Suppose I express a partial preference over "good worlds" and another one over "bad worlds", for example "when everyone's needs for food, water and shelter are met, then it is better for there to be more social connection" and "when I am living in extreme poverty, I prefer to be in a country with a good social safety net". These talk about mutually exclusive worlds, and so lead to two distinct ordered chains. Then, on average you assign the same utility to a good world and a bad world, which seems very bad. How do we avoid this issue?

By adding in a third preference, which explicitely says that extreme poverty is worse than having all needs met.

These are just pieces of the total utility, remember. Even if they are full preferences, they are not all our preferences.

This seems really neat, but it seems quite sensitive to how one defines the worlds under consideration, and whether one counts slightly different worlds as actually distinct. Let me try to illustrate this with an example.

Suppose we have a consisting of 7 worlds, , with preferences
and no other non-trivial preferences. Then (from the `sensible case'), I think we get the following utilities:

.

Suppose now that I create two new copies , of the world which each differ by the position of a single atom, so as to give me (extremely weak!) preferences , so all the non-trivial preferences in the new are now summarised as

Then the resulting utilities are (I think):

.

In particular, before adding in these 'trivial copies' we had , and now we get . Is this a problem? It depends on the situation, but to me it suggests that, if using this approach, one needs to be careful in how the worlds are specified, and the 'fine-grainedness' needs to be roughly the same everywhere.

Each partial preference is meant to represent a single mental model inside the human, with all preferences weighted the same (so there can't be "extremely weak" preferences, compared with other preference in the same partial preference). Things like "increased income is better", "more people smiling is better", "being embarrassed on stage is the worse".

We can imagine a partial preference with more internal structure, maybe internal weights, but I'd simply see that as two separate partial preferences. So we'd have the utilities you gave to through to for one partial preference (actually, my formula doubles the numbers you gave), and , , for the other partial preference - which has a very low weight by assumption. So the order of and is not affected.

EDIT: I'm pretty sure we can generalise my method for different weights of preferences, by changing the formula that sums the squares of utility difference.

(actually, my formula doubles the numbers you gave)

Are you sure? Suppose we take with , , then , so the values for should be as I gave them. And similarly for , giving values . Or else I have mis-understood your definition?

I'd simply see that as two separate partial preferences

Just to be clear, by "separate partial preference" you mean a separate preorder, on a set of objects which may or may not have some overlap with the objects we considered so far? Then somehow the work is just postponed to the point where we try to combine partial preferences?

EDIT (in reply to your edit): I guess e.g. keeping conditions 1,2,3 the same and instead minimising

where is proportion to the reciprocal of the strength of the preference? Of course there are lots of variants on this!

Yep, sorry, I saw -3, -2, -1, etc... and concluded you weren't doing the 2 jumps; my bad!

Then somehow the work is just postponed to the point where we try to combine partial preferences?

Yes. But unless we have other partial preferences or meta-preferences, then the only resonable way of combining them is just to add them, after weighting.

I like your reciprocal weighting formula. It seems to have good properties.