Toy model piece #1: Partial preferences revisited

(actually, my formula doubles the numbers you gave)

Are you sure? Suppose we take with , , then , so the values for should be as I gave them. And similarly for , giving values . Or have I misunderstood your definition?

I'd simply see that as two separate partial preferences

Just to be clear, by "separate partial preference" you mean a separate preorder, on a set of objects which may or may not have some overlap with the objects we considered so far? Then somehow the work is just postponed to the point where we try to combine partial preferences?

EDIT (in reply to your edit): I guess e.g. keeping conditions 1,2,3 the same and instead minimising

where is proportional to the reciprocal of the strength of the preference? Of course there are lots of variants on this!

Toy model piece #1: Partial preferences revisited

This seems really neat, but it seems quite sensitive to how one defines the worlds under consideration, and whether one counts slightly different worlds as actually distinct. Let me try to illustrate this with an example.

Suppose we have a consisting of 7 worlds, , with preferences

and no other non-trivial preferences. Then (from the `sensible case'), I think we get the following utilities:

.

Suppose now that I create two new copies , of the world which each differ by the position of a single atom, so as to give me (extremely weak!) preferences , so all the non-trivial preferences in the new are now summarised as

Then the resulting utilities are (I think):

.

In particular, before adding in these 'trivial copies' we had , and now we get . Is this a problem? It depends on the situation, but to me it suggests that, if using this approach, one needs to be careful in how the worlds are specified, and the 'fine-grainedness' needs to be roughly the same everywhere.
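To make the fine-grainedness worry concrete, here is a small Python sketch of one plausible reading of the construction: least-squares utilities with a target gap of 1 between linked worlds, with each connected component normalised to mean zero. The worlds and preferences below are illustrative stand-ins, not the 7-world example above, and the normalisation details may differ from the original post.

```python
import numpy as np

def component_utilities(n, prefs):
    """Sketch of a least-squares utility construction.

    prefs is a list of pairs (a, b) meaning 'world b is preferred
    to world a'; each such pair contributes a penalty (u[b]-u[a]-1)^2,
    and each connected component is centred at mean zero.
    (One plausible reading of the construction discussed above.)
    """
    # --- find connected components (union-find on undirected edges)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in prefs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    comps = {}
    for w in range(n):
        comps.setdefault(find(w), []).append(w)
    # --- least squares within each component, then centre at zero
    u = np.zeros(n)
    for members in comps.values():
        idx = {w: i for i, w in enumerate(members)}
        rows, rhs = [], []
        for a, b in prefs:
            if a in idx:  # both endpoints lie in this component
                row = np.zeros(len(members))
                row[idx[a]], row[idx[b]] = -1.0, 1.0
                rows.append(row)
                rhs.append(1.0)  # "b is one unit better than a"
        if rows:
            sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs),
                                      rcond=None)
            sol -= sol.mean()  # centre this component at zero
            for w in members:
                u[w] = sol[idx[w]]
    return u
```

With two disjoint chains w0 < w1 and w2 < w3, the tops of the chains get equal utility; adding two near-copies above w3 lengthens its chain, shifts that component's centre of mass, and the previously equal worlds come apart, which is the sensitivity described above.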

Categorial preferences and utility functions

Thanks! I like the way your optimisation problem handles non-closed cycles.

I think I'm less comfortable with how it treats disconnected components - as I understand it you just translate each separately to have `centre of mass' at 0. If one wants to get a utility function out at the end one has to make some kind of choice in this situation, and the choice you make is probably the best one, so in that sense it seems very good.

But for example it seems vulnerable to creating 'virtual copies' of worlds in order to shift the centre of mass and push connected components one way or the other. That was what started me thinking about including strength of preference - if one adds to your setup a bunch of virtual copies of a world between which one is `almost indifferent', then it seems it will shift the centre of mass, and thus the utility relative to some other chain. Of course, if one is actually indifferent then the 'virtual copies' will be collapsed to a single point in your , but if they are just extremely close then it seems it will affect the utility relative to some other chain. I'll try to explain this more clearly in a comment on your post.

Categorial preferences and utility functions

Thanks for the comment Charlie.

If I am indifferent to a gamble with a probability of ice cream, and a probability 0.8 of chocolate cake and 0.2 of going hungry

To check I understand correctly, you mean the agent is indifferent between the gambles (probability of ice cream) and (probability 0.8 of chocolate cake, probability 0.2 of going hungry)?

If I understand correctly, you're describing a variant of Von Neumann–Morgenstern where, instead of giving preferences among all lotteries, you specify a certain collection of pairs of lotteries of a special type between which the agent is indifferent, together with a sign to say in which `direction' things become preferred? It then seems likely to me that the data you give can be used to reconstruct preferences between all lotteries...

If one is given information in the form you propose but only for an `incomplete' set of special triples (c.f. `weak preferences' above), then one can again ask whether, and in how many ways, it can be extended to a complete set of preferences. It feels to me as if there is an extra ambiguity coming in with your description: for example, if the set of possible outcomes has 6 elements and I am given the value of the `Betterness' function on two disjoint triples, then to generate a utility function I have to choose not only a `translation' between the two triples, but also a scaling. But maybe this is better/more realistic!

By `special types', I mean indifference between pairs of gambles of the form

(probability of A) vs (probability of B and probability of C)

for some , and possible outcomes A, B, C. Then the sign says that I prefer higher probability of B (say).
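To illustrate the translation-plus-scaling ambiguity mentioned above: within a single triple, an indifference constraint of this form pins utilities down only up to a positive affine transformation, so two disjoint triples leave both a relative shift and a relative scale free. A toy check (the outcomes A, B, C and the value q = 0.8 are made up for illustration):

```python
# Hypothetical indifference datum: A ~ (0.8 of B, 0.2 of C).
# A utility function is consistent with it iff u(A) = 0.8 u(B) + 0.2 u(C).
def consistent(u, q=0.8):
    return abs(u["A"] - (q * u["B"] + (1 - q) * u["C"])) < 1e-9

u = {"A": 0.8, "B": 1.0, "C": 0.0}
assert consistent(u)

# Any positive affine transform a*u + b satisfies the same constraint,
# so each triple carries two free parameters (scale and translation).
rescaled = {k: 3.0 * v + 7.0 for k, v in u.items()}
assert consistent(rescaled)
```

With two disjoint triples, each carries its own free scale and translation, so combining them into one utility function requires fixing both, which is the extra ambiguity described above.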

Toy model piece #1: Partial preferences revisited

Thanks for pointing me to this updated version :-). This seems a really neat trick for writing down a utility function that is compatible with the given preorder. I thought a bit more about when/to what extent such a utility function will be unique, in particular if you are given not only the data of a preorder, but also some information on the strengths of the preferences. This ended up a bit too long for a comment, so I wrote a few things in outline here:

https://www.lesswrong.com/posts/7ncFy84ReMFW7TDG6/categorial-preferences-and-utility-functions

It may be quite irrelevant to what you're aiming for here, but I thought it was maybe worth writing down just in case.

Partial preferences and models

Never mind - I had fun thinking about this :-).

Partial preferences and models

Hi Stuart,

I’m working my way through your `Research Agenda v0.9’ post, and am therefore going through various older posts to understand things. I wonder if I could ask some questions about the definition you propose here?

First, that be contained in for some seems not so relevant; can I just assume X, Y and Z are some manifolds ( for some )? And we are given some partial order on X, so that we can refer to `being a better world'?

Then, as I understand it, your definition says the following:

Fix X, and Z. Let Y be a manifold and , . Given a local homeomorphism , we say that is partially preferred to if for all , we have .

I’m not sure which inequalities should be strict, but this seems non-essential for now. On the other hand, the dependence of this definition on the choice of Y seems somewhat subtle and interesting. I will try to illustrate this in what follows.

First, let us make a new definition. Fix X, , and Z as before. Let , a two-element set equipped with the discrete topology, and let be an immersion of -manifolds. We say that is weakly partially preferred to if for all , we have .
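For concreteness, here is one way the definition might be written out in full; the symbols $Y'$, $y_1$, $y_2$ and $f$ are my own guesses for the notation that was lost from the comment, chosen to mirror the partial-preference definition above.

```latex
% Reconstructed statement; the symbols Y', y_1, y_2, f are guesses.
\textbf{Definition (weak partial preference).}
Fix $X$, a partial order $\leq$ on $X$, and $Z$.
Let $Y' = \{y_1, y_2\}$ be a two-element set with the discrete topology,
and let $f \colon Y' \times Z \to X$ be an immersion of manifolds.
We say $y_1$ is \emph{weakly partially preferred} to $y_2$ if
$f(y_1, z) \geq f(y_2, z)$ for all $z \in Z$.
```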

First, it is clear that partial preference implies weak partial preference. More formally:

Claim 1: Fix X, and Z. Suppose we have a manifold Y, points , , and a local homeomorphism such that is partially preferred to . Setting with the subspace topology from (i.e. discrete), and taking to be the restriction of from to , we have that is weakly partially preferred to .

Proof: obvious. $\square$

However, the converse can fail if Z is not contractible. First, let’s prove that the concepts are equivalent for Z contractible:

Claim 2: Fix X, and Z, and assume that Z is contractible. Suppose we have a two-element set and a map making weakly partially preferred to . Then there exist a manifold Y, an injection , and a local homeomorphism whose restriction to is , making partially preferred to .

Proof: Let’s assume for simplicity of notation that X is equidimensional, say of dimension , and write for the dimension of Z. Let Y be the disjoint union of two open balls of dimension , with the inclusion of the centres of the balls. Then take an -neighbourhood of Z in X; it is diffeomorphic to since the normal bundle to Z in X is trivialisable (c.f. https://math.stackexchange.com/questions/857784/product-neighborhood-theorem-with-boundary). $\square$

If we want examples where weak partial preference and partial preference don’t coincide, we should look for an example where Z is not contractible and its normal bundle in X is not trivial.

Example 3: Let X be the disjoint union of two Möbius bands, and let Z be a circle. Note that including Z along the centre of either band gives a submanifold whose tubular neighbourhood is not a product. Assume that is such that one component of X is preferred to the other (and is indifferent within each connected component). Then take , and to be the inclusion of the two circles along the centres of the two Möbius bands, such that ends up in the preferred band. This yields a situation where is weakly partially preferred to , but the conclusion of Claim 2 fails, i.e. this cannot be extended to a partial preference for over .

What conclusion should we draw from this? To me, it suggests that the notion of partial preference is not yet quite as one would want. In the setting of Example 3, where X consists of two Möbius strips, one of which is preferred to the other, surely landing in the preferred strip should be preferred to landing in the un-preferred strip?! And yet the `local homeomorphism from a product’ condition gets in the way. This example is obviously quite artificial, and maybe analogous things cannot occur in reality. But I’m not so happy with this as an answer, since our approaches to AI safety should be (so far as possible) robust against the flaws in our understanding of physics.

Apologies for the overly-long comment, and for the imperfect LaTeX (I've not used this type of form much before).

The Univariate Fallacy

Thanks for the reply, Zack.

The reason this objection doesn't make the post completely useless...

Sorry, I hope I didn't suggest I thought that! You make a good point about some variables being more natural in given applications. I think it's good to keep in mind that sometimes it's just a matter of coordinate choice, and other times the points may be separated but not in a linear way.

The Univariate Fallacy

Hi Zack,

Can you clarify something? In the picture you draw, there is a codimension-1 linear subspace separating the parameter space into two halves, with all red points to one side and all blue points to the other. Projecting onto any 1-dimensional subspace orthogonal to this (there is a unique one through the origin) will thus yield a `variable' which cleanly separates the points into the red and blue categories. So in the illustrated example, it looks just like a problem of bad coordinate choice.

On the other hand, one can easily have much more pathological situations; for example, the red points could all lie inside a certain sphere, and the blue points outside it. Then no choice of linear coordinates will illustrate this, and one has to use more advanced analysis techniques to pick up on it (e.g. persistent homology).

So, to my vague question: do you have only the first situation in mind, or are you also considering the general case, but made the illustrated example extra-simple?

Perhaps this is clarified by your numerical example; I'm afraid I've not checked.
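To make the sphere case concrete, here is a quick numpy check (the dimensions, radii and point counts are made up): points inside a sphere surrounded by points outside it overlap on every original coordinate, so no single linear coordinate separates them, while the nonlinear radial coordinate does so cleanly.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_shell(n, r_lo, r_hi, dim=3):
    """Random points with radius in [r_lo, r_hi)."""
    v = rng.normal(size=(n, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * rng.uniform(r_lo, r_hi, size=(n, 1))

red = sample_shell(500, 0.0, 1.0)   # inside the unit sphere
blue = sample_shell(500, 1.5, 3.0)  # well outside it

# Along every original coordinate the blue points lie on both sides
# of the red ones, so no single coordinate separates the classes ...
surrounded = all(blue[:, i].min() < red[:, i].min()
                 and red[:, i].max() < blue[:, i].max()
                 for i in range(3))

# ... but the nonlinear radial coordinate separates them cleanly.
r_red = np.linalg.norm(red, axis=1)
r_blue = np.linalg.norm(blue, axis=1)
```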

Sure, in the end we only really care about what comes top, as that's the thing we choose. My feeling is that information on (relative) strengths of preferences is often available, and when it is available it seems to make sense to use it (e.g. allowing circumvention of Arrow's theorem).

In particular, I worry that, when we only have ordinal preferences, the outcome of attempts to combine various preferences will depend heavily on how finely we divide up the world; by using information on strengths of preferences we can mitigate this.
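As an aside on how strength information helps: with purely ordinal ballots, a Condorcet cycle leaves no majority winner, whereas cardinal scores aggregate cleanly (score/range voting; this sidesteps the ordinal setting of Arrow's theorem rather than refuting it). A minimal sketch, with made-up voters and scores:

```python
# Three voters over options A, B, C whose ordinal ballots form a
# Condorcet cycle (the names and numbers here are illustrative).
ballots = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

def majority_prefers(x, y):
    """True if a strict majority of ballots rank x above y."""
    return sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2

# Pairwise majorities cycle (A beats B, B beats C, C beats A),
# so ordinal information alone picks no winner.
cycle = (majority_prefers("A", "B")
         and majority_prefers("B", "C")
         and majority_prefers("C", "A"))

# Adding strengths (scores in [0, 1]) and summing gives a clear winner.
scores = [{"A": 1.0, "B": 0.9, "C": 0.0},
          {"B": 1.0, "C": 0.2, "A": 0.0},
          {"C": 1.0, "A": 0.6, "B": 0.0}]
totals = {o: sum(s[o] for s in scores) for o in "ABC"}
winner = max(totals, key=totals.get)
```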