Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Note: I'm working on a research agenda, hence the large number of small individual posts, which give me things to link to in the main documents.

EDIT: This model is currently obsolete; see here for the most current version.

I've talked about partial preferences and partial models before. I haven't been particularly consistent in terminology so far ("proto-preferences", "model fragments"), but from now on I'll stick with "partial".


So what are partial models, and partial preferences?

Assume that every world is described by the values of $n$ different variables, $w = (w_1, \ldots, w_n) \in \mathbb{R}^n$.

A partial model is given by two sets, $Y$ and $Z$, along with an addition map $+ : Y \times Z \to \mathbb{R}^n$. Thus for $y \in Y$ and $z \in Z$, $y + z$ is an element of $\mathbb{R}^n$.

We'll want to have 'reasonable' properties; for the moment I'm imagining $Y$ and $Z$ as manifolds and $+$ as a local homeomorphism. If you don't understand that terminology, it just means that $+$ is well behaved and that as you move $y$ and $z$ around, you move $y + z$ in every direction in $\mathbb{R}^n$.

A partial preference, given the partial model above, consists of two values $y^+, y^- \in Y$, along with the value judgement that:

  • for all $z \in Z$, $y^+ + z$ describes a better world than $y^- + z$.
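As a rough illustration of the definitions, here is a minimal Python sketch. It assumes $Y = \mathbb{R}^k$ and $Z = \mathbb{R}^{n-k}$, and takes $+$ to be plain concatenation of coordinates (a global homeomorphism, hence in particular a local one). The world layout, the `value` function and all helper names are hypothetical. Since the definition quantifies over all $z \in Z$, checking a finite sample of background values can only fail to find a counterexample; it cannot prove the preference.

```python
from typing import Callable, Iterable

Vector = tuple[float, ...]


def plus(y: Vector, z: Vector) -> Vector:
    """Hypothetical '+' map: a world is the foreground coordinates y followed by
    the background coordinates z. Concatenation is a homeomorphism
    R^k x R^(n-k) -> R^n, so in particular it is a local homeomorphism."""
    return tuple(y) + tuple(z)


def partially_preferred(
    y_plus: Vector,
    y_minus: Vector,
    zs: Iterable[Vector],
    better: Callable[[Vector, Vector], bool],
) -> bool:
    """True if y_plus + z is judged better than y_minus + z for every sampled z.
    A finite sample of Z is evidence for the preference, not a proof of it."""
    return all(better(plus(y_plus, z), plus(y_minus, z)) for z in zs)


# Hypothetical world layout: (minutes_late, gdp_sweden, voyager2_distance).
def value(world: Vector) -> float:
    # Toy value judgement: only lateness matters; background coordinates are ignored.
    return -world[0]


def better(a: Vector, b: Vector) -> bool:
    return value(a) > value(b)


run, walk = (0.0,), (10.0,)
# Arbitrary placeholder background values, not real data.
background = [(g, d) for g in (0.5, 0.6) for d in (120.0, 130.0)]
print(partially_preferred(run, walk, background, better))  # True
print(partially_preferred(walk, run, background, better))  # False
```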

We can generalise to non-linear subspaces, but this version works well for many circumstances.


The $y \in Y$ are the foreground variables that we care about in our partial model. The $z \in Z$ are the 'background variables' that are not relevant to the partial model at the moment.

So, for example, when I contemplate whether to walk or run back home, then the GDP of Sweden, the distance Voyager 2 is from Earth, the actual value of the cosmological constant, the number of deaths from malaria, and so on, are not actually relevant to that model. They are grouped under the (irrelevant) background variables category.

Notice that these variables are only irrelevant if they are in a 'reasonable range'. If the GDP of Sweden had suddenly hit zero, if Voyager 2 was about to crash into my head, if the cosmological constant suddenly jumped, or if malaria deaths reached a large fraction of the population, then this would affect my walking/running speed.

So the set $Z$ also encodes background expectations about the world. Being able to say that certain values are in an 'irrelevant' range is a key part of symbol grounding and the frame problem: it allows us to separate $Y$ and $Z$ as being, in a sense, complementary or orthogonal to each other. Note that human definitions of $Z$ are implicit, incomplete, and often wrong. But that doesn't matter; whether I believe that worldwide deaths from malaria are in the thousands or in the millions, that's equally irrelevant for my current decision.
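The dependence on a 'reasonable range' can be seen in a toy version of the same check. In this sketch (the world layout, the `disruption` variable and the threshold are all hypothetical), the value judgement ignores the background variable while it stays below a cutoff and reverses past it. The preference for running then holds on a sample drawn from the reasonable range of $Z$, but fails once an extreme background value is admitted.

```python
# Hypothetical world layout: (minutes_late, disruption), where disruption in [0, 1]
# stands in for any background variable (Sweden's GDP, Voyager 2's trajectory, ...).
THRESHOLD = 0.9  # hypothetical edge of the 'reasonable range'


def value(world: tuple[float, ...]) -> float:
    minutes_late, disruption = world
    if disruption < THRESHOLD:
        return -minutes_late  # normal conditions: being on time is what matters
    return minutes_late       # toy extreme: the ranking reverses, so walking scores higher


def preferred_on(y_plus: tuple[float, ...], y_minus: tuple[float, ...], zs) -> bool:
    # Tuple concatenation plays the role of the '+' map.
    return all(value(y_plus + z) > value(y_minus + z) for z in zs)


run, walk = (0.0,), (10.0,)
reasonable_z = [(0.0,), (0.1,), (0.5,)]
extreme_z = [(0.95,)]
print(preferred_on(run, walk, reasonable_z))              # True
print(preferred_on(run, walk, reasonable_z + extreme_z))  # False
```

In the terms of the definition, the partial preference is only claimed over $Z$; restricting $Z$ to the reasonable range is what makes it hold.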

In comparison, the $y^+$ and the $y^-$ values are much simpler, and are about the factors I'm currently contemplating: one of them involves running, the other walking. The variables of $Y$ could be future health, current tiredness, how people might look at me as I run, how running would make me feel, and how I currently feel about running. Or $Y$ could just be a single variable, like the monster behind me with the teeth, or whether I will be home on time to meet a friend.

So the partial preference is saying that, holding the rest of the values of the world constant and looking only at these issues, I currently prefer one option (running or walking) to the other.

Re-inventing the wheel

This whole construction feels like re-inventing the wheel: surely someone has designed something like partial models before? What are the search terms I'm missing?

Comments

Hi Stuart,

I’m working my way through your `Research Agenda v0.9’ post, and am therefore going through various older posts to understand things. I wonder if I could ask some questions about the definition you propose here?

First, that $X$ be contained in $\mathbb{R}^n$ for some $n$ seems not so relevant; can I just assume $X$, $Y$ and $Z$ are some manifolds ($C^k$ for some $k$)? And we are given some partial order $\le$ on $X$, so that we can refer to `being a better world'?

Then, as I understand it, your definition says the following:

Fix $X$, $\le$ and $Z$. Let $Y$ be a manifold and $y^+, y^- \in Y$. Given a local homeomorphism $f\colon Y \times Z \to X$, we say that $y^+$ is partially preferred to $y^-$ if for all $z \in Z$, we have $f(y^+, z) > f(y^-, z)$.

I’m not sure which inequalities should be strict, but this seems non-essential for now. On the other hand, the dependence of this definition on the choice of Y seems somewhat subtle and interesting. I will try to illustrate this in what follows.

First, let us make a new definition. Fix $X$, $\le$ and $Z$ as before. Let $Y' = \{y^+, y^-\}$, a two-element set equipped with the discrete topology, and let $f'\colon Y' \times Z \to X$ be an immersion of $C^k$-manifolds. We say that $y^+$ is weakly partially preferred to $y^-$ if for all $z \in Z$, we have $f'(y^+, z) > f'(y^-, z)$.

First, it is clear that partial preference implies weak partial preference. More formally:

Claim 1: Fix $X$, $\le$ and $Z$. Suppose we have a manifold $Y$, points $y^+, y^- \in Y$, and a local homeomorphism $f\colon Y \times Z \to X$ such that $y^+$ is partially preferred to $y^-$. Setting $Y' = \{y^+, y^-\}$ with the subspace topology from $Y$ (i.e. discrete), and taking $f'$ to be the restriction of $f$ from $Y \times Z$ to $Y' \times Z$, we have that $y^+$ is weakly partially preferred to $y^-$.

Proof: obvious. ∎

However, the converse can fail if Z is not contractible. First, let’s prove that the concepts are equivalent for Z contractible:

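Claim 2: Fix $X$, $\le$ and $Z$, and assume that $Z$ is contractible. Suppose we have a two-element set $Y' = \{y^+, y^-\}$ and a map $f'\colon Y' \times Z \to X$ making $y^+$ weakly partially preferred to $y^-$. Then there exist a manifold $Y$, an injection $Y' \hookrightarrow Y$, and a local homeomorphism $f\colon Y \times Z \to X$ whose restriction to $Y' \times Z$ is $f'$, making $y^+$ partially preferred to $y^-$.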

Proof: Let’s assume for simplicity of notation that $X$ is equidimensional, say of dimension $n$, and write $m$ for the dimension of $Z$. Let $Y$ be the disjoint union of two open balls of dimension $n - m$, with $Y' \hookrightarrow Y$ the inclusion of the centres of the balls. Then take an $\epsilon$-neighbourhood of $Z$ in $X$; it is diffeomorphic to $Z \times B^{n-m}$ since the normal bundle to $Z$ in $X$ is trivialisable (cf. the fact that every vector bundle over a contractible base is trivial). ∎
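For a concrete feel for the construction in this proof, here is a toy instance with $X = \mathbb{R}^2$ ($n = 2$), $Z = \mathbb{R}$ ($m = 1$, contractible), and $f'$ embedding two parallel lines. It assumes a hypothetical ranking under which a larger second coordinate is a better world, takes $Y$ to be two open intervals (balls of dimension $n - m = 1$), and checks numerically that the extension $f$ restricts to $f'$, has nonzero Jacobian determinant (so it is a local diffeomorphism, by the inverse function theorem), and makes $y^+$ partially preferred to $y^-$. It is a sketch of one easy case, not the general proof; `EPS`, `CENTRE` and the function names are illustrative choices.

```python
EPS = 0.5                       # hypothetical radius of each ball in Y
CENTRE = {"+": 1.0, "-": -1.0}  # where f' sends y^+ x Z and y^- x Z


def f_weak(label: str, z: float) -> tuple[float, float]:
    """f': Y' x Z -> X, embedding two parallel lines in R^2."""
    return (z, CENTRE[label])


def f(label: str, s: float, z: float) -> tuple[float, float]:
    """f: Y x Z -> X, where Y is two open intervals (-EPS, EPS), one per label."""
    if not -EPS < s < EPS:
        raise ValueError("s must lie in the open ball (-EPS, EPS)")
    return (z, CENTRE[label] + s)


def jacobian_det(label: str, s: float, z: float, h: float = 1e-6) -> float:
    """Finite-difference determinant of df/d(s, z); nonzero means f is locally invertible."""
    p = f(label, s, z)
    ds = [(a - b) / h for a, b in zip(f(label, s + h, z), p)]
    dz = [(a - b) / h for a, b in zip(f(label, s, z + h), p)]
    return ds[0] * dz[1] - ds[1] * dz[0]


def better(a: tuple[float, float], b: tuple[float, float]) -> bool:
    # Hypothetical ranking on X: a larger second coordinate is a better world.
    return a[1] > b[1]


zs = [-3.0, 0.0, 2.5]
assert all(f(lbl, 0.0, z) == f_weak(lbl, z) for lbl in CENTRE for z in zs)  # f extends f'
assert all(abs(jacobian_det(lbl, s, z)) > 0.5
           for lbl in CENTRE for s in (-0.4, 0.0, 0.4) for z in zs)
assert all(better(f("+", 0.0, z), f("-", 0.0, z)) for z in zs)  # y^+ preferred to y^-
```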

If we want examples where weak partial preference and partial preference don’t coincide, we should look for an example where Z is not contractible, and its normal bundle in X is not trivialisable.

Example 3: Let $X$ be the disjoint union of two moebius bands, and let $Z$ be a circle. Note that including $Z$ along the centre of either band gives a submanifold whose tubular neighbourhood is not a product. Assume that $\le$ is such that one component of $X$ is preferred to the other (and $\le$ is indifferent within each connected component). Then take $Y' = \{y^+, y^-\}$, and $f'$ to be the inclusion of the two circles along the centres of the two moebius bands, such that $\{y^+\} \times Z$ ends up in the preferred band. This yields a situation where $y^+$ is weakly partially preferred to $y^-$, but the conclusion of Claim 2 fails, i.e. this cannot be extended to a partial preference for $y^+$ over $y^-$.
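The claim that the tubular neighbourhood is not a product can be checked numerically. The sketch below uses one standard embedding of the moebius band in $\mathbb{R}^3$ (the parametrization is an assumed choice, not the commenter's) and carries the transverse direction $\partial/\partial v$ at $v = 0$ once around the centre circle: it returns to the same base point reversed. Any nonvanishing section of the normal line bundle would be a continuous nonzero multiple of this vector, and would have to change sign along the loop, which is impossible; so the bundle has no trivialisation and no product neighbourhood exists.

```python
import math


def mobius_point(u: float, v: float) -> tuple[float, float, float]:
    """A standard embedding of the moebius band in R^3.
    u in [0, 2*pi] runs along the centre circle (v = 0); v in (-1, 1) runs across the band."""
    r = 1 + (v / 2) * math.cos(u / 2)
    return (r * math.cos(u), r * math.sin(u), (v / 2) * math.sin(u / 2))


def transverse(u: float, h: float = 1e-6) -> tuple[float, ...]:
    """Finite-difference d/dv at v = 0: a nonzero vector normal to the centre circle within the band."""
    p0, p1 = mobius_point(u, 0.0), mobius_point(u, h)
    return tuple((a - b) / h for a, b in zip(p1, p0))


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


start, end = transverse(0.0), transverse(2 * math.pi)
# u = 0 and u = 2*pi are the same point (1, 0, 0) of the circle, but the
# transverse vector has come back pointing the opposite way.
print(round(dot(start, end), 3))  # -0.25
```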

What conclusion should we draw from this? To me, it suggests that the notion of partial preference is not yet quite what one would want. In the setting of Example 3, where X consists of two moebius strips, one of which is preferred to the other, surely landing in the preferred strip should be preferred to landing in the un-preferred strip?! And yet the `local homeomorphism from a product’ condition gets in the way. This example is obviously quite artificial, and maybe analogous things cannot occur in reality. But I’m not so happy with this as an answer, since our approaches to AI safety should be (as far as possible) robust against the flaws in our understanding of physics.

Apologies for the overly-long comment, and for the imperfect LaTeX (I've not used this type of form much before).

Hey there! Thanks for your long comment - but, alas, this model of partial preferences is obsolete :-(


Because of other problems with this, I've replaced it with the much more general concept of a preorder. This can express all the things we want to express, but is a lot less intuitive for how humans model things. I may come up with some alternative definition at some point (less general than a preorder, but more general than this post).

Thanks for the comment in any case.

Never mind - I had fun thinking about this :-).

Re: Reinventing the wheel

I don’t know of any slam dunk search term, but I suspect that the discussion you want to have surrounding partial preferences will mostly resemble the work done on ceteris paribus laws. In particular, if we aggregate the partial preferences of all moral agents, we will produce something like a moral ceteris paribus law, where we are holding the set Z of background variables “unchanged” (i.e. within a “reasonable” range of values). You might find the discussion around the justification of CP laws useful.

Additionally, I believe there must be some relevant work on the application to morality of modal logic and possible world semantics. I don’t have something to point to here, but it might be a worthwhile direction.

Side note: what do you think about preferences about preferences of other people?

For example: "I want M. to love me" or "I prefer that everybody will be utilitarian".

Was it covered somewhere?

Those are very normal preferences; they refer to states of the outside world, and we can estimate whether that state is met or not. Just because it's potentially manipulative, doesn't mean it isn't well-defined.

But they are somehow recursive: I need to know the real nature of human preferences in order to be sure that other people actually want what I want.

In other words, such preferences about preferences embed an idea of what I think a "preference" is: if M. behaves as if she loves me, is that enough? Or should it be her claims of love? Or her emotions? Or the coherence of all three?

How does this notion of partial preferences differ from saying "preferences are determined by a causal net"? I.e., the y's would be the direct causal parents of a decision, and the z's everything else.

This differs, because the z are assumed to be in a "standard" range. There are situations where extreme values of z, if known and reflected upon, would change the sign of the decision (for example, what if your decision is being filmed, and there are billions being bet upon your ultimate choice, by various moral and immoral groups?).

But yeah, if you assume that the z are in that standard range, then this looks a lot like considering just a few nodes of a causal net.