Partial preferences and models

byStuart_Armstrong 5mo19th Mar 20199 comments

13

Ω 5


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Note: working on a research agenda, hence the large amount of small individual posts, to have things to link to in the main documents.

EDIT: This model is currently obsolete, see here for the most current version.

I've talked about partial preferences and partial models before. I haven't been particularly consistent in terminology so far ("proto-preferences", "model fragments"), but from now on I'll stick with "partial".

Definitions

So what are partial models, and partial preferences?

Assume that every world is described by the values of different variables, .

A partial model is given by two sets, and , along with an addition map . Thus for and , is an element of .

We'll want to have 'reasonable' properties; for the moment I'm imagining and as manifolds and as local homeomorphism. If you don't understand that terminology, it just means that is well behaved and that as you move and around, you move in every direction in .

A partial preference given the partial model above are two values , along with the value judgement that:

  • for all , describes a better world than .

We can generalise to non-linear subspaces, but this version works well for many circumstances.

Interpretations

The are the foreground variables that we care about in our partial model. The are the 'background variables' that are not relevant to the partial model at the moment.

So, for example, when I contemplate whether to walk or run back home, then the GDP of Sweden, the distance Voyager 2 is from Earth, the actual value of the cosmological constant, the number of deaths from malaria, and so on, are not actually relevant to that model. They are grouped under the (irrelevant) background variables category.

Notice that these variables are only irrelevant if they are in a 'reasonable range'. If the GDP of Sweden had suddenly hit zero, if Voyager 2 was about to crash into my head, if the cosmological constant suddenly jumped, or if malaria deaths reached of the population, then this would affect my walking/running speed.

So the set also encodes background expectations about the world. Being able to say that certain values are in an 'irrelevant' range is a key part of symbol grounding and the frame problem: it allows us to separate and as being, in a sense, complementary or orthogonal to each other. Note that human definitions of are implicit, incomplete, and often wrong. But that doesn't matter; whether I believe that worldwide deaths from malaria are in the thousands or in the millions, that's equally irrelevant for my current decision.

In comparison, the and the values are much simpler, and are about the factors I'm currently contemplating: one of them involves running, the other walking. The variables of could be future health, current tiredness, how people might look at me as I run, how running would make me feel, and how I currently feel about running. Or it could just be a single variable, like the monster behind me with the teeth, or the whether I will be home on time to meet a friend.

So the partial preference is saying that, holding the rest of the values of the world constant, when looking at these issues, I currently prefer to run or to walk.

Re-inventing the wheel

This whole construction feels like re-inventing the wheel: surely someone has designed something like partial models before? What are the search terms I'm missing?

13

Ω 5