Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Naming the "generalised" models

In this post, I'll apply some mathematical rigour to my ideas of model splintering, and see what they are as a category[1].

And the first question is... what to call them? I can't refer to them as 'the models I use in model splintering'. After a bit of reflection, I decided to call them 'generalised models'. Though that's a bit vague, it does describe well what they are, and what I hope to use them for: a formalism to cover all sorts of models.

The generalised models

A generalised model $M$ is given by three objects: $M = (\mathcal{F}, \mathcal{E}, Q)$.

Here $\mathcal{F}$ is a set of features. Each feature consists of a name or label, and a set in which the feature takes values. For example, we might have the feature "room empty?" with values "true" and "false", or the feature "room temperature?" with values in $\mathbb{R}^+$, the positive reals.

We allow these features to sometimes take no values at all (such as the above two features if the room doesn't exist) or multiple values (such as "potential running speed of person $X$", which includes the maximal speed and any speed below it).

Define $\overline{F}$ as the set component of the feature $F$, and $\overline{\mathcal{F}}$ as the disjoint union of all the sets of the different features - ie $\overline{\mathcal{F}} = \sqcup_{F \in \mathcal{F}} \overline{F}$.

A world, in the most general sense, is defined by all the values that the different features could take (including situations where features take multiple values and none at all). So the set of worlds, $\mathcal{W}$, is the set of functions from $\overline{\mathcal{F}}$ to $\{0, 1\}$, with $1$ representing the fact that that feature takes that value, and $0$ the opposite. Hence $\mathcal{W} = 2^{\overline{\mathcal{F}}}$, the power set of $\overline{\mathcal{F}}$.

The set of environments is a specific subset of these worlds: $\mathcal{E} \subset \mathcal{W}$. The choice of $\mathcal{E}$ is actually more important than that of $\mathcal{W}$, as that establishes which values of the features we are modelling.
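As a minimal sketch of the construction above, assuming two hypothetical finite features (the names `features`, `f_bar`, `powerset` and `single_valued` are illustrative and not from the post), the worlds can be enumerated as subsets of tagged feature values, and one possible set of environments picked out:

```python
from itertools import chain, combinations

# Hypothetical toy features: name -> finite set of possible values.
features = {
    "room empty?": {"true", "false"},
    "light": {"on", "off"},
}

# Disjoint union of the value sets: tag each value with its feature name.
f_bar = frozenset((name, v) for name, vals in features.items() for v in vals)

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))}

# A world is a subset of f_bar: the (feature, value) pairs that hold in it.
worlds = powerset(f_bar)

def single_valued(world):
    """True when every feature takes exactly one value in this world."""
    return all(sum(1 for (n, _) in world if n == name) == 1 for name in features)

# One possible choice of environments: the single-valued worlds.
environments = {w for w in worlds if single_valued(w)}
print(len(f_bar), len(worlds), len(environments))  # 4 16 4
```

Restricting to single-valued worlds is only one choice; the definition above also permits environments where a feature takes no value or several.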

The $Q$ is a partial probability distribution. In general, we won't worry as to whether $Q$ is normalised (ie whether $Q(\mathcal{E}) = 1$) or not; we'll even allow $Q$s with $Q(\mathcal{E}) > 1$. So $Q$ could more properly be defined as a partial weight distribution. As long as we consider terms like $Q(A \mid B)$, the normalisation doesn't matter.
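To illustrate why normalisation drops out of conditional terms, here is a small sketch. It assumes a weighting that is defined on every subset (so it does not model the "partial" aspect), and the names `weights`, `Q` and `conditional` are hypothetical:

```python
from fractions import Fraction

# Hypothetical weights on three environments; they sum to 6, not 1.
weights = {"e1": Fraction(1), "e2": Fraction(2), "e3": Fraction(3)}

def Q(subset, w):
    """Total weight of a set of environments."""
    return sum((w[e] for e in subset), Fraction(0))

def conditional(A, B, w):
    """Q(A | B) = Q(A and B) / Q(B); None when Q(B) is zero."""
    qb = Q(B, w)
    return None if qb == 0 else Q(A & B, w) / qb

# Rescaling every weight by the same factor leaves the ratio unchanged.
scaled = {e: 5 * x for e, x in weights.items()}
A, B = {"e1", "e2"}, {"e2", "e3"}
print(conditional(A, B, weights), conditional(A, B, scaled))  # 2/5 2/5
```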

Morphisms: relations

For simplicity, assume there are finitely many features taking values in finite sets, making all sets in the generalised model finite.

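If $M = (\mathcal{F}, \mathcal{E}, Q)$ and $M' = (\mathcal{F}', \mathcal{E}', Q')$ are generalised models, then we want to use binary relations between $\mathcal{E}$ and $\mathcal{E}'$ as morphisms between the generalised models.

Let $r$ be a relation between $\mathcal{E}$ and $\mathcal{E}'$, written as $e \sim_r e'$. Then it defines a map $r: 2^{\mathcal{E}} \to 2^{\mathcal{E}'}$ between subsets of $\mathcal{E}$ and $\mathcal{E}'$. This map is defined by $e' \in r(E)$ iff there exists an $e \in E$ with $e \sim_r e'$. The map $r^{-1}: 2^{\mathcal{E}'} \to 2^{\mathcal{E}}$ is defined similarly[2], seeing $r^{-1}$ as the inverse relation, $e' \sim_{r^{-1}} e$ iff $e \sim_r e'$.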
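The two set maps induced by a relation can be sketched as follows, with the relation stored as a set of pairs. The example relation and the function names `image`, `inverse` and `preimage` are assumptions made for illustration:

```python
# Hypothetical relation between {a, b, c} and {x, y}, stored as a set of pairs.
r = {("a", "x"), ("b", "x"), ("b", "y")}

def image(rel, subset):
    """Everything related to at least one element of `subset`."""
    return {e2 for (e1, e2) in rel if e1 in subset}

def inverse(rel):
    """The inverse relation: swap the two sides of every pair."""
    return {(e2, e1) for (e1, e2) in rel}

def preimage(rel, subset):
    """Image of `subset` under the inverse relation."""
    return image(inverse(rel), subset)

print(image(r, {"a"}))                        # {'x'}
print(sorted(preimage(r, image(r, {"a"}))))   # ['a', 'b']
```

The second line shows the point made in the footnote: going forward and then back can return a strictly larger set than the one we started with.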

We say that the relation $r$ is a morphism between the generalised models if, for any $E \subset \mathcal{E}$ and $E' \subset \mathcal{E}'$:

  • $Q(E) \le Q'(r(E))$, or both measures are undefined.
  • $Q'(E') \le Q(r^{-1}(E'))$, or both measures are undefined.

The intuition here is that probability flows along the connections: if $e \sim_r e'$ then probability can flow from $e$ to $e'$ (and vice-versa). Thus $r(E)$ must have picked up all the probability that flowed out of $E$ - but it might have picked up more probability, since there may be connections coming into it from outside $E$. Same goes for $r^{-1}(E')$ and the probability of $E'$.
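For finite models, the two conditions can be checked by brute force over all subsets. This is a sketch under the assumption that both weightings are defined everywhere, so the "both measures are undefined" clause never applies; `subsets`, `image` and `is_morphism` are hypothetical helper names:

```python
from itertools import chain, combinations

def subsets(points):
    """All subsets of a finite collection, as Python sets."""
    items = sorted(points)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def image(rel, subset):
    """Everything related to some element of `subset`."""
    return {b for (a, b) in rel if a in subset}

def is_morphism(rel, w, w2):
    """Check both inequalities on every subset of each side."""
    inv = {(b, a) for (a, b) in rel}
    q = lambda s: sum(w[e] for e in s)
    q2 = lambda s: sum(w2[e] for e in s)
    forward = all(q(s) <= q2(image(rel, s)) for s in subsets(w))
    backward = all(q2(s) <= q(image(inv, s)) for s in subsets(w2))
    return forward and backward

# Hypothetical example: the weight of b is split across y and z.
r = {("a", "x"), ("b", "y"), ("b", "z")}
w = {"a": 1, "b": 2}
w2 = {"x": 1, "y": 1, "z": 1}
print(is_morphism(r, w, w2))                        # True
print(is_morphism(r, w, {"x": 2, "y": 1, "z": 1}))  # False: x holds more than a supplies
```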

Morphism properties

We now check that these relations obey the requirements of morphisms in category theory.

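Let $r$ be a morphism $M \to M'$ (ie a relation between $\mathcal{E}$ and $\mathcal{E}'$), and let $q$ be a morphism $M' \to M''$ (ie a relation between $\mathcal{E}'$ and $\mathcal{E}''$).

We compose relations by the composition of relations: $e \sim_{qr} e''$ iff there exists an $e'$ with $e \sim_r e'$ and $e' \sim_q e''$. Composition of relations is associative.

We now need to show that $qr$ is a morphism. But this is easy to show:

  • $Q(E) \le Q'(r(E)) \le Q''(q(r(E))) = Q''(qr(E))$, or all three measures are undefined.
  • $Q''(E'') \le Q'(q^{-1}(E'')) \le Q(r^{-1}(q^{-1}(E''))) = Q((qr)^{-1}(E''))$, or all three measures are undefined.

Finally, the identity relation $Id_{\mathcal{E}}$ is the one that relates a given $e \in \mathcal{E}$ only to itself; then $Id_{\mathcal{E}}$ and $Id_{\mathcal{E}}^{-1}$ are the identity maps on $2^{\mathcal{E}}$, and the morphism properties for $Id_{\mathcal{E}}$ are trivially true.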
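The composition and identity properties above can be spot-checked on small relations. This sketch uses hypothetical relations and helper names (`compose`, `image`), and checks particular instances rather than proving anything:

```python
def compose(r, q):
    """First r, then q: a ~ c iff some b has a ~_r b and b ~_q c."""
    return {(a, c) for (a, b) in r for (b2, c) in q if b == b2}

def image(rel, subset):
    """Everything related to some element of `subset`."""
    return {b for (a, b) in rel if a in subset}

# Hypothetical chain of relations between four finite sets.
r = {("a", "x"), ("b", "x"), ("b", "y")}
q = {("x", 1), ("y", 2), ("y", 3)}
s = {(1, "p"), (3, "p")}

qr = compose(r, q)
identity = {(e, e) for e in ("a", "b")}

print(sorted(qr))  # [('a', 1), ('b', 1), ('b', 2), ('b', 3)]
print(image(qr, {"a"}) == image(q, image(r, {"a"})))        # True
print(compose(compose(r, q), s) == compose(r, compose(q, s)))  # True
print(compose(identity, r) == r)                            # True
```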

So define the category of generalised models as $GM$.

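$r$-stable sets

Say that a set $E \subset \mathcal{E}$ is $r$-stable if $r^{-1}(r(E)) = E$.

For such an $r$-stable set, $Q(E) \le Q'(r(E))$ and $Q'(r(E)) \le Q(r^{-1}(r(E))) = Q(E)$, thus $Q(E) = Q'(r(E))$.

Hence if $r$ is a morphism, it preserves the probability measure on the $r$-stable sets.

In the particular case where $r$ is a bijective function, all points of $\mathcal{E}$ are $r$-stable (and all points of $\mathcal{E}'$ are $r^{-1}$-stable), so it's an isomorphism between $\mathcal{E}$ and $\mathcal{E}'$ that forces $Q = Q'$ (identifying $\mathcal{E}$ with $\mathcal{E}'$ via $r$).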
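A small sketch of stable sets, assuming a hypothetical relation and weights chosen so that the relation satisfies both morphism inequalities (checked by hand for this example). The helper names are illustrative:

```python
from itertools import chain, combinations

def subsets(points):
    """All subsets of a finite collection, as Python sets."""
    items = sorted(points)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def image(rel, subset):
    return {b for (a, b) in rel if a in subset}

def preimage(rel, subset):
    return {a for (a, b) in rel if b in subset}

def is_stable(rel, subset):
    """A set is stable when mapping forward and then back returns it unchanged."""
    return preimage(rel, image(rel, subset)) == subset

# Hypothetical morphism: a and b both relate to x, c relates to y.
r = {("a", "x"), ("b", "x"), ("c", "y")}
w = {"a": 1, "b": 1, "c": 2}   # weights on the source side
w2 = {"x": 2, "y": 2}          # weights on the target side

stable = [s for s in subsets(w) if is_stable(r, s)]
for s in stable:
    # On stable sets the source weight equals the weight of the image.
    assert sum(w[e] for e in s) == sum(w2[e] for e in image(r, s))
print([sorted(s) for s in stable])  # [[], ['c'], ['a', 'b'], ['a', 'b', 'c']]
```

The set containing only `a` is not stable, and there the weights differ: 1 on the source side against 2 for its image.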

Morphism example: probability update

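Suppose we wanted to update our probability measure $Q$, maybe by updating that a particular feature $F$ takes a certain value $x$.

Then let $\mathcal{E}_{F=x} \subset \mathcal{E}$ be the set of environments where $F$ takes that value $x$. Then updating on $F = x$ is the same as restricting to $\mathcal{E}_{F=x}$ and then rescaling.

Since we don't care about the scaling, we can consider updating on $F = x$ as just restricting to $\mathcal{E}_{F=x}$. This morphism is given by:

  1. $M' = (\mathcal{F}, \mathcal{E}_{F=x}, Q')$,
  2. $Q' = Q$ on $\mathcal{E}_{F=x}$,
  3. the morphism $r$ is given by the relation that $e \sim_r e$ for all $e \in \mathcal{E}_{F=x}$.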
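The restriction step can be sketched on a toy model. The environments, weights and helper names (`restrict`, `share`) are hypothetical; the sketch only shows that restricting without rescaling still yields the usual conditional probabilities once ratios are taken, and it does not test the morphism inequalities:

```python
from fractions import Fraction

# Hypothetical environments, written as (room empty?, light) pairs,
# with unnormalised weights.
weights = {
    ("true", "on"): Fraction(1),
    ("true", "off"): Fraction(3),
    ("false", "on"): Fraction(2),
    ("false", "off"): Fraction(2),
}

def restrict(w, index, value):
    """Keep the environments whose feature at `index` equals `value`, weights unchanged."""
    return {e: x for e, x in w.items() if e[index] == value}

def share(event, w):
    """Weight of `event` within w, divided by the total weight of w."""
    return sum(x for e, x in w.items() if e in event) / sum(w.values())

updated = restrict(weights, 0, "true")   # update on "room empty?" = "true"
r = {(e, e) for e in updated}            # relates each kept environment to itself

light_on = {e for e in weights if e[1] == "on"}
room_empty = {e for e in weights if e[0] == "true"}

# Conditional probability computed from the original weights.
direct = share(light_on & room_empty, weights) / share(room_empty, weights)
print(share(light_on, updated), direct)  # 1/4 1/4
```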

Morphism example: surjective partial function

In my previous posts I defined how $M' = (\mathcal{F}', \mathcal{E}', Q')$ could be a refinement of $M = (\mathcal{F}, \mathcal{E}, Q)$.

In the language of the present post, $M'$ is a refinement of $M$ if there exists a generalised model $M'_0 = (\mathcal{F}', \mathcal{E}', Q'_0)$ and a surjective partial function $r: \mathcal{E}' \to \mathcal{E}$ (functions and partial functions are specific examples of binary relations) that is a morphism from $M'_0$ to $M$. The $Q'$ is required to be potentially 'better' than $Q'_0$ on $\mathcal{E}'$, in some relevant sense.

This means that $M'$ is 'better' than $M$ in three ways. The $r$ is surjective, so $\mathcal{E}'$ covers all of $\mathcal{E}$, so its set of environments is at least as detailed. The $r$ is a partial function, so $\mathcal{E}'$ might have even more environments that don't correspond to anything in $\mathcal{E}$ (it considers more situations). And, finally, $Q'$ is better than $Q'_0$, by whatever definition of better that we're using.
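The two structural requirements on the relation (partial function, surjective) are easy to test for finite sets. This is a sketch with a hypothetical relation and helper names; it covers only the shape of the relation, not the morphism inequalities or any notion of 'better':

```python
def is_partial_function(rel):
    """Each source element is related to at most one target element."""
    seen = {}
    for a, b in rel:
        if seen.setdefault(a, b) != b:
            return False
    return True

def is_surjective(rel, codomain):
    """Every element of the codomain is hit by the relation."""
    return {b for (_, b) in rel} == set(codomain)

# Hypothetical refinement: four fine-grained environments, two coarse ones.
fine = {"x1", "x2", "x3", "x4"}
coarse = {"a", "b"}
r = {("x1", "a"), ("x2", "a"), ("x3", "b")}   # x4 has no coarse counterpart

print(is_partial_function(r), is_surjective(r, coarse))  # True True
print(sorted(fine - {a for (a, _) in r}))                # ['x4']
```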

Feature-split relations

The morphisms/relations defined so far use $\mathcal{E}$ and $Q$ - but they don't make any use of $\mathcal{F}$. Here is one definition that does make use of the feature structure.

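Say that the generalised model $M = (\mathcal{F}, \mathcal{E}, Q)$ is feature-split if $\mathcal{F} = \mathcal{F}_1 \sqcup \mathcal{F}_2$ and $\mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2$ such that $\mathcal{E}_1 \subset 2^{\overline{\mathcal{F}_1}}$ and $\mathcal{E}_2 \subset 2^{\overline{\mathcal{F}_2}}$.

Note that $\mathcal{F} = \mathcal{F}_1 \sqcup \mathcal{F}_2$ implies $2^{\overline{\mathcal{F}}} \cong 2^{\overline{\mathcal{F}_1}} \times 2^{\overline{\mathcal{F}_2}}$, so $\mathcal{E}_1 \times \mathcal{E}_2$ lies naturally within $2^{\overline{\mathcal{F}}}$.

Designate such a generalised model by $M = (\mathcal{F}_1 \sqcup \mathcal{F}_2, \mathcal{E}_1 \times \mathcal{E}_2, Q)$.

Then a feature-split relation between $M = (\mathcal{F}_1 \sqcup \mathcal{F}_2, \mathcal{E}_1 \times \mathcal{E}_2, Q)$ and $M' = (\mathcal{F}'_1 \sqcup \mathcal{F}'_2, \mathcal{E}'_1 \times \mathcal{E}'_2, Q')$ is a morphism $r$ that is defined as $r = (r_1, r_2)$ with $r_i$ a relation between $\mathcal{E}_i$ and $\mathcal{E}'_i$.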
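One plausible reading of a feature-split relation is a pair of component relations acting coordinate-wise on the two halves of each environment. The sketch below builds that product relation; the interpretation, the example data and the name `product_relation` are all assumptions, and whether the result is a morphism still depends on the weightings:

```python
def product_relation(r1, r2):
    """(a1, a2) ~ (b1, b2) iff a1 ~ b1 under r1 and a2 ~ b2 under r2."""
    return {((a1, a2), (b1, b2)) for (a1, b1) in r1 for (a2, b2) in r2}

# Hypothetical component relations, one per group of features.
r1 = {("cold", "low"), ("hot", "high")}
r2 = {("empty", "vacant"), ("full", "occupied"), ("full", "crowded")}

r = product_relation(r1, r2)
print(len(r))                                       # 6
print((("hot", "full"), ("high", "crowded")) in r)  # True
print((("hot", "full"), ("low", "crowded")) in r)   # False
```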


  1. I'm not fully sold on category theory as a mathematical tool, but it's certainly worthwhile to formalise your mathematical structures so that they can fit within the formalism of a category; it makes you think carefully about what you're doing. ↩︎

  2. There is a slight abuse of notation here: $r$ and $r^{-1}$ are not generally inverses. They are inverses precisely for the "$r$-stable" sets that are discussed further down in the post. ↩︎

9 comments
[-]sj99992yΩ6100

I think these might be some typos you could correct: 

, or both measures are undefined.

The  should be  .

For such an -stable set,  and , thus .

There is a missing parenthesis and the  should be :     

Thanks so much! Typos have been corrected.

So the set of worlds, , is the set of functions from to ...

I guess the should be a ? Also, you don't seem to define ; perhaps ?

Thanks! Corrected both of those; is a subset of .

Re "I'm not fully sold on category theory as a mathematical tool", if someone (e.g. me) were to take the category you've outlined and run with it, in the sense of establishing its general structure and special features, could you be convinced? Are there questions that you have about this category that you currently are only able to answer by brute force computation from the definitions of the objects and morphisms as you've given them? More generally, are there variants of this category that you've considered that it might be useful to study in parallel?

For the moment, I'm going to be trying to resolve practical questions of model splintering, and then I'll see if this formalism turns out to be useful for them.

Cross reference: I am not a big fan of stating things in category theory notation, so I made some remarks on the building and interpretation of generalised models in the comment section of this earlier post on model splintering.

Cheers! My opinion on category theory has changed a bit, because of this post; by making things fit into the category formulation, I developed insights into how general relations could be used to connect different generalised models.

Definitely, it has also been my experience that you can often get new insights by constructing mappings to different models or notations.