I have tried to achieve two goals in this post. The first is to provide a self-contained explanation of Natural Latents using lots of pictures of probability distributions. The second is to frame Natural Latents in terms of statements about mutual information, rather than the KL-divergences and Bayes nets that Wentworth and Lorell normally use[1]. The two approaches are mathematically equivalent, but the different framings can bring slightly different ways of thinking about the problem.
This post will focus on what it means for a variable to satisfy the natural latent conditions and what those conditions correspond to intuitively. I'll also discuss a little bit about the motivation for studying natural latents. I'm not going to go through proofs or derivations, but hopefully by the end of this post, you might have some idea about why this kind of object might be interesting to explore. There is nothing new in this post that hasn't been discussed elsewhere, but it might provide an introduction to the topic, presented in a slightly different way than it usually is. This post doesn't require any familiarity with natural latents or any mathematical background beyond a basic familiarity with random variables and Shannon Entropy.
Motivation: Natural Abstractions
Most of us have the intuition that sunflowers and tulips are different 'kinds of thing'.
But if you have a field of tulips, all of which are slightly different, we would still probably agree that they are different instances of the same type of thing.
This doesn't seem to just be a fact about biology or genetics. It seems like you could give anyone a bunch of mixed tulips and sunflowers and they would naturally understand that these are two different kinds of thing, even if they had little knowledge of botany and even if they just inspected a few plants with a cursory glance. And if you gave someone a few examples of tulips, they would be able to see that these were the same 'kind of thing'. It seems that this distinction isn't just an arbitrary societal convention, but some kind of 'natural' distinction which is an objective feature of the data, which becomes apparent whenever you collect sufficient information.
On the other hand, there are other distinctions that we make in the world which seem less natural and more arbitrary/contingent. For example, the category of 'things that are polite to do in Victorian society' depends almost entirely on the social and cultural background. We wouldn't assume that people lacking that background would quickly grasp this category when shown a few examples.
In this sense, we might say that the polite/impolite distinction is 'unnatural' or at least 'less natural' than the tulip/sunflower distinction. Why is this the case? What makes some abstractions natural and others unnatural? Is there a sense in which all reasonable agents (including humans and AIs) will learn the same abstractions of the world? These questions are often investigated under the umbrella of the Natural Abstractions Hypothesis. More broadly, this project is sometimes considered as part of the 'Ontology Identification' research area, which is the project of understanding how agents (eg. AIs) internally represent the world and translating this representation into something that humans can understand.
This is interesting from an AI Safety perspective since it might allow us to understand what is going on in 'inscrutable' machine learning models (interpretability) and translate the goals of AI systems into chunks or concepts that we can study. We might want to try to construct AIs so that they use 'natural' abstractions to model the world, or we might want to prove that our AI systems converge on using natural abstractions by default. There might also be a sense in which certain natural abstractions are 'correct' if we can argue that they correspond to particularly efficient representations of the world. Additionally, our values are often defined in terms of the abstractions we use. If I value 'tulips', and I ask my AI to maximize the number of tulips, then I want to be sure that the AI has learned the same abstraction of tulips that I have.
The Natural Latents research program is an attempt to characterize the 'naturalness' of abstractions information theoretically [2]. It centres around the claim/premise that a 'natural' abstraction is one which captures 'all and only' the shared information between members of a class. This is done by positing a 'latent variable'; an additional variable 'on top of' our data which explains its structure and captures the correlations found within that data. Agents (such as humans or AIs) can use latent variables to construct predictive models of datasets. Doing this is called constructing a 'latent variable model' of the data.
For example, let's return to a sample containing just sunflowers and tulips. For simplicity, let's assume that the only variables we can measure are 'colour of flower' and 'height'. So that we can make 'colour' a continuous variable, we'll quantify it as 'the primary wavelength of light reflected by the flower'. If we measure the height and flower colour for a number of plants, we might get some data which looks like this:
In this sample, the colour of the flower is strongly correlated with the height of the plant, since sunflowers have yellow flowers and are generally over 1.5m tall, whereas tulips have red flowers and are less than 1m tall[3]. So if I tell you the colour of the flower, you can make a pretty good guess at the height (and vice versa). In this sense, the 'height' and 'colour' variables share information. We can capture this shared information by introducing a third variable 'plant species' which can take two values 'tulip' or 'sunflower':
This variable is a 'latent' variable, because it isn't something we directly observed; instead it is something we inferred from the data we did observe. Notice that this latent has some interesting properties:
It induces independence between 'Height' and 'Colour'. In the whole dataset 'Height' and 'Colour' are correlated, but within the set of sunflowers, colour and height are uncorrelated (similarly within the set of tulips).
It doesn't contain any 'extra' information, other than the correlation between 'Height' and 'Colour'.
You can tell which species of plant you have by just looking at one of either the colour or the height. If you know that a plant is red, it must be a tulip. If you know that the plant is over 1.5m tall, it must be a sunflower [4].
Loosely, if a latent satisfies these properties (we'll make them more crisp later on), then we call it a 'Natural Latent'. In this toy setting a Natural Latent loosely corresponds to a Natural Abstraction. The Natural Latent captures 'all and only' the shared information between the height and the colour of the flower.
To link this back to the problem of Ontology Identification, Wentworth and Lorell have proved some 'Translatability' results[5]. Roughly, these results show that if you have any latent variable model which is good at predicting some dataset, then you can 'translate' from that latent variable model to a Natural Latent Model of the same data. We won't have room to prove these results in this post, but hopefully this feels a bit intuitive. Suppose you have a model of data which captures all of the correlations in it and nothing else (a Natural Latent Model). And suppose someone else has a model of the data which is 'good' (in a predictive sense). Then this person's model must contain, somewhere 'inside it', the information about the correlations which is captured by your Natural Latent model, otherwise they wouldn't be able to do a good job of predicting the data. So it makes sense that you should be able to 'translate' between this person's model and the Natural Latent Model.
In our tulips/sunflowers example, this is pretty trivial. Suppose Alice uses the latent variable model of "Tulip = thing that is under 1m tall and has peak wavelength between 650nm and 750nm" and "Sunflower = thing that is over 1.25m tall and has peak wavelength between 500nm and 650nm". Furthermore, if her model says that 50% of the datapoints are sunflowers and 50% are tulips, then she has a pretty good predictive model of the data.
If Bob uses the variable definition "Snark = thing that is under 1.2m tall and has peak wavelength between 640nm and 756nm" and "Boojum = thing that is over 1.34m tall and has peak wavelength between 510nm and 642nm" and has a model which predicts that 51% of the datapoints will be Snarks and 49% will be Boojums, this will also be pretty good at predicting the data. The translatability theorems would allow us to 'translate' between Alice and Bob's models and realise that (up to some approximation) 'Tulips' are equivalent to 'Snarks' and 'Sunflowers' are equivalent to 'Boojums'.
One final clarification before we begin the maths. We are not saying that the definition 'Tulip=thing that is red and under 1m tall' is a Platonic Fact About Reality Engraved into the Laws of the Universe. It is 'natural' only relative to this dataset. In some contexts/datasets (such as the one considered above), this is a 'good' or 'natural' way of modelling the data. But it might not always be. Someone might come along and paint all of the tulips in the world blue and then we would have to change our model. But this would be as a result of the objective change in the dataset and we would have to update our latent variable model in response to this change.
Enough hand-waving and disclaimers. In what follows, we'll try to make these concepts a bit more mathematically precise, more general, and less botanical.
Introduction
Broadly, the aim of defining a natural latent is to identify 'all and only' information that is shared between two variables[6]. We will call these two variables $X$ and $Y$.
The setting we will use to explore this idea is as follows. Both $X$ and $Y$ are random variables which can take ten possible values. $X$ can take a value from the set $\{x_1, \dots, x_{10}\}$ and $Y$ can take a value from the set $\{y_1, \dots, y_{10}\}$. The variables can be described using a joint probability distribution $P(X, Y)$. We can visualize this joint distribution using a graph with darker squares representing higher probabilities and lighter squares representing lower probabilities. White squares indicate zero probability. (I'm not going to be too careful to be consistent with the exact shades, but white will always indicate zero probability). For example:
Fig 1
In this distribution, $(x_i, y_j)$ pairs with low indices are more likely than ones with high indices and pairs whose indices differ by more than two have zero probability. We'll operationalize 'sharing information' by asking 'if I told you the value of $X$, would it help you guess the value of $Y$?'. In the above distribution, $X$ and $Y$ clearly share information in this sense.
But not all variables do contain information about each other. For example:
Fig 2
In this distribution, $X$ and $Y$ are completely independent. If I tell you the value of $X$, it does nothing to help you guess the value of $Y$. A way to quantify this is through the mutual information between $X$ and $Y$, which we write as $I(X;Y)$.
Mutual information is the average reduction in Shannon entropy of one variable that occurs when you learn the value of the other variable. It is symmetric, so we have:
$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
In the case where $X$ and $Y$ are independent (as in Figure 2) we have $I(X;Y) = 0$.
Let's consider the following joint distribution where $X$ and $Y$ do contain information about each other.
Fig 3
How do $X$ and $Y$ 'contain' information about each other? If I tell you that $X$ takes one of its first five values, then you can be certain that $Y$ is in the set $\{y_1, \dots, y_5\}$. Similarly, if I tell you that $Y$ takes one of its last five values, you can be certain that $X$ is in the set $\{x_6, \dots, x_{10}\}$. Loosely, knowing one of the variables reduces your uncertainty by halving the possible outcomes, so the mutual information between $X$ and $Y$ is equal to 1 bit [7].
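As a quick numerical check of the 1 bit claim, here is a minimal Python sketch. It assumes that the distribution in Figure 3 consists of two uniform 5×5 chunks (which is what the footnote calculation suggests) and compares it with an independent distribution that is uniform over all 100 squares. The uniform choice for the independent case, the function name `mutual_information` and the use of integer indices 1 to 10 are my own illustrative assumptions.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


indices = range(1, 11)

# Assumed shape of Figure 3: probability 1/50 on every square where
# both indices are <= 5 or both indices are > 5, zero elsewhere.
two_chunks = {
    (x, y): 1 / 50 for x in indices for y in indices if (x <= 5) == (y <= 5)
}

# An independent distribution (assumed uniform for illustration).
independent = {(x, y): 1 / 100 for x in indices for y in indices}

mi_chunks = mutual_information(two_chunks)
mi_independent = mutual_information(independent)
print(f"two chunks:  {mi_chunks:.4f} bits")
print(f"independent: {mi_independent:.4f} bits")
```

Up to floating point rounding, the first value should agree with the 1 bit from the halving argument and the second should be zero.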
It is pretty obvious visually that we can split this distribution up into two distinct 'chunks' of probability mass: one where the $X$ and $Y$ indices are less than or equal to 5 and the other where they are greater than 5. We will give each of these chunks its own label and colour them differently.
Fig 4
We haven't changed the underlying distribution, just coloured the graph and labelled the chunks. We can think of this as adding another random variable, $\Lambda$, to the setup, 'on top of' $X$ and $Y$. Now, we have a three-variable distribution $P(X, Y, \Lambda)$. The conditional distribution $P(\Lambda | X, Y)$ puts probability 1 on the first label when both indices are at most 5, and probability 1 on the second label when both indices are greater than 5.
(For completeness, $P(\Lambda | X, Y)$ should also be given a value for the off-diagonal combinations, but since these have zero probability in our distribution we won't end up talking about them much here.)
In this case, we can also think of $\Lambda$ as a deterministic function of $X$ and $Y$, ie. we can write $\Lambda = f(X, Y)$ where $f$ maps each $(X, Y)$ pair to the label of the chunk containing it.
If you want to translate this back to the tulips/sunflowers example, treat $X$ as a binned version of 'height', $Y$ as a binned version of 'colour of flower' and $\Lambda$ as 'species of plant'.
For now, all of the latents we will discuss will be deterministic functions of $(X, Y)$ but strictly this assumption isn't needed. In graphs, we'll denote different values of $\Lambda$ by colouring the corresponding squares different colours. In a later section, we'll see an example of an approximate latent which is not a deterministic function of $(X, Y)$.
Now, we are going to claim that $\Lambda$ captures 'all and only' the shared information between $X$ and $Y$ and formalise this claim. We will do this by introducing the 'Mediation' and 'Redundancy' conditions and showing that $\Lambda$ exactly satisfies them.
First, we'll inspect the claim that $\Lambda$ (as shown in Figure 4) captures 'all' the shared information between $X$ and $Y$. Suppose you don't know what $X$ and $Y$ are, but you do know that $\Lambda$ takes the first label. Then, you will know that the true $(X, Y)$ pair lies somewhere in the bottom left quadrant of the graph. As a result, your subjective distribution over $X$ and $Y$ will look like this:
Fig. 5
This is the conditional distribution of $(X, Y)$ given that $\Lambda$ takes the first label. Notice that, once you know this, if I then tell you the true $X$ value, this doesn't tell you anything else about $Y$. You start with an evenly spread uncertainty over all five $Y$-values and after finding out the value of $X$, you still have equal uncertainty over each of those five $Y$-values. This is true for all $X$ and $Y$ pairs in the chunk. Once you know $\Lambda$, the variables $X$ and $Y$ become independent.
We can express this fact using conditional mutual information. This is just the mutual information we introduced earlier, but calculated using the conditional distribution $P(X, Y | \Lambda = \lambda)$. We write it as $I(X;Y|\Lambda = \lambda)$.
Conditional on the fact that $\Lambda$ takes the first label, $X$ and $Y$ share no information. We can say the same thing for the second label:
Fig. 6
So, knowing the value of $\Lambda$ 'extracts' all of the correlation between $X$ and $Y$. Once you know $\Lambda$, you can't find out anything else about $Y$ by looking at $X$ (or vice versa).
We quantify this using the conditional mutual information $I(X;Y|\Lambda)$, which is the expected value of $I(X;Y|\Lambda = \lambda)$ over the two values of $\Lambda$. This just equals zero, since both terms equal zero:
$$I(X;Y|\Lambda) = \sum_{\lambda} P(\Lambda = \lambda)\, I(X;Y|\Lambda = \lambda) = 0$$
The mutual information between $X$ and $Y$, conditional on $\Lambda$, is equal to zero. If this is the case, we say that $\Lambda$ exactly mediates between $X$ and $Y$ or we say that $\Lambda$ exactly satisfies the 'Mediation condition':
$$I(X;Y|\Lambda) = 0$$
(We're emphasising that $\Lambda$ 'exactly' satisfies the mediation condition because eventually we'll look at cases where the mediation condition is only approximately satisfied. In that case, $I(X;Y|\Lambda)$ will be small but nonzero. More on this later!)
Extending the intuitions behind this, we can apply the mediation condition to any distribution, not just the toy model presented here. If $I(X;Y|\Lambda) = 0$ for any three variables $X$, $Y$ and $\Lambda$ then we can interpret that as saying that $\Lambda$ contains all the information contained in $X$ about $Y$ and vice versa.
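The 'expected value' recipe for the mediation condition can be sketched in a few lines of Python. This is only an illustration under assumptions: the distribution is taken to be two uniform 5×5 chunks, the latent values are given the hypothetical names `"lower"` and `"upper"`, and the helper names are my own.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def mediation_error(joint3):
    """I(X;Y|Lambda): the P(lambda)-weighted average of I(X;Y|Lambda=lambda)."""
    cells_by_lam = defaultdict(dict)
    for (x, y, lam), p in joint3.items():
        cells_by_lam[lam][x, y] = p
    total = 0.0
    for cells in cells_by_lam.values():
        p_lam = sum(cells.values())
        conditional = {xy: p / p_lam for xy, p in cells.items()}  # P(X,Y|lambda)
        total += p_lam * mutual_information(conditional)
    return total


def chunk(index):
    # Hypothetical label names for the two values of the latent.
    return "lower" if index <= 5 else "upper"


# Assumed toy distribution: two uniform 5x5 chunks, Lambda = chunk label.
joint3 = {
    (x, y, chunk(x)): 1 / 50
    for x in range(1, 11)
    for y in range(1, 11)
    if chunk(x) == chunk(y)
}

error = mediation_error(joint3)
print(f"I(X;Y|Lambda) = {abs(error):.6f} bits")
```

For this latent both conditional terms vanish, so the weighted average is zero up to rounding, which is what exact mediation requires.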
Now, we'll explain the 'Redundancy' conditions which require that $\Lambda$ contains only the shared information between $X$ and $Y$.
We want to check that $\Lambda$ contains 'only the shared information between $X$ and $Y$'. In other words, we don't want $\Lambda$ to contain a bunch of other information about $X$ if that information isn't also shared with $Y$. Again, to operationalise this, we can ask: 'if I tell you the value of $Y$, is there any mutual information between $X$ and $\Lambda$ remaining?'. If, after telling you the value of $Y$, we still have mutual information between $X$ and $\Lambda$, then $\Lambda$ must contain 'extra' information about $X$ which is not helpful for predicting $Y$. To check this, let's plot $P(X, \Lambda | Y = y)$ for a few values of $y$.
Fig. 7
Notice that in all cases, $X$ and $\Lambda$ are independent, conditional on $Y$. We can see that in each case, once you condition on a particular $Y$ value, knowing $X$ does not give you any extra information about $\Lambda$ and knowing $\Lambda$ doesn't give you any extra information about $X$. This is true whether we pick one of the 'upper' $Y$-values or one of the lower ones, meaning that $I(X;\Lambda|Y = y) = 0$ for all $y$. The conditional mutual information is given by the expected value:
$$I(X;\Lambda|Y) = \sum_{y} P(Y = y)\, I(X;\Lambda|Y = y) = 0$$
We can do something similar to find out if $\Lambda$ contains any 'excess information' about $Y$. Going through the same process, we would find that the mutual information between $Y$ and $\Lambda$, conditional on $X$, also equals zero.
If we find that $I(X;\Lambda|Y) = 0$ we can conclude that $\Lambda$ does not contain any information present in $X$ that is not present in $Y$ (and vice versa for $I(Y;\Lambda|X) = 0$). Taken together, these two conditions are known as the 'exact redundancy conditions':
$$I(X;\Lambda|Y) = 0, \qquad I(Y;\Lambda|X) = 0$$
Like the Mediation condition, we can check whether these conditions apply to any distribution $P(X, Y, \Lambda)$, not just our toy example. If $I(X;\Lambda|Y) = 0$ for any joint distribution $P(X, Y, \Lambda)$, we can interpret this as meaning $\Lambda$ does not contain any information present in $X$ that is not present in $Y$.
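The redundancy conditions can be checked numerically in the same spirit. The sketch below uses the standard identity $I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)$ and again assumes the two-uniform-chunk distribution with hypothetical latent values `"lower"` and `"upper"`. The helper names and the position constants `X`, `Y`, `LAM` are illustrative, not anything from Wentworth and Lorell's write-ups.

```python
import math
from collections import defaultdict


def entropy(joint, keep):
    """Entropy in bits of the marginal over the tuple positions listed in keep."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mutual_info(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are tuple positions."""
    return (
        entropy(joint, [a, c])
        + entropy(joint, [b, c])
        - entropy(joint, [a, b, c])
        - entropy(joint, [c])
    )


X, Y, LAM = 0, 1, 2  # positions of each variable inside an outcome tuple


def chunk(index):
    # Hypothetical label names for the two values of the latent.
    return "lower" if index <= 5 else "upper"


# Assumed toy distribution: two uniform 5x5 chunks, Lambda = chunk label.
joint = {
    (x, y, chunk(x)): 1 / 50
    for x in range(1, 11)
    for y in range(1, 11)
    if chunk(x) == chunk(y)
}

redundancy_x = cond_mutual_info(joint, X, LAM, Y)  # I(X;Lambda|Y)
redundancy_y = cond_mutual_info(joint, Y, LAM, X)  # I(Y;Lambda|X)
print(f"I(X;Lambda|Y) = {abs(redundancy_x):.6f} bits")
print(f"I(Y;Lambda|X) = {abs(redundancy_y):.6f} bits")
```

Both quantities come out as zero up to rounding here, because the chunk label can be read off from either $X$ alone or $Y$ alone.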
More exact Natural Latent examples
Now, we have expressed the three exact natural latent conditions and justified why they characterise a latent which contains 'all and only' information shared by $X$ and $Y$. Here they are again, all together:
$$I(X;Y|\Lambda) = 0, \qquad I(X;\Lambda|Y) = 0, \qquad I(Y;\Lambda|X) = 0$$
In words, these conditions correspond to the following:
Mediation: $\Lambda$ contains all of the information shared between $X$ and $Y$.
Redundancy 1: $\Lambda$ does not contain any information which is contained in $X$ but not contained in $Y$.
Redundancy 2: $\Lambda$ does not contain any information which is contained in $Y$ but not contained in $X$.
To build some intuitions, let's explore some more distributions which satisfy all three of the exact natural latent conditions. (As an exercise, you may wish to verify for yourself that each of these examples does indeed satisfy the conditions.)
First, if $X$ and $Y$ share more information than can be captured by a binary random variable, we can add more outcomes to $\Lambda$. For example, the following distribution:
can be given the following natural latent:
Fig. 8
Here, each of the outcomes of $\Lambda$ might contain different amounts of information about $(X, Y)$. In this distribution, for some values of $\Lambda$, being told that value means you know exactly what values $X$ and $Y$ take. But for other values of $\Lambda$, you have some information about $X$ and $Y$, but not as much. Nonetheless, $\Lambda$ still captures 'all and only' the shared information between $X$ and $Y$ so it satisfies the exact natural latent conditions.
Our examples so far have had the joint distributions conveniently arranged into chunks, which makes it easy to see patterns in the data. But this is just a feature of how we have labelled the dataset, not an information-theoretic fact about the data. Not all distributions which satisfy the natural latent conditions will look like this. For example, we could change the $X$ and $Y$ axis labelling of Figure 8 above to obtain the following image:
Fig. 9
This distribution (with $\Lambda$ still represented by the four different colours) still satisfies the exact natural latent conditions, since it is information-theoretically identical to the distribution shown in Figure 8.
In our examples so far, the 'chunk' of probability mass associated with each latent has been uniform, but this does not have to be the case. We just need that $X$ and $Y$ are independent given $\Lambda$ (as well as the redundancy conditions). As a result, we can consider natural latents for distributions which look like this:
Notice that we can apply the same latent variable that we did in the case of our initial distribution[10] and this still satisfies the exact natural latent conditions for this distribution. While the distributions are no longer uniform, we still have that $X$ and $Y$ are independent conditional on $\Lambda$. Similarly, the redundancy conditions are also exactly satisfied.
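To see that uniformity really isn't needed, here is a small Python sketch that builds a non-uniform distribution which factorises inside each chunk and checks all three conditions. The particular weights `[1, 2, 3, 4, 5]` are made up for illustration and are not the distribution in the figure. The latent values `"lower"` and `"upper"` and the helper names are likewise my own.

```python
import math
from collections import defaultdict


def entropy(joint, keep):
    """Entropy in bits of the marginal over the tuple positions listed in keep."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mutual_info(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are tuple positions."""
    return (
        entropy(joint, [a, c])
        + entropy(joint, [b, c])
        - entropy(joint, [a, b, c])
        - entropy(joint, [c])
    )


X, Y, LAM = 0, 1, 2

# Hypothetical non-uniform shape: inside each chunk, P(x, y | lambda) is the
# product of two copies of these normalised weights, so X and Y are
# independent given Lambda even though the chunk is not uniform.
weights = [1, 2, 3, 4, 5]
total = sum(weights)

joint = {}
for lam, offset in (("lower", 0), ("upper", 5)):
    for i, w_x in enumerate(weights, start=1):
        for j, w_y in enumerate(weights, start=1):
            joint[i + offset, j + offset, lam] = 0.5 * (w_x / total) * (w_y / total)

mediation = cond_mutual_info(joint, X, Y, LAM)     # I(X;Y|Lambda)
redundancy_x = cond_mutual_info(joint, X, LAM, Y)  # I(X;Lambda|Y)
redundancy_y = cond_mutual_info(joint, Y, LAM, X)  # I(Y;Lambda|X)
print(abs(round(mediation, 9)), abs(round(redundancy_x, 9)), abs(round(redundancy_y, 9)))
```

All three quantities vanish up to rounding, so the chunk label remains an exact natural latent for this lopsided distribution.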
Approximate Natural Latent Conditions
Instead of requiring that our latent satisfies the exact natural latent conditions, we can also talk about latents which 'approximately' satisfy the three conditions. Rather than saying that the conditional mutual informations listed above are exactly zero, we can just enforce that they are 'small' in the sense that they are no greater than some $\epsilon$. In the next few sections, we'll explore what it would mean for mediation and redundancy errors to be non-zero.
Approximate Mediation
The mediation condition required that, once we conditioned on $\Lambda$, there was no further mutual information between $X$ and $Y$. We described this by saying that $\Lambda$ contained all of the shared information between $X$ and $Y$. To make this condition approximate, we can instead require that, conditioned on $\Lambda$, $X$ and $Y$ share a nonzero but 'small' amount of mutual information:
$$I(X;Y|\Lambda) \le \epsilon$$
The smaller $\epsilon$, the better the latent is at mediating. In an approximate natural latent, with small but nonzero $\epsilon$, we have that $\Lambda$ captures some but not all of the shared information between $X$ and $Y$. What would this look like?
Consider the following distribution:
Fig. 10
Along with this distribution let's use our latent from before, which gives the top right corner one label and the bottom left corner the other:
Fig. 11
In this case, conditioning on each of the two values of $\Lambda$ in turn gives us the following distributions:
Fig. 12
If you initially knew that the distribution looked like Figure 11 and then received the extra information that $\Lambda$ takes the value shown in the left-hand plot, your updated subjective distribution over $(X, Y)$ should look like the plot on the left. Clearly, being told the value of $\Lambda$ has given you some information about $X$ and $Y$, but not all of the shared information. If you are just told the value of $\Lambda$, you would know which set of values $Y$ lies in, but if you were then told the value of $X$, you would learn something else about $Y$ that wasn't captured by $\Lambda$ alone (namely, that some of the values in that set are ruled out). This means that $\Lambda$ does not capture all of the shared information between $X$ and $Y$.
In this case, $I(X;Y|\Lambda)$ is not zero but 0.9 bits[11]. If 0.9 bits is small for our purposes, we could say that $\Lambda$ approximately satisfies the mediation condition with error $\epsilon = 0.9$ bits.
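Here is a hedged numerical sketch of an approximately mediating latent. I have not tried to reproduce the exact distribution from Figure 10. Instead the code assumes a made-up distribution that is uniform over the squares lying in the same half and with indices differing by at most one, and reuses the two-valued chunk latent (with hypothetical values `"lower"` and `"upper"`).

```python
import math
from collections import defaultdict


def cond_mutual_info(joint):
    """I(A;B|C) in bits for a joint distribution given as {(a, b, c): probability}."""
    p_c, p_ac, p_bc = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        p_c[c] += p
        p_ac[a, c] += p
        p_bc[b, c] += p
    return sum(
        p * math.log2(p * p_c[c] / (p_ac[a, c] * p_bc[b, c]))
        for (a, b, c), p in joint.items()
        if p > 0
    )


def chunk(index):
    # Hypothetical label names for the two values of the latent.
    return "lower" if index <= 5 else "upper"


# Made-up distribution: uniform over squares in the same half whose
# indices differ by at most one (a thin band inside each chunk).
cells = [
    (x, y)
    for x in range(1, 11)
    for y in range(1, 11)
    if chunk(x) == chunk(y) and abs(x - y) <= 1
]
p_cell = 1 / len(cells)

with_latent = {(x, y, chunk(x)): p_cell for x, y in cells}
no_latent = {(x, y, "const"): p_cell for x, y in cells}  # conditioning on nothing

mediation_error = cond_mutual_info(with_latent)  # I(X;Y|Lambda)
total_shared = cond_mutual_info(no_latent)       # I(X;Y)
print(f"I(X;Y)        = {total_shared:.4f} bits")
print(f"I(X;Y|Lambda) = {mediation_error:.4f} bits")
```

In this toy case the latent removes exactly one bit of the shared information (the 'which half' bit) and leaves the within-chunk correlation behind as the mediation error.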
Now, we'll talk about what it would mean for a latent to approximately satisfy a redundancy condition.
Approximate Redundancy
The (approximate) mediation condition requires that $\Lambda$ contains (approximately) all of the shared information between $X$ and $Y$. The redundancy conditions require that $\Lambda$ contains no 'extra' information that is present in $X$ but not in $Y$ (or vice versa). By making the redundancy conditions approximate, we can enforce a weaker version of this: that $\Lambda$ contains only a small amount of information that is present in one variable but not the other. The approximate redundancy conditions can be expressed as follows:
$$I(X;\Lambda|Y) \le \epsilon, \qquad I(Y;\Lambda|X) \le \epsilon$$
What does it mean for a latent to contain some information about one variable that is not present in the other? Consider this distribution:
We might want to use the approach of identifying visual 'chunks' in the distribution and finding a latent that corresponds to labelling each of these chunks:
Fig. 13
But this latent does not carry 'only' the shared information between $X$ and $Y$. If we plot $P(X, \Lambda | Y)$ for particular values of $Y$ we can see clearly that conditioning on $Y$ doesn't remove all shared information between $X$ and $\Lambda$. This means that $\Lambda$ contains some information which is present in $X$ which is not present in $Y$.
If we do the calculation for this distribution, we get a nonzero value for $I(X;\Lambda|Y)$[12]. This means that the exact redundancy condition would not be satisfied. But, for a large enough $\epsilon$, the approximate redundancy condition would still be satisfied.
Similarly, we can consider a latent which contains some information about $Y$ which is not present in $X$.
Fig. 15
This distribution will fail to satisfy the exact redundancy condition $I(Y;\Lambda|X) = 0$, but, again, for a large enough $\epsilon$ it will still satisfy the approximate redundancy condition.
We can also have latents which are approximate with respect to both redundancy conditions
Fig. 16
(Incidentally, this distribution also fails to satisfy the exact mediation condition. Can you see why?)
Introducing Randomness to Latents
So far, we have looked at latents which are deterministic functions of $X$ and $Y$, ie. they have been latents which can be expressed as $\Lambda = f(X, Y)$.
But to be more general, we might want to consider latents which are defined by a general conditional probability distribution $P(\Lambda | X, Y)$. This would mean that we allow $\Lambda$ to be randomized for some (or all) $(X, Y)$ pairs.
Why might we want to do this? Recall in the previous section, Figure 13 showed that a latent can fail to satisfy the exact redundancy conditions due to containing too much information about $X$ that is not present in $Y$.
Fig. 17
One way to remove this 'extra' information from $\Lambda$ is to get $\Lambda$ to randomize for certain $(X, Y)$ pairs. The outcomes which caused this latent to fail to satisfy the exact redundancy conditions were a particular subset of the $(X, Y)$ pairs. For those pairs, the latent in the above diagram insists on labelling each outcome with one value of $\Lambda$ or the other in a way that depends only on $X$, leading to $\Lambda$ containing extra information about $X$ that is not present in $Y$. We can remove this extra information by requiring that, for those pairs, the latent simply tosses a coin, picking one value half of the time and the other value the other half of the time. We can depict this by colouring the squares a mixture of orange and blue:
Fig. 18
(I have removed the labels but blue and orange still denote the same two values of $\Lambda$ as before.)
Now we can view $P(X, \Lambda | Y)$, which now looks like this:
Fig. 19
Now, given $Y$, we have that $X$ and $\Lambda$ are independent. $\Lambda$ no longer contains information present in $X$ that is not also present in $Y$ so it exactly satisfies the redundancy condition. So introducing randomness to the latent can remove the 'extra' information from $\Lambda$, allowing it to better satisfy the redundancy condition. (But note: this modification to $\Lambda$ now means that it fails the exact mediation condition! Try sketching $P(X, Y | \Lambda)$ to see why.)
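The trade-off described above can be reproduced on a much smaller made-up example (this is not the distribution from Figures 17 to 19). In the sketch below, $X$ takes values 1 to 4, $Y$ takes the hypothetical values `"lo"`, `"mid"` and `"hi"`, the deterministic latent is blue when $X \le 2$ and orange otherwise, and the randomized latent tosses a fair coin whenever $Y$ is `"mid"`. All names are illustrative.

```python
import math
from collections import defaultdict


def entropy(joint, keep):
    """Entropy in bits of the marginal over the tuple positions listed in keep."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mutual_info(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are tuple positions."""
    return (
        entropy(joint, [a, c])
        + entropy(joint, [b, c])
        - entropy(joint, [a, b, c])
        - entropy(joint, [c])
    )


X, Y, LAM = 0, 1, 2

# Made-up support: two overlapping 2x2 blocks that share the "mid" value of Y.
support = [(x, y) for x in (1, 2) for y in ("lo", "mid")]
support += [(x, y) for x in (3, 4) for y in ("mid", "hi")]


def colour(x):
    return "blue" if x <= 2 else "orange"


# Deterministic latent: depends only on X.
deterministic = {(x, y, colour(x)): 1 / 8 for x, y in support}

# Randomized latent: fair coin toss whenever Y is "mid".
randomized = {}
for x, y in support:
    if y == "mid":
        randomized[x, y, "blue"] = 1 / 16
        randomized[x, y, "orange"] = 1 / 16
    else:
        randomized[x, y, colour(x)] = 1 / 8

results = {}
for name, joint in (("deterministic", deterministic), ("randomized", randomized)):
    results[name] = {
        "I(X;Y|L)": cond_mutual_info(joint, X, Y, LAM),
        "I(X;L|Y)": cond_mutual_info(joint, X, LAM, Y),
        "I(Y;L|X)": cond_mutual_info(joint, Y, LAM, X),
    }
    print(name, {k: abs(round(v, 4)) for k, v in results[name].items()})
```

In this toy case the deterministic latent mediates exactly but has a redundancy error of half a bit in $I(X;\Lambda|Y)$. The coin toss drives that error to zero, at the price of a nonzero mediation error (and, in this particular example, a nonzero $I(Y;\Lambda|X)$ as well).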
Some Example Latents
To keep building intuitions let's examine a couple of general classes of latents.
Example: Constant latent
Consider the trivial latent which always takes the same value, regardless of $X$ and $Y$. Recall the three conditions:
Mediation: $\Lambda$ contains all of the information shared between $X$ and $Y$.
Redundancy 1: $\Lambda$ does not contain any information which is contained in $X$ but not contained in $Y$.
Redundancy 2: $\Lambda$ does not contain any information which is contained in $Y$ but not contained in $X$.
Which (if any) of these conditions will the constant latent satisfy? (If you want to test your understanding, try to work out the answer before reading on)
Mediation. If there is any shared information between $X$ and $Y$, the constant latent will not capture it. Conditioning on $\Lambda$ will not affect the joint distribution, so we have $P(X, Y | \Lambda) = P(X, Y)$. As a result, we have $I(X;Y|\Lambda) = I(X;Y)$. This means that the constant latent only satisfies the exact mediation condition if there is no shared information between $X$ and $Y$. In other words: a constant contains zero information, so the only way it can capture 'all of the shared information' between $X$ and $Y$ is if $X$ and $Y$ share zero information!
Redundancy. The constant latent will always exactly satisfy the two redundancy conditions. Since $\Lambda$ contains no information about $X$ or $Y$ it cannot contain any information that is present in $X$ but not present in $Y$ (indeed, it contains no information at all).
Example: Everything Latent
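Let us define a new latent which we'll call the 'everything latent'. Let $\Lambda$ be a deterministic function of $(X, Y)$ with a unique value for every $(X, Y)$ pair. How does this latent perform?
Mediation. In this case, $H(X|\Lambda)$, $H(Y|\Lambda)$ and $H(X, Y|\Lambda)$ all equal zero so the conditional mutual information $I(X;Y|\Lambda)$ also equals zero. In this case, $\Lambda$ captures all shared information between $X$ and $Y$ because it in fact captures all information about $X$ and $Y$, shared or not.
Redundancy. Note that $H(X|\Lambda, Y) = H(Y|\Lambda, X) = 0$ for the everything latent. Therefore the mutual information quantities for the redundancy conditions will be determined by the conditional entropies of the original distribution:
$$I(X;\Lambda|Y) = H(X|Y) - H(X|\Lambda, Y) = H(X|Y), \qquad I(Y;\Lambda|X) = H(Y|X) - H(Y|\Lambda, X) = H(Y|X)$$
So the everything latent only satisfies the exact redundancy conditions when $X$ and $Y$ fully determine each other.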
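Both example latents from this section can be checked numerically. The sketch below assumes the two-uniform-chunk distribution from earlier in the post as the underlying $P(X, Y)$. The constant value `"c"` and the helper names are my own illustrative choices.

```python
import math
from collections import defaultdict


def entropy(joint, keep):
    """Entropy in bits of the marginal over the tuple positions listed in keep."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mutual_info(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are tuple positions."""
    return (
        entropy(joint, [a, c])
        + entropy(joint, [b, c])
        - entropy(joint, [a, b, c])
        - entropy(joint, [c])
    )


def three_errors(joint):
    """Return (I(X;Y|Lambda), I(X;Lambda|Y), I(Y;Lambda|X)) for {(x, y, lam): p}."""
    X, Y, LAM = 0, 1, 2
    return (
        cond_mutual_info(joint, X, Y, LAM),
        cond_mutual_info(joint, X, LAM, Y),
        cond_mutual_info(joint, Y, LAM, X),
    )


# Assumed underlying distribution: two uniform 5x5 chunks.
cells = [
    (x, y) for x in range(1, 11) for y in range(1, 11) if (x <= 5) == (y <= 5)
]

constant_latent = {(x, y, "c"): 1 / 50 for x, y in cells}        # same value always
everything_latent = {(x, y, (x, y)): 1 / 50 for x, y in cells}   # unique value per pair

constant_errors = three_errors(constant_latent)
everything_errors = three_errors(everything_latent)
print("constant:  ", [abs(round(e, 4)) for e in constant_errors])
print("everything:", [abs(round(e, 4)) for e in everything_errors])
```

For this distribution the constant latent has zero redundancy error but a mediation error equal to the full $I(X;Y)$ of 1 bit, while the everything latent has zero mediation error but redundancy errors equal to $H(X|Y) = H(Y|X) = \log_2 5$ bits.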
Summary
In this post, we have introduced the concept of Natural Abstractions and discussed its relation to AI Safety research programs. We then introduced the Natural Latents framework which is one approach to formalising such problems mathematically. Then, we introduced the (exact and approximate) Natural Latent conditions and gave some examples to demonstrate what they correspond to intuitively.
There have been no ideas here which have not already been covered by Wentworth and Lorell in various places and there are many other ideas which did not make it into this post. But hopefully this post has served its purpose as a reasonably self-contained introduction to the core of the relevant ideas and you now feel empowered to tackle some of the many other writings about this subject. If you are feeling brave, you might want to try your hand at one of the open problems in this field (such as this one or this one). Let me know if you have any success!
Others have pointed out that the KL-divergence formulation of the Natural Latents Conditions can be expressed as conditional mutual information expressions, notably in this post and this comment.
This is a 'bonus' property that corresponds to our latent being a deterministic function of 'height' and 'colour'. If this property is satisfied, we have a 'deterministic natural latent'. We won't go into the distinction between deterministic and stochastic natural latents in this post. This is discussed more here.
Actually, we can consider latent variables which capture 'all and only' shared information between more than two variables, but we'll stick with the 2 variable case in this post since it's easier to visualise.
More formally, the calculation is as follows. Initially $Y$ is uniform over 10 outcomes, so its entropy is $H(Y) = \log_2 10$ bits. Then, upon learning any particular $X = x$, the conditional distribution is uniform over 5 $Y$-outcomes, which has entropy $H(Y|X = x) = \log_2 5$ bits. This is true for all $x$ so $H(Y|X) = \log_2 5$ bits. The mutual information is then $I(X;Y) = H(Y) - H(Y|X) = \log_2 10 - \log_2 5 = 1$ bit.
Thanks to @Jeremy Gillen for reading and commenting on the draft. This was written while I was funded by the Advanced Research + Invention Agency (ARIA) through project code MSAI-SE01-P005.
For example, let's return to a sample containing just sunflowers and tulips. For simplicity, let's assume that the only variables we can measure are 'colour of flower' and 'height'. So that we can make 'colour' a continuous variable, we'll quantify it as 'the primary wavelength of light reflected by the flower'. If we measure the height and flower colour for a number of plants, we might get some data which looks like this:
In this sample, the colour of the flower is strongly correlated with the height of the plant, since sunflowers have yellow flowers and are generally over 1.5m tall, whereas tulips have red flowers and are less than 1m tall[3]. So if I tell you the colour of the flower, you can make a pretty good guess at the height (and vice versa). In this sense, the 'height' and 'colour' variables share information. We can capture this shared information by introducing a third variable 'plant species' which can take two values 'tulip' or 'sunflower':
This variable is a 'latent' variable, because it isn't something we directly observed; instead it is something we inferred from the data we did observe. Notice that this latent has some interesting properties:
Loosely, if a latent satisfies these properties (we'll make them more crisp later on), then we call it a 'Natural Latent'. In this toy setting a Natural Latent loosely corresponds to a Natural Abstraction. The Natural Latent captures 'all and only' the shared information between the height and the colour of the flower.
To link this back to the problem of Ontology Identification, Wentworth and Lorell have proved some 'Translatability' results[5]. Roughly, these results show that if you have any latent variable model which is good at predicting some dataset, then you can 'translate' from that latent variable model to a Natural Latent Model of the same data. We won't have room to prove these results in this post, but hopefully this feels a bit intuitive. Suppose you have a model of data which captures all of the correlations in it and nothing else (a Natural Latent Model). And suppose someone else has a model of the data which is 'good' (in a predictive sense). Then this person's model must contain, somewhere 'inside it', the information about the correlations which is captured by your Natural Latent Model, otherwise they wouldn't be able to do a good job of predicting the data. So it makes sense that you should be able to 'translate' between this person's model and the Natural Latent Model.
In our tulips/sunflowers example, this is pretty trivial. Suppose Alice uses the latent variable model of "Tulip = thing that is under 1m tall and has peak wavelength between 650nm and 750nm" and "Sunflower = thing that is over 1.25m tall and has peak wavelength between 500nm and 650nm". If her model also says that 50% of the datapoints are sunflowers and 50% are tulips, then she has a pretty good predictive model of the data.
If Bob uses the variable definition "Snark = thing that is under 1.2m tall and has peak wavelength between 640nm and 756nm" and "Boojum = thing that is over 1.34m tall and has peak wavelength between 510nm and 642nm" and has a model which predicts that 51% of the datapoints will be Snarks and 49% will be Boojums, this will also be pretty good at predicting the data. The translatability theorems would allow us to 'translate' between Alice and Bob's models and realise that (up to some approximation) 'Tulips' are equivalent to 'Snarks' and 'Sunflowers' are equivalent to 'Boojums'.
One final clarification before we begin the maths. We are not saying that the definition 'Tulip=thing that is red and under 1m tall' is a Platonic Fact About Reality Engraved into the Laws of the Universe. It is 'natural' only relative to this dataset. In some contexts/datasets (such as the one considered above), this is a 'good' or 'natural' way of modelling the data. But it might not always be. Someone might come along and paint all of the tulips in the world blue and then we would have to change our model. But this would be as a result of the objective change in the dataset and we would have to update our latent variable model in response to this change.
Enough hand-waving and disclaimers. In what follows, we'll try to make these concepts a bit more mathematically precise, more general, and less botanical.
Introduction
Broadly, the aim of defining a natural latent is to identify 'all and only' information that is shared between two variables[6]. We will call these two variables $X$ and $Y$.
The setting we will use to explore this idea is as follows. Both $X$ and $Y$ are random variables which can take ten possible values. $X$ can take a value from the set $\{x_1, x_2, \dots, x_{10}\}$ and $Y$ can take a value from the set $\{y_1, y_2, \dots, y_{10}\}$. The variables can be described using a joint probability distribution $P(X, Y)$. We can visualize this joint distribution using a graph with darker squares representing higher probabilities and lighter squares representing lower probabilities. White squares indicate zero probability. (I'm not going to be too careful to be consistent with the exact shades, but white will always indicate zero probability). For example:
Fig 1
In this distribution, $(x_i, y_j)$ pairs with low indices are more likely than ones with high indices and pairs whose indices differ by more than two have zero probability. We'll operationalize 'sharing information' by saying 'if I told you the value of $X$, would it help you guess the value of $Y$?'. In the above distribution, $X$ and $Y$ clearly share information in this sense.
But not all variables do contain information about each other. For example:
Fig 2
In this distribution, $X$ and $Y$ are completely independent. If I tell you the value of $X$, it does nothing to help you guess the value of $Y$. A way to quantify this is through the mutual information between $X$ and $Y$, which we write as $I(X;Y)$.
Mutual information is the average reduction in the Shannon entropy of one variable that occurs when you learn the value of the other variable. It is symmetric, so we have:
$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
In the case where $X$ and $Y$ are independent (as in Figure 2) we have $I(X;Y) = 0$.
Let's consider the following joint distribution where $X$ and $Y$ do contain information about each other.
Fig 3
How do $X$ and $Y$ 'contain' information about each other? If I tell you that $X = x_i$ for some $i \le 5$, then you can be certain that $Y$ is in the set $\{y_1, \dots, y_5\}$. Similarly, if I tell you that $Y = y_j$ for some $j > 5$, you can be certain that $X$ is in the set $\{x_6, \dots, x_{10}\}$. Loosely, knowing one of the variables reduces your uncertainty about the other by halving the possible outcomes, so the mutual information between $X$ and $Y$ is equal to 1 bit [7].
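If you want to check the '1 bit' claim numerically, here is a minimal Python sketch. The two tables are my own stand-ins, not the exact figures: I assume Figure 3 is two uniform 5x5 chunks on the 10x10 grid, and I use a uniform independent table in place of Figure 2. The function name `mutual_information` is just for illustration.

```python
from math import log2
from collections import defaultdict

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a table {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    entropy = lambda dist: -sum(p * log2(p) for p in dist.values() if p > 0)
    return entropy(px) + entropy(py) - entropy(joint)

# Assumed stand-in for Figure 3: two uniform 5x5 chunks (indices 1..5 and 6..10).
two_chunks = {(x, y): 1 / 50
              for x in range(1, 11) for y in range(1, 11)
              if (x <= 5) == (y <= 5)}

# Assumed stand-in for Figure 2: X and Y independent and uniform.
independent = {(x, y): 1 / 100 for x in range(1, 11) for y in range(1, 11)}

mi_chunks = mutual_information(two_chunks)
mi_independent = mutual_information(independent)
print(f"two chunks:  {mi_chunks:.3f} bits")
print(f"independent: {mi_independent:.3f} bits")
```

Up to floating-point rounding, the first value should come out as 1 bit and the second as 0 bits.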
It is pretty obvious visually that we can split this distribution up into two distinct 'chunks' of probability mass: one where the $X$ and $Y$ indices are less than or equal to 5 and the other where they are greater than 5. We will label these chunks '$\Lambda = 1$' and '$\Lambda = 2$' respectively and colour them differently.
Fig 4
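We haven't changed the underlying distribution, just coloured the graph and labelled the chunks. We can think of this as adding another random variable $\Lambda$ to the setup, 'on top of' $X$ and $Y$. Now, we have a three-variable distribution $P(X, Y, \Lambda)$. The conditional distribution $P(\Lambda | X, Y)$ can be defined as follows:
$$P(\Lambda = 1 | X = x_i, Y = y_j) = \begin{cases} 1 & \text{if } i \le 5 \text{ and } j \le 5 \\ 0 & \text{if } i > 5 \text{ and } j > 5 \end{cases}$$
with $P(\Lambda = 2 | X = x_i, Y = y_j) = 1 - P(\Lambda = 1 | X = x_i, Y = y_j)$.
(For completeness, $P(\Lambda | X, Y)$ is also assigned a value for the off-diagonal combinations, but since these have zero probability in our distribution we won't end up talking about them much here.)
In this case, we can also think of $\Lambda$ as a deterministic function of $X$ and $Y$, ie. we can write $\Lambda = f(X, Y)$ where $f$ is defined as
$$f(x_i, y_j) = \begin{cases} 1 & \text{if } i \le 5 \text{ and } j \le 5 \\ 2 & \text{if } i > 5 \text{ and } j > 5 \end{cases}$$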
If you want to translate this back to the tulips/sunflowers example, treat $X$ as a binned version of 'height', $Y$ as a binned version of 'colour of flower' and $\Lambda$ as 'species of plant'.
For now, all of the latents we will discuss will be deterministic functions of $X$ and $Y$, but strictly this assumption isn't needed. In graphs, we'll denote different values of $\Lambda$ by colouring the corresponding squares different colours. In a later section, we'll see an example of an approximate latent which is not a deterministic function of $X$ and $Y$.
Now, we are going to claim that $\Lambda$ captures 'all and only' the shared information between $X$ and $Y$ and formalise this claim. We will do this by introducing the 'Mediation' and 'Redundancy' conditions and showing that $\Lambda$ exactly satisfies them.
The (Exact) Natural Latent Conditions
The Exact Mediation Condition [8]
First, we'll inspect the claim that $\Lambda$ (as shown in Figure 4) captures 'all' the shared information between $X$ and $Y$. Suppose you don't know what $X$ and $Y$ are, but you do know that $\Lambda = 1$. Then, you will know that the true $(X, Y)$ pair lies somewhere in the bottom left quadrant of the graph. As a result, your subjective distribution over $X$ and $Y$ will look like this:
Fig. 5
This is the conditional distribution $P(X, Y | \Lambda = 1)$. Notice that, once you know $\Lambda = 1$, if I then tell you the true value of $X$, this doesn't tell you anything else about $Y$. You start with an evenly spread uncertainty over all five $Y$-values and after finding out the value of $X$, you still have equal uncertainty over each of those five $Y$-values. This is true for all $X$ and $Y$ pairs. Once you know $\Lambda = 1$, the variables $X$ and $Y$ become independent.
We can express this fact using the conditional mutual information $I(X;Y|\Lambda = 1) = 0$. This is just the mutual information we introduced earlier, but calculated using the conditional distribution $P(X, Y | \Lambda = 1)$.
Conditional on the fact that $\Lambda = 1$, $X$ and $Y$ share no information. We can say the same thing for $\Lambda = 2$:
Fig. 6
So, knowing the value of $\Lambda$ 'extracts' all of the correlation between $X$ and $Y$. Once you know $\Lambda$, you can't find out anything else about $Y$ by looking at $X$ (or vice versa).
We quantify this using the conditional mutual information $I(X;Y|\Lambda)$ which is the expected value of $I(X;Y|\Lambda = 1)$ and $I(X;Y|\Lambda = 2)$. This just equals zero, since both terms equal zero:
$$I(X;Y|\Lambda) = P(\Lambda = 1)\, I(X;Y|\Lambda = 1) + P(\Lambda = 2)\, I(X;Y|\Lambda = 2) = 0$$
The mutual information between $X$ and $Y$, conditional on $\Lambda$, is equal to zero. If this is the case, we say that $\Lambda$ exactly mediates between $X$ and $Y$ or we say that $\Lambda$ exactly satisfies the 'Mediation condition':
$$I(X;Y|\Lambda) = 0$$
(We're emphasising that $\Lambda$ 'exactly' satisfies the mediation condition because eventually we'll look at cases where the mediation condition is only approximately satisfied. In that case, $I(X;Y|\Lambda)$ is small but nonzero. More on this later!)
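To see the mediation condition hold numerically, here is a minimal Python sketch. It assumes the Figure 4 distribution is two uniform 5x5 chunks, with the chunk labels 1 and 2 chosen by me; the helper names `entropy_of` and `cond_mutual_info` are illustrative. It uses the standard identity $I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)$ rather than averaging over each value of $\Lambda$, but the two calculations agree.

```python
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)  # positions of x, y, lam in each outcome tuple

# Assumed stand-in for Figure 4: outcomes are (x, y, lam), with lam = 1 on the
# low-index chunk and lam = 2 on the high-index chunk.
p = {(x, y, 1 if x <= 5 else 2): 1 / 50
     for x in range(1, 11) for y in range(1, 11)
     if (x <= 5) == (y <= 5)}

shared = cond_mutual_info(p, X, Y)                # I(X;Y)
mediation_error = cond_mutual_info(p, X, Y, LAM)  # I(X;Y|Lambda)
print(f"I(X;Y) = {shared:.3f} bits, I(X;Y|Lambda) = {mediation_error:.3f} bits")
```

The unconditional mutual information should be about 1 bit, while the conditional one should vanish up to rounding, which is what 'exactly mediates' means here.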
Extending the intuitions behind this, we can apply the mediation condition to any distribution, not just the toy model presented here. If $I(X;Y|\Lambda) = 0$ for any three variables $X$, $Y$ and $\Lambda$ then we can interpret that as saying that $\Lambda$ contains all the information contained in $X$ about $Y$ and vice versa.
Now, we'll explain the 'Redundancy' conditions which require that $\Lambda$ contains only the shared information between $X$ and $Y$.
The Exact Redundancy Conditions[9]
We want to check that $\Lambda$ contains 'only the shared information between $X$ and $Y$'. In other words, we don't want $\Lambda$ to contain a bunch of other information about $X$ if that information isn't also shared with $Y$. Again, to operationalise this, we can ask: 'if I tell you the value of $Y$, is there any mutual information between $X$ and $\Lambda$ remaining?'. If, after telling you the value of $Y$, we still have mutual information between $X$ and $\Lambda$, then $\Lambda$ must contain 'extra' information about $X$ which is not helpful for predicting $Y$. To do this, let's plot $P(X, \Lambda | Y = y)$ for a few values of $y$.
Fig. 7
Notice that in all cases, $X$ and $\Lambda$ are independent, conditional on $Y$. We can see that in each case, once you condition on a particular $Y$ value, knowing $\Lambda$ does not give you any extra information about $X$ and knowing $X$ doesn't give you any extra information about $\Lambda$. This is true whether we pick one of the 'upper' $Y$-values or one of the lower ones, meaning that $I(X;\Lambda|Y = y) = 0$ for all $y$. The conditional mutual information is given by the expected value:
$$I(X;\Lambda|Y) = \sum_{y} P(Y = y)\, I(X;\Lambda|Y = y) = 0$$
We can do something similar to find out if $\Lambda$ contains any 'excess information' about $Y$. Going through the same process, we would find that the mutual information between $Y$ and $\Lambda$, conditional on $X$, also equals zero.
If we find that $I(X;\Lambda|Y) = 0$ we can conclude that $\Lambda$ does not contain any information present in $X$ that is not present in $Y$ (and vice versa for $I(Y;\Lambda|X) = 0$). Taken together, these two conditions are known as the 'exact redundancy conditions':
$$I(X;\Lambda|Y) = 0, \qquad I(Y;\Lambda|X) = 0$$
Like the Mediation condition, we can check whether these conditions apply to any distribution $P(X, Y, \Lambda)$, not just our toy example. If $I(X;\Lambda|Y) = 0$ for any joint distribution $P(X, Y, \Lambda)$, we can interpret this as meaning $\Lambda$ does not contain any information present in $X$ that is not present in $Y$.
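Here is a minimal Python sketch of the two redundancy checks, under the same assumption that the toy distribution is two uniform 5x5 chunks. For contrast I also include a hypothetical latent which simply copies $X$ (this latent is my own example, not one from the figures): it knows things about $X$ that $Y$ does not, so it should fail one of the two conditions.

```python
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)

def joint_with_latent(latent):
    """Attach lam = latent(x, y) to the assumed two-chunk distribution."""
    return {(x, y, latent(x, y)): 1 / 50
            for x in range(1, 11) for y in range(1, 11)
            if (x <= 5) == (y <= 5)}

def redundancy_errors(p):
    """Return (I(X;Lambda|Y), I(Y;Lambda|X))."""
    return cond_mutual_info(p, X, LAM, Y), cond_mutual_info(p, Y, LAM, X)

chunk_errors = redundancy_errors(joint_with_latent(lambda x, y: 1 if x <= 5 else 2))
copy_errors = redundancy_errors(joint_with_latent(lambda x, y: x))  # Lambda = X

print("chunk latent:", chunk_errors)
print("copy-of-X latent:", copy_errors)
```

The chunk latent should give two values that are zero up to rounding. The copy-of-$X$ latent should give $I(X;\Lambda|Y) = \log_2 5$ bits (everything about $X$ that $Y$ doesn't already tell you) while $I(Y;\Lambda|X)$ stays at zero.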
More exact Natural Latent examples
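Now, we have expressed the three exact natural latent conditions and justified why they characterise a latent which contains 'all and only' information shared by $X$ and $Y$. Here they are again, all together:
$$I(X;Y|\Lambda) = 0 \quad \text{(mediation)}$$
$$I(X;\Lambda|Y) = 0 \quad \text{(redundancy)}$$
$$I(Y;\Lambda|X) = 0 \quad \text{(redundancy)}$$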
In words, these conditions correspond to the following:
To build some intuitions, let's explore some more distributions which satisfy all three of the exact natural latent conditions. (As an exercise, you may wish to verify for yourself that each of these examples does indeed satisfy the conditions.)
First, if $X$ and $Y$ share more information than can be captured by a binary random variable, we can add more outcomes to $\Lambda$. For example, the following distribution:
can be given the following natural latent:
Fig. 8
Here, each of the outcomes of $\Lambda$ might contain different amounts of information about $X$ and $Y$. In this distribution, one value of $\Lambda$ tells you exactly what values $X$ and $Y$ take. Another value of $\Lambda$ gives you some information about $X$ and $Y$, but not as much. Nonetheless, $\Lambda$ still captures 'all and only' the shared information between $X$ and $Y$ so it satisfies the exact natural latent conditions.
Our examples so far have had the joint distributions conveniently arranged into chunks, which makes it easy to see patterns in the data. But this is just a feature of how we have labelled the dataset, not an information-theoretic fact about the data. Not all distributions which satisfy the natural latent conditions will look like this. For example, we could change the $X$ and $Y$ axis labelling of Figure 8 above to obtain the following image:
Fig. 9
This distribution (with $\Lambda$ still represented by the four different colours) still satisfies the exact natural latent conditions, since it is information-theoretically identical to the distribution shown in Figure 8.
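The relabelling point is easy to check numerically. The sketch below does not use the Figure 8 distribution (I don't reproduce its exact values here); instead it assumes the simpler two-chunk distribution, shuffles the $X$ and $Y$ labels with a fixed seed, and compares the three conditional mutual informations before and after. The helper names are illustrative.

```python
import random
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)

def latent_errors(p):
    """Return (mediation, redundancy w.r.t. X, redundancy w.r.t. Y)."""
    return (cond_mutual_info(p, X, Y, LAM),
            cond_mutual_info(p, X, LAM, Y),
            cond_mutual_info(p, Y, LAM, X))

# Assumed two-chunk distribution with outcomes (x, y, lam).
p = {(x, y, 1 if x <= 5 else 2): 1 / 50
     for x in range(1, 11) for y in range(1, 11)
     if (x <= 5) == (y <= 5)}

rng = random.Random(0)  # fixed seed so the relabelling is reproducible
new_x = list(range(1, 11))
new_y = list(range(1, 11))
rng.shuffle(new_x)
rng.shuffle(new_y)

# Rename every x and y value; the latent value attached to each cell is unchanged.
shuffled = {(new_x[x - 1], new_y[y - 1], lam): prob for (x, y, lam), prob in p.items()}

before, after = latent_errors(p), latent_errors(shuffled)
print("before relabelling:", before)
print("after relabelling: ", after)
```

The picture of the shuffled distribution no longer has tidy chunks, but all three quantities should agree before and after (here, all zero up to rounding).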
In our examples so far, the 'chunk' of probability mass associated with each latent has been uniform, but this does not have to be the case. We just need that $X$ and $Y$ are independent given $\Lambda$ (as well as the redundancy conditions). As a result, we can consider natural latents for distributions which look like this:
Notice that we can apply the same latent variable that we did in the case of our initial distribution[10] and this still satisfies the exact natural latent conditions for this distribution. While the distributions are no longer uniform, we still have that $X$ and $Y$ are independent conditional on $\Lambda$. Similarly, the redundancy conditions are also exactly satisfied.
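As a sanity check on the non-uniform case, here is a minimal Python sketch. The weights and chunk probabilities are hypothetical numbers I made up, not the ones in the figure. Within each chunk the probability is a product of an $X$-part and a $Y$-part, which is exactly what 'independent given $\Lambda$' means.

```python
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)

# Hypothetical, non-uniform weights for each chunk (normalised below).
weights_x = {1: [1, 2, 3, 4, 5], 2: [5, 1, 1, 1, 2]}
weights_y = {1: [3, 3, 1, 1, 2], 2: [1, 4, 2, 2, 1]}
p_lam = {1: 0.4, 2: 0.6}

p = {}
for lam in (1, 2):
    offset = 0 if lam == 1 else 5  # chunk 1 uses indices 1..5, chunk 2 uses 6..10
    wx, wy = weights_x[lam], weights_y[lam]
    for i in range(5):
        for j in range(5):
            # P(x, y, lam) = P(lam) * P(x | lam) * P(y | lam)
            p[(offset + i + 1, offset + j + 1, lam)] = (
                p_lam[lam] * wx[i] / sum(wx) * wy[j] / sum(wy))

mediation = cond_mutual_info(p, X, Y, LAM)
redundancy_x = cond_mutual_info(p, X, LAM, Y)
redundancy_y = cond_mutual_info(p, Y, LAM, X)
print(mediation, redundancy_x, redundancy_y)
```

All three values should be zero up to rounding, even though no chunk is uniform and the two chunks have different total probability.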
Approximate Natural Latent Conditions
Instead of requiring that our latent satisfies the exact natural latent conditions, we can also talk about latents which 'approximately' satisfy the three conditions. Rather than saying that the conditional mutual informations listed above are exactly zero, we can just enforce that they are 'small' in the sense that they are less than some $\epsilon$. In the next few sections, we'll explore what it would mean for mediation and redundancy errors to be non-zero.
Approximate Mediation
The mediation condition required that, once we conditioned on $\Lambda$, there was no further mutual information between $X$ and $Y$. We described this by saying that $\Lambda$ contained all of the shared information between $X$ and $Y$. To make this condition approximate, we can instead require that, conditioned on $\Lambda$, $X$ and $Y$ share a nonzero but 'small' amount of mutual information:
$$I(X;Y|\Lambda) < \epsilon$$
The smaller $\epsilon$, the better the latent is at mediating. In an approximate natural latent, with small but nonzero $\epsilon$, we have that $\Lambda$ captures some but not all of the shared information between $X$ and $Y$. What would this look like?
Consider the following distribution:
Fig. 10
Along with this distribution let's use our latent from before, which labels the top right corner with $\Lambda = 2$ and the bottom left corner with $\Lambda = 1$:
Fig. 11
In this case, conditioning on $\Lambda = 1$ or $\Lambda = 2$ respectively gives us the following distributions:
Fig. 12
If you initially knew that the distribution looked like Figure 11 and then received the extra information that $\Lambda$ took the first of these values, your updated subjective distribution over $(X, Y)$ should look like the plot on the left. Clearly, being told the value of $\Lambda$ has given you some information about $X$ and $Y$, but not all of the shared information. If you are just told $\Lambda$, you would know that $Y$ lies in a particular set of values, but if you were then told the value of $X$, you would learn something else about $Y$ that wasn't captured by $\Lambda$ alone (namely, that some of those $Y$-values are ruled out). This means that $\Lambda$ does not capture all of the shared information between $X$ and $Y$.
In this case, we don't have $I(X;Y|\Lambda) = 0$, we have $I(X;Y|\Lambda) \approx 0.9$ bits [11]. If 0.9 bits is small for our purposes, we could say that $\Lambda$ approximately satisfies the mediation condition with error $\epsilon = 0.9$ bits.
Now, we'll talk about what it would mean for a latent to approximately satisfy a redundancy condition.
Approximate Redundancy
The (approximate) mediation condition requires that $\Lambda$ contains (approximately) all of the shared information between $X$ and $Y$. The redundancy conditions require that $\Lambda$ contains no 'extra' information that is present in $X$ but not in $Y$ (or vice versa). By making the redundancy conditions approximate, we can enforce a weaker version of this: that $\Lambda$ contains only a small amount of information that is present in one variable but not the other. The approximate redundancy conditions can be expressed as follows:
$$I(X;\Lambda|Y) < \epsilon, \qquad I(Y;\Lambda|X) < \epsilon$$
What does it mean for a latent to contain some information about one variable that is not present in the other? Consider this distribution:
We might want to use the approach of identifying visual 'chunks' in the distribution and finding a latent that corresponds to labelling each of these chunks:
Fig. 13
But this latent does not carry 'only' the shared information between $X$ and $Y$. If we plot $P(X, \Lambda | Y = y)$ we can see clearly that conditioning on $Y$ doesn't remove all shared information between $X$ and $\Lambda$. This means that $\Lambda$ contains some information which is present in $X$ which is not present in $Y$.
If we do the calculation for this distribution, we get a nonzero value for $I(X;\Lambda|Y)$ [12]. This means that the exact redundancy condition $I(X;\Lambda|Y) = 0$ would not be satisfied. But, for any $\epsilon$ larger than this value, the approximate redundancy condition would still be satisfied.
Similarly, we can consider a latent which contains some information about $Y$ which is not present in $X$.
Fig. 15
This distribution will fail to satisfy the exact redundancy condition $I(Y;\Lambda|X) = 0$, but, again, for a large enough $\epsilon$ it will still satisfy the approximate redundancy condition.
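To make 'approximately satisfied' concrete, here is a minimal Python sketch using a hypothetical distribution of my own (it is not one of the figures above). I take the two-chunk distribution, leak a little probability into the off-chunk cells, and define the latent from $X$ alone. The leak size and the tolerance `epsilon` are arbitrary choices for illustration.

```python
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)

leak = 0.001                     # hypothetical probability of each off-chunk cell
in_chunk = (1 - 50 * leak) / 50  # remaining mass, spread over the 50 in-chunk cells

# The latent depends on x only, so it can 'know' things about X that Y does not.
p = {(x, y, 1 if x <= 5 else 2): (in_chunk if (x <= 5) == (y <= 5) else leak)
     for x in range(1, 11) for y in range(1, 11)}

errors = {
    "mediation": cond_mutual_info(p, X, Y, LAM),       # I(X;Y|Lambda)
    "redundancy_x": cond_mutual_info(p, X, LAM, Y),    # I(X;Lambda|Y)
    "redundancy_y": cond_mutual_info(p, Y, LAM, X),    # I(Y;Lambda|X)
}

epsilon = 0.3  # arbitrary tolerance, in bits
is_approx_natural = all(err < epsilon for err in errors.values())
print(errors, is_approx_natural)
```

In this construction the mediation error and $I(Y;\Lambda|X)$ should be zero up to rounding, while $I(X;\Lambda|Y)$ should come out at roughly 0.29 bits. So the latent fails the exact conditions but passes the approximate ones for this choice of `epsilon`.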
We can also have latents which are approximate with respect to both redundancy conditions:
Fig. 16
(Incidentally, this distribution also fails to satisfy the exact mediation condition. Can you see why?)
Introducing Randomness to Latents
So far, we have looked at latents which are deterministic functions of $X$ and $Y$, ie. they have been latents which can be expressed as
$$\Lambda = f(X, Y)$$
But to be more general, we might want to consider latents which are defined by a general conditional probability distribution $P(\Lambda | X, Y)$. This would mean that we allow $\Lambda$ to be randomized for some (or all) $(X, Y)$ pairs.
Why might we want to do this? Recall in the previous section, Figure 13 showed that a latent can fail to satisfy the exact redundancy conditions due to containing too much information about $X$ that is not present in $Y$.
Fig. 17
One way to remove this 'extra' information from is to get to randomize for certain pairs. The outcomes which caused this latent to fail to satisfy the exact redundancy conditions were those where and . When and , the latent in the above diagram insists on labelling these outcomes either with either or in a way that depends only on , leading to containing extra information about that is not present in . We can remove this extra information by requiring that, whenever or , the latent simply tosses a coin, picking half of the time and the other half of the time. We can depict this by colouring the squares a mixture of orange and blue:
Fig. 18
(I have removed the labels but blue still means and orange still means ).
Now we can view which now looks like this:
Fig. 19
Now, given $Y$, we have that $X$ and $\Lambda$ are independent. $\Lambda$ no longer contains information present in $X$ that is not also present in $Y$ so $\Lambda$ exactly satisfies the redundancy condition $I(X;\Lambda|Y) = 0$. So introducing randomness to the latent can remove the 'extra' information from $\Lambda$, allowing it to better satisfy the redundancy condition. (But note: this modification to $\Lambda$ now means that it fails the exact mediation condition! Try sketching $P(X, Y | \Lambda)$ to see why.)
Some Example Latents
To keep building intuitions let's examine a couple of general classes of latents.
Example: Constant latent
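Consider the trivial latent $\Lambda$ which always takes the same constant value, regardless of $X$ and $Y$. Recall the three conditions:
$$I(X;Y|\Lambda) = 0, \qquad I(X;\Lambda|Y) = 0, \qquad I(Y;\Lambda|X) = 0$$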
Which (if any) of these conditions will the constant latent satisfy? (If you want to test your understanding, try to work out the answer before reading on)
Mediation. If there is any shared information between $X$ and $Y$, the constant latent will not capture it. Conditioning on $\Lambda$ will not affect the joint distribution, so we have $P(X, Y | \Lambda) = P(X, Y)$. As a result, we have $I(X;Y|\Lambda) = I(X;Y)$. This means that the constant latent only satisfies the exact mediation condition if there is no shared information between $X$ and $Y$. In other words: a constant contains zero information, so the only way it can capture 'all of the shared information' between $X$ and $Y$ is if $X$ and $Y$ share zero information!
Redundancy. The constant latent will always exactly satisfy the two redundancy conditions. Since $\Lambda$ contains no information about $X$ or $Y$ it cannot contain any information that is present in $X$ but not present in $Y$ (indeed, it contains no information at all).
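If you want to confirm the constant-latent answers numerically, here is a minimal Python sketch. It again assumes the two-chunk toy distribution, and the constant value 0 is an arbitrary choice of mine.

```python
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)

# Assumed two-chunk distribution, with the latent fixed to the constant 0.
p = {(x, y, 0): 1 / 50
     for x in range(1, 11) for y in range(1, 11)
     if (x <= 5) == (y <= 5)}

shared = cond_mutual_info(p, X, Y)             # I(X;Y)
mediation = cond_mutual_info(p, X, Y, LAM)     # I(X;Y|Lambda)
redundancy_x = cond_mutual_info(p, X, LAM, Y)  # I(X;Lambda|Y)
redundancy_y = cond_mutual_info(p, Y, LAM, X)  # I(Y;Lambda|X)
print(shared, mediation, redundancy_x, redundancy_y)
```

The mediation error should equal $I(X;Y)$ (about 1 bit for this distribution), and both redundancy errors should be zero up to rounding.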
Example: Everything Latent
Let us define a new $\Lambda$ which we'll call the 'everything latent'. Let $\Lambda$ be a deterministic function of $(X, Y)$ with a unique value for every $(X, Y)$ pair. How does this latent perform?
Mediation. In this case, $H(X|\Lambda)$, $H(Y|\Lambda)$ and $H(X,Y|\Lambda)$ all equal zero so the conditional mutual information $I(X;Y|\Lambda)$ also equals zero. In this case, $\Lambda$ captures all shared information between $X$ and $Y$ because it in fact captures all information about $X$ and $Y$, shared or not.
Redundancy. Note that $H(X|\Lambda, Y) = H(Y|\Lambda, X) = 0$ for the everything latent. Therefore the mutual information quantities for the redundancy conditions will be determined by the conditional entropies of the original distribution:
$$I(X;\Lambda|Y) = H(X|Y), \qquad I(Y;\Lambda|X) = H(Y|X)$$
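Here is a minimal Python sketch of the everything latent on the assumed two-chunk distribution, where the latent's value is simply the pair $(x, y)$ itself. It compares the two redundancy errors against the conditional entropies $H(X|Y) = H(X,Y) - H(Y)$ and $H(Y|X) = H(X,Y) - H(X)$.

```python
from math import log2
from collections import defaultdict

def entropy_of(p, keep):
    """Entropy (bits) of the marginal of p over the coordinate indices in `keep`."""
    marginal = defaultdict(float)
    for outcome, prob in p.items():
        marginal[tuple(outcome[i] for i in keep)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy_of(p, a + c) + entropy_of(p, b + c)
            - entropy_of(p, a + b + c) - entropy_of(p, c))

X, Y, LAM = (0,), (1,), (2,)

# Assumed two-chunk distribution; the 'everything latent' is the pair (x, y).
p = {(x, y, (x, y)): 1 / 50
     for x in range(1, 11) for y in range(1, 11)
     if (x <= 5) == (y <= 5)}

mediation = cond_mutual_info(p, X, Y, LAM)
redundancy_x = cond_mutual_info(p, X, LAM, Y)
redundancy_y = cond_mutual_info(p, Y, LAM, X)

h_x_given_y = entropy_of(p, X + Y) - entropy_of(p, Y)  # H(X|Y)
h_y_given_x = entropy_of(p, X + Y) - entropy_of(p, X)  # H(Y|X)
print(mediation, redundancy_x, h_x_given_y, redundancy_y, h_y_given_x)
```

For this distribution, mediation should hold exactly, while each redundancy error should equal the matching conditional entropy, $\log_2 5$ bits. So the everything latent only satisfies the redundancy conditions when $X$ and $Y$ fully determine each other.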
Summary
In this post, we have introduced the concept of Natural Abstractions and discussed its relation to AI Safety research programs. We then introduced the Natural Latents framework which is one approach to formalising such problems mathematically. Then, we introduced the (exact and approximate) Natural Latent conditions and gave some examples to demonstrate what they correspond to intuitively.
There have been no ideas here which have not already been covered by Wentworth and Lorell in various places and there are many other ideas which did not make it into this post. But hopefully this post has served its purpose as a reasonably self-contained introduction to the core of the relevant ideas and you now feel empowered to tackle some of the many other writings about this subject. If you are feeling brave, you might want to try your hand at one of the open problems in this field (such as this one or this one). Let me know if you have any success!
Others have pointed out that the KL-divergence formulation of the Natural Latents Conditions can be expressed as conditional mutual information expressions. Notably in this post and this comment.
Here, we will discuss the Shannon information formulation, though an algorithmic information version of these ideas is described in this post.
I know that there are different varieties of sunflowers and tulips which break these rules. Ignore them for now.
This is a 'bonus' property that corresponds to our latent being a deterministic function of 'height' and 'colour'. If this property is satisfied, we have a 'deterministic natural latent'. We won't go into the distinction between deterministic and stochastic natural latents in this post. This is discussed more here.
which can be found in this paper and this post.
Actually, we can consider latent variables which capture 'all and only' shared information between more than two variables, but we'll stick with the 2 variable case in this post since it's easier to visualise.
More formally, the calculation is as follows. Initially $Y$ is uniform over 10 outcomes, so its entropy is $H(Y) = \log_2 10$. Then, upon learning any particular $X = x$, the conditional distribution is uniform over 5 $Y$-outcomes, which has entropy $H(Y|X = x) = \log_2 5$. This is true for all $x$ so $H(Y|X) = \log_2 5$. The mutual information is then $I(X;Y) = H(Y) - H(Y|X) = \log_2 10 - \log_2 5 = 1$ bit.
The Mediation Condition was sometimes called the 'Independence Condition' in earlier work.
The Redundancy Conditions were sometimes called 'Insensitivity Conditions' in other work.
ie.
Since
we have .