# 30

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Epistemological Status: I attempt to distill the minimal map argument and provide a natural way to chain a sequence of queries together. I argue that this definition, together with reasonable symmetry constraints on the minimal map, provides an intuitive explanation for the appearance of neural networks.

## Minimal Maps

Say we sample an observation from a state-observation pair and want to decide based only on whether has a property . We can define the function to indicate whether or not an observation has the property we have in mind. Concretely, I've chosen to suppress the reference to the hidden state since it's not an observable. This could alternatively be viewed as a semi-decision that either terminates in finite time with a positive answer or runs forever. The catch is that each observation is partial and dependent on an unobserved hidden state. Here's one natural question: what information summarizes the outcome of the semi-decision?

Minimal Map Lemma: Up to isomorphism, any information in observing the outcome of for a given is equivalent to . Moreover, this map is minimal in the sense that any other representation is either equivalent in information content or lossy.

Proof: First, fix some other map and consider the sequence . By the data processing inequality the total mutual information satisfies . Second, by definition, the event is equivalent to any observation because the outcome of the semi-decision is unaffected. Because we also have we can just write, Finally, we have for the entropy, Thus, with respect to the semi-decision, the information contained in the probability of a decision is equivalent to that contained in the observation.

## Semi-Decision Process

At this point, we conclude that it's natural to represent the semi-decision on the observation as a probability. Now we can consider something more familiar. Consider a set of properties that satisfy the following for , If the previous properties are satisfied we'll say that the objects can be classified by . Alternatively, we can say is a classifier for the objects . It's worth noting that we can always add a 'non of the above' property in the case that our is only mutually exclusive which results in a proper classifier.

Fact: Every property can be associated to a classifier .

This essentially is the motivation for the choice of definition.

It’d be nice to have a natural way to compose classifiers together on observations. Why? A central advantage is that if we do this correctly then we would be able to build up complicated classifiers from simpler ones.

With this goal in mind, we’ll first define, The is an isomorphism that comes from the 'up to isomorphism' clause of the minimal map lemma. This is actually quite useful as we'll see in a bit. For the moment, our definition of return can be understood as a projection of the object onto a basic collection of minimal maps. After this, we need one more definition, This projects the resulting minimal maps onto a new collection of minimal maps. What we're trying to do is allow for queries about query outcomes. This allows us to chain together a sequence of queries. Specifically, there will be some number of sequential binds with queries at each bind for . The sequence of queries at each bind defines a semi-decision process over the observations.

This definition allows us to treat our queries as a monad. While the only immediate use is for function composition, which could've been defined without the formality, this approach makes it easier to think about side-effects and related constructs such as the comonad (categorical dual of the monad) based on the exact outcomes of the queries.

This representation might also be useful to anticipate a natural class of minimal maps to consider. This is because our bind maintains the conditional notation and almost appears as and we could view each query as a projection against the objects.

## Neural Representations

We'll say the observations and queries have (vector) representations that are rotation invariant if we have, The property is telling us that if we rotate our observations and our query then the resulting determination should remain the same. We'll say that the observation and queries are dual if we have, In other words, if the process is dual then mapping the query over the observations is equivalent to mapping the data over the query. With these two notions, we can state the main insight,

Proposition:

If our semi-decision process on the observations is rotation invariant and dual then we have, Moreover, is a ridge-activation and the outcome of -sequential binds is equivalent to a neural network with -layers.

Proof: The first equality follows from the fact the inner-products are the only rotationally invariant quantities that can be made out of the query and the observation. The second inequality follows from symmetry. Only the inner-product between the query and observation is symmetric under swapping.

For the next part we'll relabel . Say the -th sequential bind has queries. Then we have, In the last equation, we've replaced the index to the queries with a matrix for compactness. The last expression is the definition of a multi-layer neural network.

As an immediate corollary, we see that this covers both radial and ridge activation functions. Our activation is fixed as a ridge activations since duality holds, but if we were to relax the condition to just rotation invariance then we'd have the possibility of radial activations. Moreover, if is monotone in its argument then because is an isomorphism the activation will also be monotone. For example, if is a sigmoid and then is Softplus.

Even if our data doesn't satisfy our representation property, every (reasonable) classifier can be approximated as such by the universal approximation theorem. This means that if we only care about producing approximations of minimal maps we can view the assumptions we made as a Wittgenstein ladder. Overall, what we showed is that the minimal map theorem, the monadic representation of a semi-decision process, and basic symmetry assumptions are enough to derive the general form of a neural network.

# 30

New Comment

I'm going to have to spend some time unpacking the very compact notation in the post, but here are my initial reactions.

First, very clean proof of the lemma, well done there.

Second... if I'm understanding this correctly, each neuron activation (or set of neuron activations?) would contain all the information from some-part-of-data relevant to some-other-part-of-data and the output. So it's of roughly the right form for neuron activations to encode abstractions. That would lend direct support to the hypothesis that neurons/small sets of neurons in nets often end up encoding human-like concepts because both humans and NNs are learning natural abstractions.

Better yet, it looks like the OP gives a recipe for unpacking those natural abstractions? That would give a potentially-really-powerful tool for transparency, allowing us to directly test some of the hypotheses in alignment by default, and potentially addressing some of the bottlenecks in that whole approach.

The more I think about it, the more this post looks really exciting.

I'm going to have to spend some time unpacking the very compact notation in the post, but here are my initial reactions.

I should apologize a bit for that. To a degree I wasn't really thinking about any of the concepts in the title and only saw the connection later.

First, very clean proof of the lemma, well done there.

Thanks!

Second... if I'm understanding this correctly, each neuron activation (or set of neuron activations?) would contain all the information from some-part-of-data relevant to some-other-part-of-data and the output.

To be honest, I haven't thought about interpreting the monad beyond the equivalence with neural networks. One thing I noticed early on is that you can create sequences of activations that delete information in the limit. For example, the ReLU activation is the limit of the SoftMax (change log base). I think something like this could be seen as abstracting away unnecessary data.

Better yet, it looks like the OP gives a recipe for unpacking those natural abstractions?

I'm not sure. I do think the method can justify the reuse of components (queries) and I wouldn't be surprised if this is a pre-requisite for interpreting network outputs. Most of my interest comes from trying to formalize the (perhaps obvious) idea that anything that can be reduced to a sequence of classifications can be used to systematically translate high-level reasoning about these processes into a neural networks.

I guess it's best to give an example of how I currently think about abstraction. Say we take the position that every object is completely determined by the information contained in a set of queries such that . For a picture, consider designing a game-avatar (mii character) by fiddling around with some knobs. The formalism lets us package observations as queries using return. Thus, we're hypothesizing that we can take a large collection of queries and make them equivalent to a small set of queries. Said another way, we can answer a large collection of queries by answering a much smaller set of 'principle' queries. In fact, if our activation was linear we'd be doing PCA. How we decide to measure success determines what abstraction is learned. If we only use the to answer a few queries then we're basically doing classification. However, if the have to be able to answer every query about then we're doing auto-encoding.