Epistemological Status: I attempt to distill the minimal map argument and provide a natural way to chain a sequence of queries together. I argue that this definition, together with reasonable symmetry constraints on the minimal map, provides an intuitive explanation for the appearance of neural networks.

Minimal Maps

Say we sample a state-observation pair $(o, h) \in O \times H$ and want to decide, based only on $o$, whether $h$ has a property $U$. We can define the function $χ_{U} : H \times O \to \{0, 1\}$ to indicate whether or not the hidden state behind an observation has the property we have in mind. Concretely,

$χ_{U} (o) = χ_{U} (h_{o} | o) = \begin{cases} 1 & \text{if } h_{o} \text{ has property } U \\ 0 & \text{otherwise} \end{cases}$

I've chosen to suppress the reference to the hidden state since it's not an observable. This could alternatively be viewed as a semi-decision that either terminates in finite time with a positive answer or runs forever. The catch is that each observation is partial and dependent on an unobserved hidden state. Here's one natural question: what information summarizes the outcome of the semi-decision?
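To make the setup concrete, here is a minimal sketch in Python. The joint distribution, the state and observation labels, and the set `U` are all hypothetical toy values chosen for illustration. The sketch marginalizes over the unobserved hidden state to get the probability that the property holds given only the observation.

```python
from fractions import Fraction

# Hypothetical toy joint distribution P(o, h) over observations and hidden states.
joint = {
    ("o1", "h1"): Fraction(1, 4),
    ("o1", "h2"): Fraction(1, 4),
    ("o2", "h2"): Fraction(1, 8),
    ("o2", "h3"): Fraction(3, 8),
}

# Hypothetical property U, represented as the set of hidden states that have it.
U = {"h1", "h3"}


def chi(h):
    """Indicator: 1 if the hidden state h has property U, 0 otherwise."""
    return 1 if h in U else 0


def posterior(o):
    """P(chi_U = 1 | o), summing over the hidden states compatible with o."""
    total = sum(p for (obs, _), p in joint.items() if obs == o)
    hit = sum(p for (obs, h), p in joint.items() if obs == o and chi(h) == 1)
    return hit / total
```

Because the hidden state is never observed, `posterior` is the only handle on the indicator that depends on `o` alone.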

Minimal Map Lemma: Up to isomorphism, any information in observing the outcome of $χ_{U} (o)$ for a given $o \in O$ is equivalent to $P (χ_{U} (o) | o)$ . Moreover, this map is minimal in the sense that any other representation $g_{U} (o)$ is either equivalent in information content or lossy.

Proof: First, fix some other map $g_{U}$ and consider the sequence $o \to χ_{U} (o) \to g_{U} (o)$. By the data processing inequality the total mutual information satisfies $I (o; χ_{U} (o)) \geq I (o; g_{U} (o))$.

Second, by definition, the event $P (χ_{U} (o) | o) = p$ is equivalent to any observation $\hat{o} \in P^{- 1} (p)$ because the outcome of the semi-decision is unaffected. Because we also have $o \in P^{- 1} (p)$ we can just write,

$P (χ_{U} (o) | P (χ_{U} (o) | o)) = P (χ_{U} (o) | \hat{o}) = P (χ_{U} (o) | o)$

Finally, we have for the entropy,

$H (χ_{U} (o) | P (χ_{U} (o) | o)) = - \sum_{χ_{U} (o) \in \{0, 1\}} P (χ_{U} (o) | P (χ_{U} (o) | o)) \cdot \log P (χ_{U} (o) | P (χ_{U} (o) | o))$

$= - \sum_{χ_{U} (o) \in \{0, 1\}} P (χ_{U} (o) | o) \cdot \log P (χ_{U} (o) | o) = H (χ_{U} (o) | o)$

Thus, with respect to the semi-decision, the information contained in the probability of a decision is equivalent to that contained in the observation. $□$
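The entropy identity can be checked numerically on a small example. The following is a sketch under stated assumptions, not part of the argument itself: the marginal `p_obs`, the posterior table `p_chi`, and the helper names are hypothetical. Two observations deliberately share the same posterior, so the posterior is a strictly coarser summary than the raw observation.

```python
import math
from collections import defaultdict

# Hypothetical marginal P(o) and posterior P(chi_U = 1 | o).
# "o1" and "o2" share a posterior, so the map o -> P(chi | o) merges them.
p_obs = {"o1": 0.2, "o2": 0.3, "o3": 0.5}
p_chi = {"o1": 0.25, "o2": 0.25, "o3": 0.9}


def binary_entropy(p):
    """Entropy in bits of a {0, 1}-valued variable that equals 1 with probability p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def cond_entropy(summary):
    """H(chi | summary(o)) in bits, for any function `summary` of the observation."""
    mass = defaultdict(float)  # P(summary = k)
    hit = defaultdict(float)   # P(summary = k, chi = 1)
    for o, p in p_obs.items():
        key = summary(o)
        mass[key] += p
        hit[key] += p * p_chi[o]
    return sum(mass[k] * binary_entropy(hit[k] / mass[k]) for k in mass)


h_obs = cond_entropy(lambda o: o)            # condition on the full observation
h_post = cond_entropy(lambda o: p_chi[o])    # condition on the posterior only
h_coarse = cond_entropy(lambda o: "same")    # a map that discards everything
```

In this toy case `h_obs` and `h_post` agree up to floating-point error, while the constant map leaves strictly more uncertainty about the semi-decision, which is the sense in which other representations are either equivalent or lossy.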

Semi-Decision Process

At this point, we conclude that it's natural to represent the semi-decision on the observation as a probability. Now we can consider something more familiar. Consider a set of properties $U = \{U_{i}\}_{i \in I}$ that satisfy the following for $o \in O$,

$\exists i \in I : χ_{U_{i}} (o) = 1 \quad \text{(Decidable)}$

$χ_{U_{i}} (o) = 1 \to χ_{U_{j}} (o) = 0, \ \forall j \in I \setminus \{i\} \quad \text{(Mutually Exclusive)}$

If the previous properties are satisfied we'...
