Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Embedded Naive Bayes



I am indeed leaving out some assumptions, mainly because I am not yet convinced of which assumptions are "right". The simplest assumption - used by Aczel - is that $G$ and the $g_i$ are monotonic. But that's usually chosen more for mathematical convenience than for any principled reason, as far as I can tell. We certainly want some assumptions which rule out the trivial solution, but I'm not sure what they should be.

Here's the use case I have in mind. We have some neural network or biological cell or something performing computation. It's been optimized via gradient descent/evolution, and we have some outside-view arguments saying that optimal reasoning should approximate Bayesian inference. We also know that the "true" behavior of the environment is causal - so optimal reasoning for our system should approximate Bayesian reasoning on some causal model of the environment.

The problem, then, is to go check whether the system actually is approximating Bayesian reasoning over some causal model, and what that causal model is. In other words, we want to check whether the system has a particular causal model (e.g. a Naive Bayes model) of its input data embedded within it.

What do you imagine "embedded" to mean?

I usually imagine the problems of embedded agency (at least when I'm reading LW/AF), where the central issue is that the agent is a part of its environment (in contrast to the Cartesian model, where there is a clear, bright line dividing the agent and the environment). Afaict, "embedded Naive Bayes" is something that makes sense in a Cartesian model, which I wasn't expecting.

It's not that big a deal, but if you want to avoid that confusion, you might want to change the word "embedded". I kind of want to say "The Intentional Stance towards Naive Bayes", but that's not right either.

Ok, that's what I was figuring. My general position is that the problems of agents embedded in their environment reduce to problems of abstraction, i.e. world-models embedded in computations which do not themselves obviously resemble world-models. At some point I'll probably write that up in more detail, although the argument remains informal for now.

The immediately important point is that, while the OP makes sense in a Cartesian model, it *also* makes sense without a Cartesian model. We can just have some big computation, and pick a little chunk of it at random, and say "does this part here embed a Naive Bayes model?" In other words, it's the sort of thing you could use to *detect* agenty subsystems, without having a Cartesian boundary drawn in advance.

Suppose we have a bunch of earthquake sensors spread over an area. They are not perfectly reliable (in terms of either false positives or false negatives), but some are more reliable than others. How can we aggregate the sensor data to detect earthquakes?

A “naive” seismologist without any statistics background might try assigning different numerical scores to each sensor, roughly indicating how reliable their positive and negative results are, just based on the seismologist’s intuition. Sensor $i$ gets a score $s_i^+$ when it’s going off, and $s_i^-$ when it’s not. Then, the seismologist can add up the $s_i^+$ scores for all sensors going off at a given time, plus the $s_i^-$ scores for sensors not going off, to get an aggregate “earthquake score”. Assuming the seismologist has decent intuitions for the sensors, this will probably work just fine.
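A minimal sketch of the seismologist's procedure. The sensor names, the score values, and the function name `earthquake_score` are illustrative assumptions, not anything from the post.

```python
# Hypothetical intuitive scores per sensor: (score when going off, score when silent).
SCORES = {
    "sensor_a": (2.0, -1.5),  # fairly reliable sensor
    "sensor_b": (0.5, -0.2),  # noisy sensor, so both scores are small
    "sensor_c": (1.0, -3.0),  # rarely misses a real quake, so silence counts a lot
}


def earthquake_score(readings, scores=SCORES):
    """Add up s_i^+ for sensors going off and s_i^- for sensors that are not."""
    total = 0.0
    for name, is_on in readings.items():
        s_plus, s_minus = scores[name]
        total += s_plus if is_on else s_minus
    return total


# Sensors a and c are going off, b is silent: 2.0 + (-0.2) + 1.0
score = earthquake_score({"sensor_a": True, "sensor_b": False, "sensor_c": True})
```

Nothing here mentions probabilities; the scores are just numbers the seismologist picked.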

It turns out that this procedure is equivalent to a Naive Bayes model.

Naive Bayes is a causal model in which there is some parameter $\theta$ in the environment which we want to know about - i.e. whether or not there’s an earthquake happening. We can’t observe $\theta$ directly, but we can measure it indirectly via some data $\{x_i\}$ - i.e. outputs from the earthquake sensors. The measurements may not be perfectly accurate, but their failures are at least independent - one sensor isn’t any more or less likely to be wrong when another sensor is wrong.

We can represent this picture with a causal diagram:

[Diagram: a single node $\theta$ with an arrow to each measurement $x_i$.]

From the diagram, we can read off the model’s equation: $P[\theta, \{x_i\}] = P[\theta] \prod_i P[x_i|\theta]$. We’re interested mainly in the posterior probability $P[\theta|\{x_i\}] = \frac{1}{Z} P[\theta] \prod_i P[x_i|\theta]$ or, in log odds form,

$$L[\theta|\{x_i\}] = \ln\frac{P[\theta]}{P[\sim\theta]} + \sum_i \ln\frac{P[x_i|\theta]}{P[x_i|\sim\theta]}$$
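As a sanity check, the sketch below computes the posterior two ways for a three-sensor model: directly via Bayes' rule with an explicit $Z$, and via the additive log odds. All probabilities and function names are made-up assumptions for illustration.

```python
import math

# Hypothetical model parameters (assumptions for illustration only).
PRIOR = 0.01                            # P[theta]: prior probability of an earthquake
P_ON_GIVEN_QUAKE = [0.9, 0.6, 0.95]     # P[x_i = on | theta]
P_ON_GIVEN_NO_QUAKE = [0.1, 0.4, 0.3]   # P[x_i = on | ~theta]


def likelihood(x, p_on):
    """Product over i of P[x_i | hypothesis], using the independence assumption."""
    result = 1.0
    for x_i, p in zip(x, p_on):
        result *= p if x_i else 1.0 - p
    return result


def posterior_direct(x):
    """P[theta | x] by Bayes' rule, with the normalizer Z computed explicitly."""
    joint_quake = PRIOR * likelihood(x, P_ON_GIVEN_QUAKE)
    joint_no_quake = (1.0 - PRIOR) * likelihood(x, P_ON_GIVEN_NO_QUAKE)
    return joint_quake / (joint_quake + joint_no_quake)


def log_odds(x):
    """Prior log odds plus one additive evidence term per sensor."""
    total = math.log(PRIOR / (1.0 - PRIOR))
    for x_i, p, q in zip(x, P_ON_GIVEN_QUAKE, P_ON_GIVEN_NO_QUAKE):
        total += math.log(p / q) if x_i else math.log((1.0 - p) / (1.0 - q))
    return total


def sigmoid(l):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-l))


x = (True, False, True)
same = abs(sigmoid(log_odds(x)) - posterior_direct(x)) < 1e-9
```

The two routes agree for every sensor pattern, and $Z$ never appears in the log odds form.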

Stare at that equation, and it’s not hard to see how the seismologist’s procedure turns into a Naive Bayes model: the seismologist’s intuitive scores for each sensor correspond to the “evidence” from the sensor, $\ln\frac{P[x_i|\theta]}{P[x_i|\sim\theta]}$. The “earthquake score” then corresponds to the posterior log odds of an earthquake. The seismologist has unwittingly adopted a statistical model. Note that this is still true regardless of whether the scores used are well-calibrated or whether the assumptions of the model hold - the seismologist is implicitly using this model, and whether the model is *correct* is an entirely separate question.

## The Embedded Naive Bayes Equation

Let’s formalize this a bit.

We have some system which takes in data x, computes some stuff, and spits out some f(x). We want to know whether a Naive Bayes model is embedded in f(x). Conceptually, we imagine that f(x) parameterizes a probability distribution over some unobserved parameter θ - we’ll write P[θ;f(x)], where the “;” is read as “parameterized by”. For instance, we could imagine a normal distribution over θ, in which case f(x) might be the mean and variance (or any encoding thereof) computed from our input data. In our earthquake example, θ is a binary variable, so f(x) is just some encoding of the probability that θ=True.

Now let’s write the actual equation defining an embedded Naive Bayes model. We assert that P[θ;f(x)] is the same as P[θ|x] under the model, i.e.

$$P[\theta; f(x)] = P[\theta|x] = \frac{1}{Z} P[\theta] \prod_i P[x_i|\theta]$$

We can transform to log odds form to get rid of the Z:

$$L[\theta; f(x)] = \ln\frac{P[\theta]}{P[\sim\theta]} + \sum_i \ln\frac{P[x_i|\theta]}{P[x_i|\sim\theta]}$$

Let’s pause for a moment and go through that equation. We know the function $f(x)$, and we want the equation to hold for all values of $x$. $\theta$ is some hypothetical thing out in the environment - we don’t know what it corresponds to, we just hypothesize that the system is modelling *something* it can’t directly observe. As with $x$, we want the equation to hold for all values of $\theta$. The unknowns in the equation are the probability functions $P[\theta; f(x)]$, $P[\theta]$ and $P[x_i|\theta]$. To make it clear what’s going on, let’s remove the probability notation for a moment, and just use functions $G$ and $\{g_i\}$, with $\theta$ written as a subscript:

$$\forall \theta, x: \quad G_\theta(f(x)) = c_\theta + \sum_i g_{\theta i}(x_i)$$

This is a functional equation: for each value of $\theta$, we want to find functions $G$, $\{g_i\}$, and a constant $c$ such that the equation holds for all possible $x$ values. The solutions $G$ and $\{g_i\}$ can then be decoded to give our probability functions $P[\theta; f(x)]$ and $P[x_i|\theta]$, while $c$ can be decoded to give our prior $P[\theta]$. Each possible $\theta$-value corresponds to a different set of solutions $G_\theta$, $\{g_{\theta i}\}$, $c_\theta$.
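To make "decoded" concrete, here is a sketch for the simplest case only: binary $\theta$, binary sensors, and the log odds reading used earlier. It assumes a sensor's two evidence terms are $s^+ = \ln(p/q)$ and $s^- = \ln\frac{1-p}{1-q}$, where $p$ and $q$ are the probabilities of the sensor going off with and without an earthquake. The function names are hypothetical.

```python
import math


def decode_sensor(s_plus, s_minus):
    """Recover (P[on | theta], P[on | ~theta]) from a sensor's two scores.

    Assumes s_plus = ln(p / q) and s_minus = ln((1 - p) / (1 - q)).
    A valid pair (p, q) exists only when the two scores have opposite signs.
    """
    if s_plus * s_minus >= 0:
        raise ValueError("scores must have opposite signs to decode")
    # Solve the two equations for q first, then p = q * exp(s_plus).
    q = (1.0 - math.exp(s_minus)) / (math.exp(s_plus) - math.exp(s_minus))
    p = q * math.exp(s_plus)
    return p, q


def decode_prior(c):
    """Recover P[theta] from the constant term c = ln(P[theta] / P[~theta])."""
    return 1.0 / (1.0 + math.exp(-c))


p, q = decode_sensor(2.0, -1.5)
prior = decode_prior(0.0)
```

A sensor whose two scores share a sign has no consistent probabilistic reading under these assumptions. That fits the earlier point that using the model and the model being correct are separate questions.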

This particular functional equation is a variant of Pexider’s equation; you can read all about it in Aczel’s *Functional Equations and Their Applications*, chapter 3. For our purposes, the most important point is: depending on the function $f$, the equation may or may not have a solution. In other words, there is a meaningful sense in which some functions $f(x)$ *do* embed a Naive Bayes model, and others *do not*. Our seismologist’s procedure *does* embed a Naive Bayes model: let $G$ be the identity function, $c$ be zero, and $g_i(x_i) = s_i^{x_i}$, and we have a solution to the embedding equation with $f(x)$ given by our seismologist’s add-all-the-scores calculation (although this is not the *only* solution). On the other hand, a procedure computing $f(x) = x_1^{x_2^{x_3}}$ for real-valued inputs $x_1$, $x_2$, $x_3$ would *not* embed a Naive Bayes model: with this $f(x)$, the embedding equation would not have any solutions.
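One way to probe a given $f$ numerically, under an extra assumption the post does not make: that $G$ and the $g_i$ are differentiable with $G' \neq 0$. Differentiating the embedding equation with respect to $x_i$ gives $G'(f)\,\partial f/\partial x_i = g_i'(x_i)$, so the ratio $(\partial f/\partial x_1)/(\partial f/\partial x_2) = g_1'(x_1)/g_2'(x_2)$ cannot depend on $x_3$. The sketch below checks that necessary condition with finite differences. The helper names and test points are illustrative, and the check says nothing about non-differentiable or constant solutions.

```python
def partial(f, x, i, h=1e-6):
    """Central finite-difference estimate of df/dx_i at the point x."""
    up = list(x)
    up[i] += h
    down = list(x)
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)


def ratio_12(f, x):
    """(df/dx_1) / (df/dx_2); for an embedded model this cannot depend on x_3."""
    return partial(f, x, 0) / partial(f, x, 1)


def ratio_depends_on_x3(f, x1, x2, x3_values, tol=1e-4):
    """True if the ratio changes as x_3 varies with x_1 and x_2 held fixed."""
    ratios = [ratio_12(f, (x1, x2, x3)) for x3 in x3_values]
    return max(ratios) - min(ratios) > tol


def product(x1, x2, x3):
    # Separable example: ln f = ln x1 + ln x2 + ln x3, so G = ln works.
    return x1 * x2 * x3


def power_tower(x1, x2, x3):
    # The post's counterexample, read as x1 ** (x2 ** x3).
    return x1 ** (x2 ** x3)


product_fails = ratio_depends_on_x3(product, 2.0, 3.0, [1.5, 2.0, 2.5])
tower_fails = ratio_depends_on_x3(power_tower, 2.0, 3.0, [1.5, 2.0, 2.5])
```

Here `product_fails` is `False` and `tower_fails` is `True`. For the power tower the ratio works out analytically to $x_2 / (x_1 x_3 \ln x_1)$, which depends on $x_3$.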