Modeling Concepts Probabilistically

Gretta Duleba

As I come up to speed on John Wentworth and David Lorell’s work on natural abstraction, I’m filling in some of the gaps in their writing. Previously I posted about a Test Suite for Concepts. Today I’m going to talk about why they use a probabilistic frame when reasoning about concept representation in the minds of agents and the kind of funny way they throw around the word “latent.”

Working With Concepts

So far, we have this very high-level idea of a “concept” in a “mind.” We want to start to think about this in a mathematical way, so we can reason about it more precisely.

What kind of framework should we use?

There’s considerable prior art in the area of concept representation from such disparate fields as cognitive science, psychology, neuroscience, good-old-fashioned AI (GOFAI), and machine learning. There’s quite a good Related Work section in the Natural Abstractions distillation article by Chan, Lang, and Jenner, so I won’t reprise all of that here.

These various frameworks are solving different problems. Some of them are descriptive and grounded in, e.g., the biological details of exactly how information is retained in a brain made out of neurons, or the numerical details of how information is represented in a neural network. Other frameworks are quite high level and philosophical. So we’re not really comparing apples to apples when we look at all of these things next to each other.

John’s main lens for thinking about concepts in minds is probabilistic and Bayesian, and it occupies a middle ground between the descriptive, low-level, hardware-specific frameworks and the abstract, philosophical, high-level frameworks. He’s interested in useful ways of modeling what’s going on inside the mind of an agent.

Do agents “really” do Bayesian reasoning natively?

Is John saying that all the various kinds of minds/agents literally implement probabilistic world models and update them in a purely Bayesian way? Is he saying that they’re all optimal utility maximizers?

No, definitely not, even though it sometimes looks like he’s saying that.^[1] There are plenty of counterexamples. Humans, for example, are, uh, non-optimal. (Citation needed.)

As I understand it, he’s saying some nearby but easier-to-defend things:

Even very simple, basic agents can be usefully modeled as probabilistic. You can take an intentional stance toward an E. coli, for example, attributing goals and a world model to it, while recognizing that any individual bacterium is probably not doing any deep causal reasoning and updating – but a probabilistic model is still useful because E. coli evolved under selection pressure and is optimized over a variety of conditions to which its ancestors were exposed.
More advanced agents probably “really” are forming probabilistic world models, but those models are embedded rather than being directly represented, and you’re also going to find various heuristics and approximations layered on top so the agent can function with limited resources.

Here by “embedded” we mean that when you look at the actual biochemical activity in a human brain, or at the circuits in a neural network, you’re probably not going to find something that’s obviously just a Pearl-style causal DAG structure with joint probability tables and update loops and so on. You’re going to find… something else. Something that is not obviously Bayesian at first glance. However, its behavior is often well-approximated by a Bayesian model, and if we could figure out how to reverse-compile what we see in there, John claims the mapping would be pretty good.^[2]
And yes, he is saying that Bayesian reasoning is a good model^[3], as opposed to any other fundamental framework, while accepting that whatever’s actually running may be different from but isomorphic to Bayesian reasoning.

Furthermore, when considering advanced agents – the more selection pressure the mind has been under, and the more resources it has, the more likely it is that a purer version of a probabilistic model is going to fit. The reason is that reality bites back; agents that perform well at prediction and steering are going to need to build a compact causal world model, with concepts as components in that model.

Another angle on this is that “you don’t get to choose the ontology.” We want to talk to aliens and AIs. We don’t get to dictate whatever half-baked ontology we want; we need to acknowledge that the environment itself contains and dictates the structure, and all of the agents are trying to reason about and predict the environment. The intelligent ones – that is, the effective agents – will (we believe) converge on similar ontologies and have interoperable semantics.

The language of statistics for concepts

So let’s start to pick up that statistical language and map it onto our conceptual work.

We’re going to think in terms of data: random variables, both observables and latents.

Our embedded agent with its wonky I/O channels will gather information about reality through its sensory apparatus: eyes, tentacles, photosensitive array, whatever. We’ll call these the observables.

We will think of the agent as building a world model. It will hypothesize that the world is made of moving parts that affect each other causally. Those parts, and their actions, will have conceptual types. A dog might chase a ball, and the agent can reason and make predictions about that.

As the agent continues to make observations, we will think of the agent as filtering and aggregating the data and using it to update its world model. We will think of that filtering, aggregation, sense-making, and updating as statistical analysis, and it will likely involve working with latent variables – higher-level concepts that were not directly observed by the agent.
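To make the observable/latent split concrete, here is a minimal sketch of an agent updating its belief about one latent from a stream of observations. Everything in it is an assumption made for illustration: the two latent values, the three observation labels, and all of the probabilities are invented, and a real agent’s model would be vastly larger and not laid out as explicit tables.

```python
# Toy sketch: one latent variable ("what kind of thing is this?") and a stream
# of observables. All names and probabilities are made up for illustration.

PRIOR = {"ball": 0.5, "dog": 0.5}

# P(observation | latent). Each row sums to 1.
LIKELIHOOD = {
    "ball": {"bounces": 0.70, "barks": 0.05, "rolls": 0.25},
    "dog": {"bounces": 0.10, "barks": 0.70, "rolls": 0.20},
}


def update(belief, observation):
    """One Bayesian update: posterior is proportional to prior times likelihood."""
    unnormalized = {
        kind: belief[kind] * LIKELIHOOD[kind][observation] for kind in belief
    }
    total = sum(unnormalized.values())
    return {kind: weight / total for kind, weight in unnormalized.items()}


def posterior(observations, prior=PRIOR):
    """Fold a sequence of observations into a belief over the latent."""
    belief = dict(prior)
    for observation in observations:
        belief = update(belief, observation)
    return belief


if __name__ == "__main__":
    # The agent never observes "ball" directly, only bounces and rolls.
    print(posterior(["bounces", "rolls"]))
```

The agent only ever sees “bounces,” “barks,” and “rolls”; “ball” and “dog” live entirely inside the model.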

Latents, as it turns out, are a pretty big deal, because pretty much all of the concepts in a functional world-model are latents rather than direct observations.

The next step is to take a closer look at latents and see how they arise and how they map to concepts represented in minds.

Concepts as Latents

Latents

A latent, here, means pretty much exactly what you would expect if you have done any work with random variables in statistics. Latent variables appear inside some statistical models.

A statistical model binds to reality if reality is well-compressed by the model. The fact that the model compresses the data implies that the data does in fact have structure – there is some predictability/negentropy about it. Perhaps some of that structure was something that you did not or could not directly measure, but instead represented with a latent variable.
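One way to see “the model compresses the data” is to count bits. The sketch below is a toy under stated assumptions: the data are rows of eight binary readings, each row secretly driven by a hidden coin, and the latent model is simply handed the true rates rather than learning them. The function names are hypothetical. It compares the code length (negative log-likelihood in bits) of a model with no latent against a model with one.

```python
import math
import random


def sample_rows(n_rows=200, n_bits=8, seed=0):
    """Each row has a hidden coin; its bits are 1 with probability 0.9 or 0.1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        p_one = 0.9 if rng.random() < 0.5 else 0.1
        rows.append([1 if rng.random() < p_one else 0 for _ in range(n_bits)])
    return rows


def bits_independent(rows):
    """Code length if every bit is modeled as independent with one shared rate."""
    flat = [bit for row in rows for bit in row]
    # Clamp the fitted rate so log2 never sees 0.
    p = min(max(sum(flat) / len(flat), 1e-9), 1 - 1e-9)
    return sum(-math.log2(p if bit else 1 - p) for bit in flat)


def bits_with_latent(rows, rates=(0.9, 0.1)):
    """Code length under a two-component mixture: hidden rate, then the bits."""
    total = 0.0
    for row in rows:
        ones = sum(row)
        zeros = len(row) - ones
        # Marginalize over the latent: each rate is equally likely a priori.
        p_row = sum(0.5 * r**ones * (1 - r) ** zeros for r in rates)
        total += -math.log2(p_row)
    return total


if __name__ == "__main__":
    rows = sample_rows()
    print("no latent:  ", round(bits_independent(rows)), "bits")
    print("with latent:", round(bits_with_latent(rows)), "bits")
```

The latent here was never measured; it earns its place in the model only because including it shortens the description of what was measured.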

I am trying to be as general as possible here, so I am not, at this stage, claiming that the latent has great explanatory power. It’s just a statistical property, a way of collapsing down or compressing the observed data.

This most general form of a latent variable is not very interesting for our purposes. At this point it’s not really easier for our minds to hold onto. Looking at this kind of latent feels like reading a gzipped file, or I guess like looking at whatever structure was abstracted out when you did the gzipping. It’s not that intuitive… yet.

Which brings us to…

Intuitive Latents

So. What kind of latent would be interesting for our purposes?

Well, suppose a latent does have explanatory power. It seems particularly easy to think or reason about.

(This is mostly a statement about you and your brain and what you’ve already got going on in there; “intuitive” here means “easy for a human to reach for.” This is explicitly a bid for you to use your own experience being an agent to understand the sort of latents we mean.)

What kinds of things do humans easily think about?

Well, start by looking at a two-year-old. What kinds of concepts are so intuitive that a toddler is able to locate them?

A toddler takes a vast stream of sensory data coming into her eyes and ears and manages to find objects and classes of objects in the world, and now has (or at least appears to have) such concepts as “ball” and “dog.”

These things might seem kind of funny as “latents” because this is not the way we usually talk about latents in statistics!

A very typical first example of a latent in statistics is “intelligence.” We can’t directly measure it, we have only proxies for it, and we don’t even completely agree with each other about what it is. But most of us think intelligence is something, and if we try to sort people by how “intelligent” they are, the orderings we come up with are highly correlated with each other. There is something going on there.

In some data sets, if we create a causal model containing a latent called “intelligence,” we find that latent to be a good mediator over a variety of observables, and we find that the latent correlates well with some of our proxy-measures.
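A small simulation can show what “a good mediator” means here. This is a sketch with invented numbers, not a claim about any real data set: a hidden score drives two noisy proxy tests, and because this is a simulation we get to peek at the hidden score, which we never can with real intelligence data. The proxies correlate with each other, but once the latent’s contribution is subtracted out, the leftover noise is essentially uncorrelated.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """A hidden score plus two proxy tests, each equal to score + own noise."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    test_a = [g + rng.gauss(0, 1) for g in latent]
    test_b = [g + rng.gauss(0, 1) for g in latent]
    return latent, test_a, test_b


def correlations(n=5000, seed=0):
    """Return (raw correlation, correlation after removing the latent)."""
    latent, test_a, test_b = simulate(n, seed)
    raw = pearson(test_a, test_b)
    # Subtracting the latent leaves only each test's private noise.
    resid_a = [a - g for a, g in zip(test_a, latent)]
    resid_b = [b - g for b, g in zip(test_b, latent)]
    return raw, pearson(resid_a, resid_b)


if __name__ == "__main__":
    raw, residual = correlations()
    print("proxy vs proxy:        ", round(raw, 3))
    print("after removing latent: ", round(residual, 3))
```

In this toy setup, everything the two proxies have in common flows through the latent, which is the sense in which it mediates between them.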

So… what is up with calling a “ball” an intuitive latent? That’s weird, we can see a ball, it’s right there! We can observe it. Doesn’t that make it an observable?

Well, for one thing, we need to distinguish between one particular ball and the concept of a ball in general.

The concept of a ball in general is not observable! It’s a collection of approximate ball-facts, like “balls are mostly pretty spherical” and “if you throw a ball, the whole thing will fly through the air as a unit” and “when it lands, it might squish a little, and then it will probably bounce or roll.” So this begins to seem much more abstract and latent-like already. You can’t see or hold the Platonic ideal of a ball, but you can think about it, and you can use it to reason and predict.

And even any particular instance of a ball is still not directly observable! What you have is a bunch of visual and maybe tactile sensory data that you have cobbled together in your brain and modeled as “this specific ball.” The idea you hold in your mind is wildly compressed down from the raw sensory data and also from the actual physical reality of the ball’s atoms.

In this way, the idea of “balls in general” and of “this ball in particular” are both abstractions. We’re not tracking all the atoms in the ball, their chemical properties that keep them bound together, and so on. We’ve plucked the ball-object out of the low-level physics and modeled it at a much higher level. We inferred the ball and its characteristics, based on our experience with other similar objects, or even with this specific object in the past, that were associated with similar sensory data.

But look – even though it was complicated, pretty much all humans do exactly the same thing. We pretty much all have a concept of “balls in general” and it sure seems like those concepts all pretty much match. When almost any two humans look at a specific ball they mentally pluck it out of the environment and map it to the “balls in general” concept and reason about it similarly.

So this latent is intuitive. It’s easy for a human brain to grab onto.

Intuitive latents are a proper subset of latents in general. There are plenty of ways to compress data that don’t really correspond to anything you or I would recognize as a useful, easy-to-grab-onto concept.

Intuitive latents are, at the very least, good for humans to talk to each other. Humans broadly share the same cognitive structure, environmental conditions, and training regimens, and as a result we find a lot of the same things intuitive.
We don’t yet know exactly how well this generalizes to other kinds of agents, or where it breaks down.

Natural Latents

John and David have introduced a specific mathematically-defined kind of latent called a natural latent.

The definition of a natural latent quickly gets technical. The two big important parts are that a natural latent is both a mediator and a redund over the parts of a system it’s summarizing or compressing. This means (extremely roughly) that a natural latent is “just right” at capturing the structure in the data; it’s neither too specific nor too general.
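For readers who want symbols, the two conditions can be sketched roughly as follows. The notation is an assumption of this sketch rather than a quote from John and David, and the approximate-equality signs hide error terms that their own write-ups make precise.

```latex
% Notation assumed here: X_1, ..., X_n are the parts of the system,
% \Lambda is the latent, and X_{\bar{i}} means all parts except X_i.
\begin{align*}
\text{Mediation:} \quad
  & P[X_1, \dots, X_n \mid \Lambda] \approx \prod_{i=1}^{n} P[X_i \mid \Lambda] \\
\text{Redundancy:} \quad
  & P[\Lambda \mid X_1, \dots, X_n] \approx P[\Lambda \mid X_{\bar{i}}]
    \quad \text{for every } i
\end{align*}
```

Mediation says the parts have nothing left to tell each other once the latent is known; redundancy says the latent can still be recovered after throwing away any one part.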

Their go-to example of a natural latent is the temperature of a volume of an ideal gas. We don’t know where all the particles are, but we know ~everything important about that volume of gas if we know how hot it is.

John and David explain the concepts and the math in much greater detail elsewhere.

The key thing to know for our purposes right now is that it’s kind of finicky to find natural latents because the requirements on them are pretty stringent, but if you can find one, it’s going to be a really good latent. For reasons described elsewhere, we expect natural latents to work for all minds (given that they are observing / operating over the same or similar environments), not just human minds.

We expect that natural latents are a subset of intuitive latents.

Factorization

Factorization is about how minds store multiple concepts at the same time and how the various concept-properties are separated out.

Half the factorization story: overlapping concepts

When I talked about “balls in general” earlier, I mentioned that if you throw a ball, it typically moves through the air as a unit.

But that’s not just a property of balls, that’s a property of lots of objects. So does it make sense to store that property on the “balls in general” concept? Probably not.

And what about oranges? Are they a kind of ball or a kind of fruit?

It’s easy and fun to make up taxonomies or ontologies or whatever – but remember, you don’t get to decide the ontology. We’re mainly interested in whatever reality dictates about (the compressibility of) structure in the environment, because that’s what we’re likely to find when we look into other agents’ minds.

Maybe reality has quite a lot to say about factorization – because certain factorizations are strongly favorable for compression – or maybe every mind is just freestyling. Maybe factorization is sort of a decorative design choice. We don’t know much about this.

Another half of the factorization story: latent representation

I said before that a good model with the right latents in it helps you build and store a compressed, but still predictive, model of the world.

Here’s the thing though: compressed data is harder to read than expanded data. Going back to our gzipped file example, even if the structure of the file is very intuitive, it’s a bit tricky to read off the contents of the file while it’s compressed, especially if you’re only interested in a little bit of the file.

That’s what it’s like when you look inside a neural net. It’s hard to pick out the structures you’re interested in, because they don’t look like nice neat XML data telling you which concepts the net was thinking about when you peeked. It looks like… strings of bits. And sure, if it’s an image net or something, you can run an image of a ball through it and look for “ball” activations, but it gets hairy fast if there are a lot of different things in the picture, and/or if the things are best represented by sets of overlapping concepts.

It is possible that there is a third half^[4] to the factorization story, or worse. We’re not far enough along to be sure.

Factorization greatly complicates empirical work on latents.

Latents are particular to an environment

All of the types of latents are particular to an environment! To see this, go back to the overall statistical structure.

The agent exists in an environment, and it has observations of that environment. We model the agent as creating a model of the environment. We say the agent’s model is good if it compresses the observed data well. That model may contain latent variables.

If the environment changes, then the agent will need a new model with new latents in order to compress the new observed data well.
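The same point in bits, as a deliberately tiny sketch: the “model” below is a single fitted rate with no latent at all, and both environments are invented lists of binary observations. It stands in for the general pattern that a model tuned to one environment can compress another one badly until it is refit.

```python
import math


def fit_rate(observations):
    """Fit a one-parameter model: the fraction of 1s in the observations."""
    rate = sum(observations) / len(observations)
    # Clamp so log2 never sees 0.
    return min(max(rate, 1e-9), 1 - 1e-9)


def bits_per_observation(rate, observations):
    """Average code length when the data are encoded with the fitted model."""
    total = sum(-math.log2(rate if obs else 1 - rate) for obs in observations)
    return total / len(observations)


# Two made-up environments: mostly 1s versus mostly 0s.
OLD_ENVIRONMENT = [1] * 90 + [0] * 10
NEW_ENVIRONMENT = [1] * 10 + [0] * 90

old_model = fit_rate(OLD_ENVIRONMENT)
new_model = fit_rate(NEW_ENVIRONMENT)

if __name__ == "__main__":
    print("old model, old data:", bits_per_observation(old_model, OLD_ENVIRONMENT))
    print("old model, new data:", bits_per_observation(old_model, NEW_ENVIRONMENT))
    print("new model, new data:", bits_per_observation(new_model, NEW_ENVIRONMENT))
```

Encoding a raw binary observation costs one bit, so a model that spends more than one bit per observation is worse than having no model at all, which is what happens when the old model meets the new environment.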

I’m belaboring this point because I want to emphasize that, if aliens from the planet Zorbthraxx land on Earth tomorrow, they might not already have the same concepts in their minds that we do, even if our concepts are natural latents in the Earth environment. However, if they take a look around on Earth, using sensory organs that pick up some of the same data streams as ours, we expect them to converge on some of the same natural latents.

Similarly, let us assume for the moment that cutting edge LLMs are sufficiently intelligent to model an environment, generalize over data, and form causal models of that environment that fit the data well. We have to ask – what environment were they observing?

And the answer today is that LLMs were mostly observing a gigantic corpus of human-generated text data, RLHF data, and so on. They were mostly not observing the physical world directly. Their environment differs substantially from ours. This will be important later when we talk about empirical work on natural abstractions.

^[1] Earlier in my study of John’s work I was pretty frustrated that he’d just go ahead and model people as Bayesian without any justification. The more I look into it, though, the more I find this to be a reasonable choice (I’ll say more about why in a moment).
^[2] It’s not just a John thing. In the case of human brains, see also Friston’s free energy principle.
^[3] I spent considerable time looking into alternative ways of modeling agentic minds and was surprised to realize that probabilistic models really are the main game in town.
If you’re looking at very basic agents you can sometimes get away with simple, hard-wired control systems. There are also a few exotic mathematical alternatives that are beyond the scope of this article (and frankly, my current level of understanding). There’s Richard Ngo’s article advocating for fuzzy truth-values in epistemology (and therefore, probably in useful modeling of intelligent agents), which I will not further discuss here because IMO John’s comments on that article already adequately summarize how he handles Richard’s concerns while remaining in a Bayesian frame.
Everything else I looked into was not meaningfully different or better than probabilistic models, which surprised me. If you think you can name a serious contender for an alternative framework, please comment, I’d love to hear about it. Also, if there’s interest in further exploration of (sorta-)alternatives to probabilistic models, that could perhaps merit its own separate post, let me know.
^[4] Lazily-evaluated concepts come to mind, for example.
