Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Knowledge is not just precipitation of action

Financial status: This is independent research. I welcome financial support to make further posts like this possible.

Epistemic status: This is in-progress thinking.

This post is part of a sequence on the accumulation of knowledge. Our goal is to articulate what it means for knowledge to accumulate within a physical system.

The challenge is this: given a closed physical system, if I point to a region and tell you that knowledge is accumulating in this region, how would you test my claim? What are the physical characteristics of the accumulation of knowledge? What is it, exactly, about an artifact inscribed with instructions for building advanced technology that makes it so different from an ordinary rock, or from a video camera that has been travelling the cosmos recording data since the beginning of the universe? We are looking for a definition of knowledge at the level of physics.

The previous post looked at mutual information between high- and low-level configurations of a digital abstraction layer as a possible definition of knowledge and found that mutual information did not differentiate raw sensor data from useful models derived from that sensor data.

In this post we will consider a definition of knowledge as that which precipitates effective goal-directed action. That is, whenever we see some entity taking actions that are effective and goal-directed, we could conclude that knowledge exists. This is, after all, the informal goalpost that we have been comparing each previous definition of knowledge to. Rather than seeking a separate definition of knowledge and comparing it to this goalpost, this post will look at ways that we might make this informal definition formal.

Example: Satellite tracker

Consider a computer scanning the sky for a satellite in order to transmit some information to it. The computer will scan the sky looking for transmissions on certain radio frequencies, and will integrate each of these noisy observations over time into an estimate of the satellite’s position. The computer only has enough power to transmit the information once, so it’s important that it locks onto the satellite’s true position before it transmits.

Initially, the computer has no knowledge of the satellite’s position and is approximately equally likely to find it in any of its possible positions:

The X axis of the graph above is the true position of the satellite, and the Y axis is the performance that we expect the system to eventually achieve. In this example the computer has a pretty good a priori chance of locking onto the satellite’s position no matter where the satellite starts out.

But as the computer receives observations of the satellite and builds up a model of its position, the configuration of the computer changes in such a way that its performance will be poor if the satellite is not where the computer thinks it is:

At the moment just before the computer transmits its one-time message to the satellite, the configuration of the computer is such that its performance is extremely sensitive to the actual position of the satellite.

This is the counter-intuitive thing about knowledge: as the computer accumulates knowledge in anticipation of taking an action, the computer actually becomes increasingly sensitive to the configuration of its environment. The computer in this example is "placing its bets" in a more and more irreversible way, and this makes sense because the computer has to eventually "take its shot" by taking an irreversible action whose consequences depend on the system being in the configuration that it thinks it is. You can’t take your shot without committing. Perhaps knowledge accumulation is about a kind of gradual commitment to more and more specific possible world states, as measured in anticipated future performance.
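To make the shape of these anticipated-performance curves concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration and not part of the original example: the five discrete positions, the noise model (each observation reports the true position with probability `hit_prob` and a uniformly random position otherwise), the simplification that all past evidence favours a single believed position, and the names `anticipated_performance`, `evidence`, and `n_remaining`.

```python
import random

def anticipated_performance(n_positions, believed_pos, evidence, n_remaining,
                            hit_prob=0.6, trials=200, seed=0):
    """Estimate, for each hypothetical true position, the chance that the
    tracker's single transmission is aimed correctly.

    evidence:    weight of past observations already favouring believed_pos
    n_remaining: noisy observations still to come before transmitting
    """
    rng = random.Random(seed)
    curve = []
    for true_pos in range(n_positions):
        successes = 0
        for _ in range(trials):
            counts = [0] * n_positions
            counts[believed_pos] = evidence  # what the tracker has committed to so far
            for _ in range(n_remaining):
                # Assumed noise model: report the true position with
                # probability hit_prob, otherwise a uniformly random position.
                if rng.random() < hit_prob:
                    obs = true_pos
                else:
                    obs = rng.randrange(n_positions)
                counts[obs] += 1
            # The tracker aims at the position with the most supporting evidence.
            aim = max(range(n_positions), key=counts.__getitem__)
            successes += (aim == true_pos)
        curve.append(successes / trials)
    return curve

# Three moments in time: no observations yet, halfway, and just before transmitting.
initial = anticipated_performance(5, believed_pos=2, evidence=0, n_remaining=20)
midway = anticipated_performance(5, believed_pos=2, evidence=10, n_remaining=10)
committed = anticipated_performance(5, believed_pos=2, evidence=20, n_remaining=0)

for name, curve in [("initial", initial), ("midway", midway), ("committed", committed)]:
    print(name, [round(p, 2) for p in curve])
```

Under these assumptions the initial curve is roughly flat and high, while the committed curve is a single spike at the believed position: the tracker can only succeed in the one world it has bet on.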

If we have made good choices in setting up our mechanism of knowledge accumulation, then the world configurations that we can expect the entity to perform well in will precisely match the configuration that the world is actually in. That is: the entity’s knowledge matches the truth.

But even so, it seems that knowledge is necessarily associated with a decrease in robustness to possible world configurations. This is the opposite of the intuitive notion of knowledge as something that increases flexibility. The following example investigates this further.

Example: Learning how to use a search engine

Suppose I grow up in an isolated community where all technology is banned. Emerging into the modern world as a young adult, I find myself struggling to find food, housing, and health care. One day, a stranger teaches me how to use a search engine on a computer. This knowledge greatly increases my capacity to navigate the modern world.

In this example, my "anticipated performance" has increased over a wide variety of "ways the world could be". However, it has decreased in all the worlds where search engines exist but provide incorrect information. In terms of physical configurations of the world, there are actually far more ways for search engines to exist and provide incorrect information than there are ways for them to exist and provide correct information. It is not by pure chance, of course, that the world is in one of the astronomically few configurations where search engines exist and provide accurate information. But by acquiring knowledge about search engines, my capacity to get by in the world has become tightly coupled to worlds where search engines provide accurate information.

I am not making a point here about the dangers of depending on technology in general. It is not that we anticipate the search engine becoming unavailable in the future and therefore shouldn’t depend on it. It’s that even today, even while the search engine is available and providing accurate information, my "anticipated performance" as evaluated over all possible world configurations has gone from being relatively flat along the dimension corresponding to search engine accuracy, to highly peaked. This is not a bad thing when our knowledge is accurate and so the way the world actually is corresponds to the peak in the anticipated performance graph. Perhaps we can define knowledge as the extent to which the anticipated performance of a system is sensitive to the configuration of its environment. The next example will sharpen up this definition.

Example: Go-karting

Consider a go-kart being driven by a computer system that uses a camera to estimate the shape of the turn and plans its turn accordingly. It is approaching a turn:

If we were to press "freeze" on the world just as the go-kart was entering the right turn and swap out the right turn for a left turn, then the computer system would execute the turn poorly, since it takes a few observations for the computer to realize that the world has unexpectedly changed on it, and by that time it will already have partially committed to a right turn. In particular, it would execute the left turn more poorly than it would if it approached the left turn in an ordinary way, without it being swapped out at the last minute.

Due to this counterfactual inflexibility, we say that the go-kart had accumulated some knowledge about the turn.

Consider now a go-kart that is driven by a simpler computer system that drives in a straight line until it bumps into a wall, then makes a small correction and keeps going, and makes its way around the course in this manner. This go-kart does not build up a model of the world. Its behavior is a reaction to its immediate observations. Now suppose that this go-kart is entering a right turn, and we again freeze the world and swap out the right turn for a left turn. How will this go-kart perform in this left turn, compared to how it would have performed if it had approached the left in an ordinary way? This go-kart will do exactly the same thing in both cases. If it had approached the left turn from the start, it would have driven in a straight line until it bumped into a wall and then made a correction, and in the case that we switched out the right turn for a left turn at the last minute, it also would drive in a straight line until it bumps into a wall.

This is not an argument that the naive go-kart has any actual advantage over the sophisticated go-kart. We do not expect right turns to magically change into left turns in the real world. If we had a yet more sophisticated go-kart that was designed to deal with spuriously shape-shifting race courses then we could still apply this trick to it, so long as it was acting on any model of its environment that required time to accumulate.

So perhaps we can view the accumulation of knowledge in this way: insert an agent into an environment, then, at a certain time, "kidnap" the agent and insert it into some other environment and compare its performance there to how it would have performed if it had been in that other environment from the start. Do this for many environments similar to the original environment. The extent to which the kidnapped agent’s performance is systematically worse than its performance in the same environment having been located there from the start is the extent to which the agent can be said to have accumulated knowledge of its environment.
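The kidnapping procedure can be sketched as a small experiment. This is only an illustration under assumptions of my own: the world is reduced to a single "left" or "right" turn, observations are flipped with probability `flip_prob`, performance is the fraction of steps on which the steering matches the turn, and the names `ModelBasedKart`, `ReactiveKart`, `score_in`, and `kidnap_gap` are hypothetical.

```python
import random

class ModelBasedKart:
    """Integrates noisy observations into a belief about the turn direction."""
    def __init__(self):
        self.tally = 0  # > 0 means "I believe this is a right turn"

    def step(self, observed_turn):
        self.tally += 1 if observed_turn == "right" else -1
        return "right" if self.tally > 0 else "left"

class ReactiveKart:
    """Steers by the latest observation only; keeps no model."""
    def step(self, observed_turn):
        return observed_turn

def observe(turn, rng, flip_prob):
    """Return the turn direction, flipped with probability flip_prob."""
    if rng.random() < flip_prob:
        return "left" if turn == "right" else "right"
    return turn

def score_in(make_agent, first_env, second_env, seed,
             t_switch=10, horizon=10, flip_prob=0.2):
    """Run an agent in first_env for t_switch steps, then in second_env for
    horizon steps; return the fraction of correct actions in second_env."""
    agent = make_agent()
    before, after = random.Random(seed), random.Random(seed + 1)
    for _ in range(t_switch):
        agent.step(observe(first_env, before, flip_prob))
    hits = 0
    for _ in range(horizon):
        hits += agent.step(observe(second_env, after, flip_prob)) == second_env
    return hits / horizon

def kidnap_gap(make_agent, home_env, other_env, trials=200):
    """Average of (performance as a native of other_env) minus
    (performance when kidnapped from home_env into other_env)."""
    total = 0.0
    for trial in range(trials):
        seed = 2 * trial  # both arms see the same noise after the switch
        native = score_in(make_agent, other_env, other_env, seed)
        kidnapped = score_in(make_agent, home_env, other_env, seed)
        total += native - kidnapped
    return total / trials

model_gap = kidnap_gap(ModelBasedKart, "right", "left")
reactive_gap = kidnap_gap(ReactiveKart, "right", "left")
print("model-based gap:", model_gap)
print("reactive gap:", reactive_gap)
```

In this toy setup the model-based kart shows a large gap, because its accumulated tally keeps it steering right after the swap, while the reactive kart shows none. A fuller version would average the gap over many environments similar to the original one, not just a single swapped turn.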

Example: Go-kart with data recorder

Consider now a go-kart that once again drives straight until it bumps into a wall, but modified to collect data from a variety of sensors and record all sensor data to an internal hard drive. This is the case that foiled our previous attempt to define the accumulation of knowledge, since this data-recording go-kart has very high mutual information with its environment (higher, even, than the "intelligent" go-kart that models and anticipates turns as they approach). But our definition of the accumulation of knowledge as a sensitivity of anticipated performance to the configuration of the environment handles this case: we can "kidnap" this data-recording go-kart out of the right turn and into a left turn and see that its behavior is the same as if it had approached the left turn from the start. Hence under this definition we conclude that the data-recording go-kart does not have as much knowledge of its environment as the intelligent go-kart that anticipates and plans for turns, which matches our intuitions.
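A small sketch of why the recorded data makes no difference to this test. The sensor format (a camera label plus a bump flag) and the names `DataRecordingKart` and `drive` are hypothetical; the only point being illustrated is that the log grows but is never consulted when choosing an action.

```python
class DataRecordingKart:
    """Drives straight until it bumps a wall, and logs every sensor reading."""
    def __init__(self):
        self.log = []

    def step(self, reading):
        self.log.append(reading)  # stored, but never consulted when acting
        return "correct course" if reading["bumped"] else "straight"

def drive(kart, readings):
    """Feed a sequence of readings to the kart and collect its actions."""
    return [kart.step(r) for r in readings]

# Hypothetical sensor readings: a camera label plus a bump flag.
right_approach = [{"camera": "right turn ahead", "bumped": False}] * 5
left_approach = [{"camera": "left turn ahead", "bumped": False}] * 5
left_turn = [{"camera": "left turn", "bumped": b}
             for b in (False, True, False, True)]

kidnapped, native = DataRecordingKart(), DataRecordingKart()
drive(kidnapped, right_approach)  # approaches a right turn, then is swapped
drive(native, left_approach)      # approaches the left turn in the ordinary way

kidnapped_actions = drive(kidnapped, left_turn)
native_actions = drive(native, left_turn)

print(kidnapped_actions == native_actions)  # behaviour in the left turn
print(kidnapped.log == native.log)          # contents of the hard drive
```

The two karts end up with different logs, so the hard drive does carry information about which turn was approached, yet their behaviour in the left turn is identical.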

Counterexample: Sailing ship

Unfortunately this new definition fails on the first example from the first post in this sequence: a sailing ship making cartographic maps of a coastline. We might imagine that an accurate map is constructed, but that the ship later sinks and the map is never shared with anyone, so there are no actions taken by anyone that are dependent on the map. Therefore we would not be able to discern the accumulation of knowledge happening aboard the ship by examining the ship’s counterfactual performance on some task, since there are no tasks being undertaken in a way that uses the knowledge that is accumulating.

Yet it does not seem quite right to say that there is no knowledge accumulating here. We could board the ship and see an accurate map being drawn. It would be strange to deny that this map constitutes knowledge simply because it wasn’t later used for some instrumental purpose.


Knowledge appears to bind an agent’s actions to some particular configurations of the world. The ideas in this post seem like helpful signposts on the way to a definition of the accumulation of knowledge, but it does not seem that we are all the way there. We ought to be able to recognize knowledge that is accumulated but never used, or if this is impossible then we should understand why that is.

6 comments

Here's a similarly-motivated model which I have found useful for the knowledge of economic agents.

Rather than imagining that agents choose their actions as a function of their information (which is the usual picture), imagine that agents can choose their action for every world-state. For instance, if I'm a medieval smith, I might want to treat my iron differently depending on its composition.

In economic models, it's normal to include lots of constraints on agents' choices - like a budget constraint, or a constraint that our medieval smith cannot produce more than n plows per unit of iron. With agents choosing their actions in every world, we can introduce information as just another constraint: if I don't have information distinguishing two worlds, then I am constrained to take the same action in those two worlds. If the medieval smith cannot distinguish iron with two different compositions, then the action taken in those two worlds must be the same.

One interesting feature of this model is that "knowledge goods" can be modeled quite naturally. In our smith example: if someone hands the smith a piece of paper which has different symbols written on it in worlds where the iron has different composition, and the smith can take different actions depending on what the paper says, then the smith can use that to take different actions in worlds where the iron has different composition.
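Here is a minimal sketch of this "information as a constraint" picture, under assumptions of my own: two equally likely iron compositions, two treatments, and made-up payoffs. The names `best_constrained_plan`, `composition_a`, `treatment_a`, and so on are hypothetical. The agent picks an action for every world, but must pick the same action in any two worlds that give it the same signal.

```python
def best_constrained_plan(worlds, actions, payoff, prior, signal):
    """Choose an action for every world, subject to the information
    constraint that worlds with the same signal get the same action."""
    # Group worlds into cells that the agent cannot tell apart.
    cells = {}
    for world in worlds:
        cells.setdefault(signal(world), []).append(world)

    plan, value = {}, 0.0
    for cell in cells.values():
        def expected(action):
            return sum(prior[w] * payoff[w][action] for w in cell)
        best = max(actions, key=expected)  # one action for the whole cell
        value += expected(best)
        for w in cell:
            plan[w] = best
    return plan, value

worlds = ["composition_a", "composition_b"]
actions = ["treatment_a", "treatment_b"]
prior = {"composition_a": 0.5, "composition_b": 0.5}
payoff = {  # made-up payoffs: each composition has one matching treatment
    "composition_a": {"treatment_a": 1.0, "treatment_b": 0.0},
    "composition_b": {"treatment_a": 0.0, "treatment_b": 1.0},
}

# Without the paper the smith receives the same signal in both worlds.
blind_plan, blind_value = best_constrained_plan(
    worlds, actions, payoff, prior, signal=lambda w: None)

# The "knowledge good": a paper whose symbols differ between the two worlds.
informed_plan, informed_value = best_constrained_plan(
    worlds, actions, payoff, prior, signal=lambda w: w)

print(blind_plan, blind_value)
print(informed_plan, informed_value)
```

In this sketch the paper does nothing except relax a constraint: the set of feasible plans grows, and the value of the knowledge good is the difference between the two optimal values.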

Do you want to chat sometime about this?

I think it's pretty clear why we think of the map-making sailboat as "having knowledge" even if it sinks, and it's because our own model of the world expects maps to be legible to agents in the environment, and so we lump them into "knowledge" even before actually seeing someone use any particular map. You could try to predict this legibility part of how we think of knowledge from the atomic positions of the item itself, but you're going to get weird edge cases unless you actually make an intentional-stance-level model of the surrounding environment to see who might read the map.

EDIT: I mean, the interesting thing about this to me is then asking the question of what this means about how granular to be when thinking about knowledge (and similar things).

So, your proposed definition of knowledge is information that pays rent in the form of anticipated experiences?

Well most certainly yes, but what does that actually look like at the level of physics? How do I determine the extent to which my robot vacuum is forming beliefs that pay rent in the form of anticipated experiences? And most importantly, what if I don't trust it to answer questions truthfully and so don't want to rely on its standard input/output channels?

To me it seems useful to distinguish two different senses of 'containing knowledge', and it seems that some of your examples implicitly assume different senses. Sense 1: how much knowledge a region contains, regardless of whether an agent in fact has access to it (this is the sense in which the sunken map does contain knowledge). Sense 2: how much knowledge a region contains and how easily a given agent can physically get information about the relevant state of the region in order to 'extract' the knowledge it contains (this is the sense in which the go-kart with a data recorder does not contain a lot of knowledge).

If we don't make this distinction, it seems like either both or neither of the sunken map and go kart with data recorder examples should be said to contain knowledge. You make an argument that the sunken map should count as containing knowledge, but it seems like we could apply the same reasoning to the go-kart with data recorder:

"We could board the ship and see an accurate map being drawn. It would be strange to deny that this map constitutes knowledge simply because it wasn’t later used for some instrumental purpose."


"We could retrieve the data recorder and see accurate sensor recordings being made. It would be strange to deny that this data recorder constitutes knowledge simply because it wasn't later used for some instrumental purpose."

Though there does seem to be a separate quantitative distinction between these two cases, which is something like "Once you know the configuration of the region in question (map or data recorder), how much computation do you have to do in order to be able to use it for improving your decisions about what turns to make." (Map has lower computation needed, data recorder has more as you need to compute the track shape from the sensor data). But this 'amount of computation' distinction is different to the distinction you make about 'is it used for an instrumental purpose'. 

Interesting sequence so far! 

Could we try like an 'agent relative' definition of knowledge accumulation?

e.g. Knowledge about X (e.g. the shape of the coastline) is accumulating in region R (e.g. the parchment) accessibly for an agent A (e.g. a human navigator) to the extent that agent A is able to condition its behaviour on X by observing R and not X directly. (This is borrowing from the Cartesian Frames definition of an 'observable' being something the agent can condition on).


If we want to break this down to lower level concepts than 'agents' and 'conditioning behaviour' and 'observing', we could say something roughly like (though this is much more unwieldy):

X is some feature of the system (e.g. shape of coastline).

R is some region of the system (e.g. the parchment).

A is some entity in the system which can 'behave' in different ways over time (e.g. the helmsman, who can turn the ship's wheel over time). 'Over time' here means that they don't just have the single option to 'turn right' or 'turn left' once; rather, they have the option to 'turn right for thirty minutes, then turn left for twenty minutes, then...' or some other trajectory.

Definition for 'conditioning on': We say A is 'conditioning on' R if: changing R causes a change in A's behaviour (i.e. if we perturb R (e.g. change the map) then A's behaviour changes (e.g. the steering changes)). So just a Pearlian notion of causality I think.

An intermediate concept: We say A is 'utilising the knowledge in R about X' if: 1. A is conditioning on R (e.g. the helmsman is conditioning their steering on the content of the parchment) and 2. There exists some basin of attraction B which goes to some target set T (e.g. B is some wide range of ways the world can be, and T is 'the ship ends up at this village by this time') and if A were not conditioning on R then B would be smaller (if the helmsman were not steering according to the map then they would only end up at the village on time in far fewer worlds), and 3. If A were to also condition on X, this would not expand B much (e.g. seeing the shape of the coastline once you can already read the map doesn't help you much), but 4. If A were not conditioning on R, then conditioning on X would expand B a lot more (e.g. if you couldn't steer by the map, then seeing the shape of the coastline would help you a lot). (You could also put all this in terms of utility functions instead of target sets I reckon, but the target set approach seemed easier for this sketch).
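The four conditions can be checked mechanically in a toy model. This is only a sketch under strong assumptions that are mine: X takes one of four values, the map R records X exactly, the target T is reached exactly when the steering matches X, and the behaviours below (`ignore_everything`, `steer_by_map`, and so on) are hypothetical stand-ins for A conditioning on R, on X, on both, or on neither.

```python
WORLDS = [0, 1, 2, 3]  # X: hypothetical coastline shapes

def accurate_map(coast):
    """R: in this toy model the parchment records the coastline exactly."""
    return coast

# Candidate behaviours for A, as functions of (coast seen directly, parchment).
def ignore_everything(coast, parchment):
    return 0

def steer_by_map(coast, parchment):
    return parchment

def steer_by_coast(coast, parchment):
    return coast

def steer_by_map_then_coast(coast, parchment):
    # Conditions on both: reads the map, falls back on the coast if it is blank.
    return parchment if parchment is not None else coast

def basin(policy):
    """B: the worlds from which this behaviour reaches the target T
    (assumed here to mean that the steering matches the coastline)."""
    return {x for x in WORLDS if policy(x, accurate_map(x)) == x}

def conditions_on_map(policy):
    """Perturb R while holding X fixed and see whether behaviour changes."""
    return any(policy(x, r) != policy(x, accurate_map(x))
               for x in WORLDS for r in WORLDS)

cond_1 = conditions_on_map(steer_by_map)
cond_2 = len(basin(steer_by_map)) > len(basin(ignore_everything))
gain_from_coast_given_map = (len(basin(steer_by_map_then_coast))
                             - len(basin(steer_by_map)))
gain_from_coast_without_map = (len(basin(steer_by_coast))
                               - len(basin(ignore_everything)))

print(cond_1, cond_2, gain_from_coast_given_map, gain_from_coast_without_map)
```

Here conditions 1 and 2 hold, seeing the coastline adds nothing once the map is being used (condition 3), and it adds three worlds to the basin when the map is not being used (condition 4).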

So we've defined what it means for A to 'utilise the knowledge in R about X', but what we really want is to say what it means for A to be able to utilise knowledge in R about X, because when A is able to utilise knowledge in R about X, we can say that R contains knowledge about X accessibly for A. (e.g. if the map is not on the ship, the helmsman will not be utilising its knowledge, but in some sense they 'could' and thus we would still say the map contains the knowledge)

But now I find that it's far past my bedtime and I'm too sleepy to work out this final step haha! Maybe it's something like: R contains knowledge about X accessibly to A 'if we can, without much change to R or A, cause A to utilise the knowledge in R about X' (e.g. just by moving the map onto the ship, and not changing anything else, we can cause the helmsman to utilise the knowledge in the map). Though a clear problem here is: what if A is not 'trying' to achieve a goal that requires the knowledge on the map? (e.g. if the helmsman were on the other side of the world trying to navigate somewhere else there, then they wouldn't utilise the knowledge in this map because it wouldn't be relevant). In this case it seems we can't cause A to utilise the knowledge in R about X 'without much change to R or A'-- we would need to change A (to change A's goal) to make it utilise the knowledge in R. Hmm.....

One thing I like about this approach is that when R does have information about X but it's not in a very 'action ready' or 'easily usable' form (e.g. if R is a disk of 10,000 hours of video taken by ships, which you could use to eventually work out the shape of the coastline) then I think this approach would say that R does contain knowledge about X (accessibly to A) to some degree but less so than if it just directly gave the shape of the coastline. What makes this approach say this? Because in the "10,000 hours of footage" case, the agent is less able to condition its behaviour on X by observing R (which is the 'definition' of knowledge under this approach)-- because A has to first do all the work of watching through the footage and extracting/calculating the relevant knowledge before it can use it, and so therefore in all that time when it is doing this processing it cannot yet condition its behaviour on X by observing R, so overall over time its behaviour is 'less conditioned' on X via R.

Anyway curious to hear your thoughts about this approach, I might get to finish filling it out another time!