Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Financial status: This is independent research, now supported by a grant. I welcome financial support.

Epistemic status: I’m 85% sure that this post faithfully relates the content of the paper that it reviews.


David Wolpert has written a paper entitled “Constraints on physical reality arising from a formalization of knowledge”. The paper is interesting to me because it provides a formalization of knowledge that presupposes no agent boundaries, and makes almost no assumptions about the representation of knowledge or the structure of the entities using it, yet is strong enough to derive some quite interesting impossibility results. I think his approach is worth understanding because it is, to my eyes, both novel and helpful, although I don’t expect the specific definitions he gives to provide any "off the shelf" resolution to questions in AI alignment.

Wolpert begins with the notion of a universe, and then supposes that if there is an entity anywhere within the universe that can infer facts about that universe, then there ought to be some universes where that entity entertains questions about some property of the universe in which it is embedded, and in those universes, such an entity ought to later be configured in a way that corresponds to the truth about the question that was entertained.

It is not required that we can look at a particular universe and identify which specific question is being entertained at which time by which entity, nor that we can identify boundaries between "entities" at all. All that is required is that if something resembling inference is happening somewhere within the universe, then there exists a function from universes to questions, a function from universes to answers, and a function from universes to the actual value of some property such that a particular relationship holds between these three functions.

Wolpert’s contention is that his formalization captures what it means for inference to be happening. He does not make this contention explicit, though it is certainly required to make his framework interesting, since the conclusions he wishes to draw concern what can or cannot be known by intelligent entities within a universe. If the underlying contention is false, and his mathematical structure does not capture what it means for inference to be happening, then all his theorems still go through, but the conclusions are just statements of mathematics with nothing to say about the nature of inference, knowledge, or intelligence.

I will now attempt to explain Wolpert’s framework through an example. I will begin with a specific instantiation of his framework, and then describe the general case.

Example: robot vacuum

Let us consider universes in which there is a single room of some particular shape and size, and a robot vacuum that starts in some particular position and orientation. Here are three possible floorplans and robot poses:

Let us assume that the robot vacuum is programmed with some deterministic movement algorithm designed to clean the floor, and that in each universe the robot moves about for ten minutes and then that is the end of that universe. When I refer to "a universe" I am referring not to a single time-slice, but to an entire ten-minute universe history, and that includes the evolving internal state of the robot’s computer. For the remainder of this section, the set $U$ will refer to all possible universes with this particular structure (floorplan plus robot vacuum evolving for exactly ten minutes).

Here are three trajectories:

Now the robot vacuum is equipped with a laser emitter and receiver that it uses to measure the distance to the nearest obstacle in front of it, and its movement algorithm uses these measurements to decide when to make turns. Intuitively, we would say that the robot is able to infer the distance to the nearest obstacle, but making such a notion precise has proven surprisingly difficult in past attempts. We are going to set things up according to Wolpert’s formalization of physical inference and compare the results to common sense.

Let’s suppose that within the robot vacuum’s code, there is a function that is used to query whether the distance to the nearest obstacle is within four possible ranges, say 0-1 inches, 1-3 inches, 3-10 inches, or more than 10 inches. Let’s suppose that this function takes one input indicating which of the four distance ranges is to be queried, and produces the output YES if the robot’s sensors indicate that the distance to the nearest obstacle is within that range, or NO otherwise. Let’s suppose that the function works by switching on the laser emitter, then reading off a measurement from the laser receiver, then doing some calculations to determine whether that measurement is consistent with the queried distance range. I will call this sequence of events a "distance query". Let’s suppose that such distance queries are performed reasonably frequently as part of the robot vacuum’s ordinary movement algorithm. This is all happening inside the ordinary evolution of a universe.
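To make this concrete, here is a rough sketch of what such a distance query might look like inside the robot’s code. Nothing here comes from the paper: the names, the ranges expressed as a Python enum, and the `read_laser` callable (standing in for switching on the emitter and reading a measurement off the receiver) are all illustrative assumptions.

```python
from enum import Enum

class DistanceRange(Enum):
    """The four queryable ranges, in inches (lower bound inclusive)."""
    R0_TO_1 = (0.0, 1.0)
    R1_TO_3 = (1.0, 3.0)
    R3_TO_10 = (3.0, 10.0)
    OVER_10 = (10.0, float("inf"))

def distance_query(queried_range: DistanceRange, read_laser) -> bool:
    """Return YES (True) iff the laser measurement falls within the queried range."""
    measurement = read_laser()          # distance to the nearest obstacle, in inches
    low, high = queried_range.value
    return low <= measurement < high

# Example: an obstacle 2.4 inches ahead answers YES to the 1-3 inch query.
assert distance_query(DistanceRange.R1_TO_3, lambda: 2.4)
```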

Now we are going to step outside of the universe and construct some mathematical objects with which to analyze such universes according to Wolpert’s definition of inference.

First, define a question function $X$ over the set $U$ of universes as follows. Given a universe $u$, find the last time before the end of the universe that a distance was queried in that universe, and let the output of the function be the query that was run. If no distance was ever queried in that universe then let the output be a special null value. So this is a function that maps universes to one of five possible values corresponding to the four possible distance queries plus a special value if no distance was ever queried.

Next, define an answer function $Y$ over the set $U$ of universes as follows. Given a universe $u$, find the last time before the end of the universe that a distance was queried in that universe, and let the output be 1 if the result of that query was YES, or -1 otherwise. If no distance was ever queried then let the output be -1. This leaves an ambiguity between an answer of NO and the case where no distance was ever queried, but this is unimportant.

Finally, define a reference function $\Gamma$ over the set $U$ of universes as follows. Given a universe $u$, find the last time before the end of the universe that a distance was queried in that universe, and let the output be 1 if the distance between the robot and the nearest obstacle at that time was between 0 and 1 inches, 2 if the distance between the robot and the nearest obstacle at that time was between 1 and 3 inches, 3 if the distance between the robot and the nearest obstacle at that time was between 3 and 10 inches, and 4 otherwise.
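Here is a sketch of these three functions as code. The representation of a universe below (a list of query events plus the true obstacle distance at each query time) is my own simplification for illustration, not anything from the paper; `G` stands for the reference function $\Gamma$.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Query:
    time: float           # seconds into the ten-minute history
    queried_range: int    # which range (1-4) the robot's code asked about
    result: bool          # the YES/NO that the query returned

@dataclass
class Universe:
    """A stand-in for one ten-minute universe history."""
    queries: List[Query]                 # every distance query that occurred
    true_distance: Dict[float, float]    # actual obstacle distance (inches) at each query time

NULL = 0  # special value for universes in which no query was ever run

def last_query(u: Universe) -> Optional[Query]:
    return max(u.queries, key=lambda q: q.time) if u.queries else None

def X(u: Universe) -> int:
    """Question function: the range asked about in the last query, or NULL."""
    q = last_query(u)
    return q.queried_range if q else NULL

def Y(u: Universe) -> int:
    """Answer function: +1 if the last query returned YES, -1 otherwise."""
    q = last_query(u)
    return 1 if (q and q.result) else -1

def G(u: Universe) -> int:
    """Reference function: the true range (1-4) at the time of the last query."""
    q = last_query(u)
    if q is None:
        return 4  # the text leaves this case unspecified; pick "otherwise" arbitrarily
    d = u.true_distance[q.time]
    if d <= 1:
        return 1
    if d <= 3:
        return 2
    if d <= 10:
        return 3
    return 4
```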

So in this example $X$ and $Y$ are functions over universes that examine the robot's internal state, and $\Gamma$ is a function over universes that examines the actual distance between the robot and the nearest obstacle. In general $X$, $Y$, and $\Gamma$ can be any functions whatsoever over universes, although $Y$ must have image $\{-1, 1\}$. There is no need for there to be an obvious agent like there is in this example. The important part is what it means for an inference device to infer a function.

An inference device is a pair of functions over universes, and Wolpert says that such a device weakly infers a particular function over universes if, for each possible value that function can take on (there are four in our example), there is a particular question that can be asked such that, for all universes where that question is the one that is asked, the answer given is 1 if the function under consideration takes the value under consideration in the universe under consideration, and -1 otherwise. More formally:

An inference device over a set $U$ is a pair of functions $(X, Y)$, both with domain $U$, and with $Y$ surjective onto $\{-1, 1\}$.

Let $\Gamma$ be a function over $U$. An inference device $(X, Y)$ weakly infers $\Gamma$ iff for all $\gamma$ in the image of $\Gamma$, there is some $x$ in the image of $X$ such that for all universes $u$ in $U$ with $X(u) = x$, $Y(u)$ is 1 if $\Gamma(u) = \gamma$, and -1 otherwise.
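Because the definition involves nothing but three functions over a set, it can be checked mechanically on any finite toy collection of universes. Here is a small checker of my own (not code from the paper), followed by a toy usage in which a "universe" is just a (question asked, answer given, true value) triple:

```python
def weakly_infers(universes, X, Y, G) -> bool:
    """Check Wolpert's weak inference condition over a finite set of universes.

    (X, Y) weakly infers G iff for every value g in the image of G there is
    some question x in the image of X such that, in every universe where X
    picks out x, Y gives 1 exactly when G gives g (and -1 otherwise).
    """
    questions = {X(u) for u in universes}
    for g in {G(u) for u in universes}:
        if not any(
            all((Y(u) == 1) == (G(u) == g) for u in universes if X(u) == x)
            for x in questions
        ):
            return False
    return True

# Toy usage: a universe is a (question asked, answer given, true value) triple,
# and the answer is YES exactly when the value asked about is the true one.
universes = [
    (asked, 1 if true == asked else -1, true)
    for asked in range(1, 5)
    for true in range(1, 5)
]
assert weakly_infers(universes, X=lambda u: u[0], Y=lambda u: u[1], G=lambda u: u[2])
```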

There is no structure at all to the universes in this definition, not even a configuration that evolves over time. The only structure is given by the functions $X$, $Y$, and $\Gamma$, which partition the universes according to which question was asked, which answer was given, and the true value of a certain property. The definition of weak inference is a relationship among these partitions.

Note that it is okay for there to be universes in which no question at all is asked.

Example: human inferring weather

The motivating example that Wolpert gives in the paper consists of a human who is inferring the weather at a particular place and time. At a certain time $t_1$, Jim is thinking about the question "will there be a cloud in the sky over London tomorrow at noon?", then at some later time $t_2$ before that noon, Jim is thinking either "yes" or "no", then at noon the next day, time $t_3$, there is either a cloud in the sky over London or not. We set up the functions $X$, $Y$, and $\Gamma$ to pick out the relevant quantities at the relevant times, and then we apply the definition of weak inference to understand what it means for Jim to infer the weather over London at noon.

This example is a particular form of inference that we might call prediction, because $t_1 < t_2 < t_3$. The definition of weak inference doesn’t require a universe to have any time-based structure at all, much less for questions, answers, and reference measurements to be given in a certain time order, so we could also consider examples in which $t_1 < t_3 < t_2$, which we might call observation because the question is posed first, then the event transpires, then an answer is given. We could also consider $t_3 < t_1 < t_2$, which we might call remembering because the event transpires first, then the question is posed, then the answer is given. Wolpert’s framework gives a single definition of what it means to predict correctly, observe correctly, and remember correctly.

Of course there might be many universes in which there is no "Jim", or where Jim is not entertaining any such question at $t_1$, or is not thinking "yes" or "no" at time $t_2$. This is fine. All that weak inference requires is that for each possible value of the reference function there is a question such that, for all universes where that question is the one picked out by $X$, the answer picked out by $Y$ is "yes" if the reference function does in fact evaluate to the value under consideration in the universe under consideration, or "no" otherwise.

There might be universes in which a question is entertained at $t_1$ but no answer is entertained at $t_2$. In this case we can simply define $X$ to pick out a question only in universes in which an answer is also given by $Y$, and similarly define $Y$ to pick out an answer only in universes where a question is given by $X$. We might ask whether this flexibility in the choice of $X$ and $Y$ allows us to "cheat" and say that inference is happening when it is not. I will discuss this below.

Example: language models

We might consider a language model such as GPT-3 as an inference device. The question space would be the set of all possible prompts that can be given to GPT-3, and we might take the answer to be "yes" if the first word in the response is "yes", and "no" otherwise. We could then ask which properties of the world GPT-3 is capable of inferring. This wouldn’t require that GPT-3 answer all possible questions correctly, only that there is some question that it always answers correctly with respect to each possible value of a reference function.
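As a rough sketch of this framing: the function below plays the role of the answer function $Y$, and `query_model` is a hypothetical stand-in for however one obtains a completion from the model (it is not a real API). The question function $X$ would pick out whichever prompt was actually submitted in a given universe.

```python
def lm_answer(prompt: str, query_model) -> int:
    """+1 if the model's completion begins with "yes", -1 otherwise.

    `query_model` is a hypothetical callable mapping a prompt to a text
    completion; in Wolpert's framing it varies from universe to universe,
    because the training data varies from universe to universe.
    """
    completion = query_model(prompt).strip()
    first_word = completion.split()[0].lower() if completion else ""
    return 1 if first_word.startswith("yes") else -1
```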

In order to apply Wolpert’s notion of weak inference in this way, we need to imagine some range of possible universes, each of which has a separate GPT-3 trained within it. Since the universes differ in certain ways, the training data supplied to GPT-3 in each universe will be different, so the model may give different answers to the same question in different universes. We cannot apply Wolpert’s notion of weak inference within a single universe.

This makes sense because the whole reason that we are impressed by language models such as GPT-3 is that it seems that the training method is somewhat universe-agnostic. When given a prompt such as "flying time from London to Paris is", GPT-3 gives an answer that (we assume) would have been different if the flying time from London to Paris was in fact different to what it is in our universe.

Impossibility results

We have so far considered a single inference device and a single reference function $\Gamma$. It is also possible that a single inference device might weakly infer multiple reference functions. This would mean that the question space is sufficiently broad that questions can be entertained about each possible value of many reference functions. For example, we might consider reference functions corresponding to various possible temperatures in London at noon, and various possible wind speeds in London at noon. In any one universe, only one question will ever be picked out by $X$, and that question might concern either the temperature or the wind speed. The answer picked out by $Y$ must be either yes or no in order for $(X, Y)$ to satisfy the definition of an inference device, so only one answer is ever given by a particular inference device, which means that questions cannot ask for answers to multiple sub-questions.

We might ask whether it is possible to have an omniscient inference device to which questions about any reference function whatsoever can be posed, and which will always give the correct answer. Wolpert gives a simple proof that no such inference device can exist. The proof simply notes that you could take the inference device’s own answer function, which can take on two values ("yes" or "no"), and ask whether the output of that function is "no". In this case we are asking a question like "will you answer no to this question?" If the device answers "yes" then its answer is not "no", so the correct answer was "no"; if it answers "no" then its answer is "no", so the correct answer was "yes". A correct answer cannot be given.
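The contradiction can be spelled out by brute force over the only two possible answers (a trivial illustration in code, not Wolpert’s own notation):

```python
# The device's answer is either +1 ("yes") or -1 ("no"). The question asks
# whether its own answer is -1, so the correct answer is +1 exactly when the
# given answer is -1. Whatever the device says, it is wrong.
for given_answer in (+1, -1):
    correct_answer = +1 if given_answer == -1 else -1
    assert given_answer != correct_answer
```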

Wolpert gives a second impossibility result concerning two inference devices. He shows that it is not possible for two inference devices to mutually infer each other’s answer functions. The formal statement of this requires a notion of distinguishability between inference devices:

Two devices $(X_1, Y_1)$ and $(X_2, Y_2)$ are distinguishable if for each pair of questions $x_1$ in the image of $X_1$ and $x_2$ in the image of $X_2$, there is a universe $u$ in which $X_1(u) = x_1$ and $X_2(u) = x_2$.

Given this definition, Wolpert proves that no two distinguishable inference devices can mutually infer each other’s answer functions. The proof goes as follows. Suppose we have two inference devices named "Alice" and "Bob". Let us take their answer functions and divide all possible universes into four groups according to the answers Alice and Bob give:

Now we wish to know whether it is possible for any such pair of inference devices to weakly infer each other’s answer functions, so we are going to ask questions to Bob and Alice concerning the answer given by their partner, and we are allowed to ask about either of the two possible answers ("yes" or "no"). Suppose that we ask Alice the question "is Bob’s answer ‘no’?" and we ask Bob the question "is Alice’s answer ‘yes’?". Now there are only four pairs of answers that any pair of inference devices can give, and these correspond to the regions labelled 1 through 4 in the figure above:

  • In region 1, Alice’s answer is "yes" and Bob’s answer is "no", in which case Bob is incorrect.

  • In region 2, Alice’s answer is "yes" and Bob’s answer is "yes", in which case Alice is incorrect.

  • In region 3, Alice’s answer is "no" and Bob’s answer is "yes", in which case Bob is incorrect.

  • In region 4, Alice’s answer is "no" and Bob’s answer is "no", in which case Alice is incorrect.

In each of the four regions, at least one of the two devices answers incorrectly, and distinguishability guarantees that there is at least one universe in which both of these questions are the ones being asked. Hence no two distinguishable inference devices can mutually infer one another’s answer functions.
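The case analysis above can be checked mechanically (my own illustration of the argument, not code from the paper):

```python
# Alice is asked "is Bob's answer 'no'?"; Bob is asked "is Alice's answer 'yes'?".
# Enumerate the four possible answer pairs and confirm someone is always wrong.
for alice in ("yes", "no"):
    for bob in ("yes", "no"):
        alice_correct = (alice == "yes") == (bob == "no")
        bob_correct = (bob == "yes") == (alice == "yes")
        assert not (alice_correct and bob_correct)
```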

Strong inference, stochastic inference, and physical knowledge

The ideas that I have discussed so far occupy just the first quarter of the paper. Wolpert goes on to define strong inference, stochastic inference, and something that he calls physical knowledge. Strong inference concerns inferring a device’s entire input/output map, whereas weak inference concerns inferring one value at a time: a device that strongly infers another device can predict the output of the other device for any input. Stochastic inference introduces a way to talk about the accuracy with which a device infers a reference function, which relies on a probability distribution over universes. Physical knowledge is weak inference with the added constraint that the device must answer "yes" in at least one universe and "no" in at least one universe for each possible question.

Overall I think that there is a central idea about how to define inference in the paper, and I think that the definition of weak inference contains that idea, so I am going to examine that concept alone. I will give an example below in which weak inference fails to capture common sense notions of inference. I do not think any of the extensions Wolpert proposes meaningfully mitigate these concerns.

Discussion

I would like to know what some policy produced by machine learning does or does not know about. If my robot vacuum is unexpectedly tracking, for example, whether I am sleepy or alert, then I’d like to switch it off and work out what is going on. My robot vacuum should be concerned with floorplans and carpet cleanliness alone.

But merely having information about facets of the world beyond floorplans and carpet cleanliness is not yet a reason for concern. If my robot vacuum has a camera, then the images it records will almost certainly have mutual information with my level of sleepiness or alertness whenever I am within view, but this does not concern me unless my robot vacuum is explicitly modelling my state of mind. What does it mean for one thing to explicitly model another thing? That is the basic question of this whole investigation into what I am calling "knowledge". What would it look like if we took Wolpert’s notion of weak inference as the condition under which one thing would be said to be modelling another thing?

I think weak inference is probably a necessary but not sufficient condition for knowledge. The reason is that the functions $X$ and $Y$ can contain a lot of intelligence, and can therefore be set up to make a system that is really just recording raw data appear as if it is inferring some high-level property of the world. Here is an example:

Suppose I have installed a video camera in my house that records 10 seconds of video footage in a rotating buffer. I will now show that this video camera weakly infers my level of sleepiness or alertness. Take the reference function $\Gamma$ to be my true level of sleepiness expressed as a number between 0 and 255. Take the question function $X$ to be the intensity of the top-left pixel in the image at noon on some particular day, which we will interpret as asking "is my sleepiness level exactly $x$?", where $x$ is the intensity of that pixel. Now suppose that we have, outside of the universe, constructed a model that does in fact estimate my level of sleepiness from an image containing me. Call this external "cheating" function $f$ and suppose that it works perfectly. Then define $Y$ as follows:

  • If the intensity of the top-left pixel in the image at noon equals the level of sleepiness predicted by $f$, then output 1

  • Otherwise, output -1

The pair $(X, Y)$ now weakly infers $\Gamma$ because for each sleepiness level between 0 and 255, there is a "question that can be asked" such that for all universes where that is the "question that has been asked", the "answer given" is 1 if $\Gamma$ equals the sleepiness level under consideration in the universe under consideration, and -1 otherwise. Two things are going on here:

  1. The answer function inspects the same properties of the universe that the question function inspects and in this way can produce answers that depend on the question, even though we are using a single answer function for all 256 possible questions.

  2. The answer function contains a model of the relationship between images and sleepiness, but it is exactly this kind of model that we are trying to detect inside the universe. This gives us false positives.
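To make the cheat concrete, here is a minimal sketch. The universe representation, the function names, and `estimate_sleepiness` (which plays the role of the external model $f$) are all assumptions of mine for illustration.

```python
def make_cheating_device(estimate_sleepiness):
    """Construct (X, Y) so that a raw video buffer "weakly infers" sleepiness.

    A universe u is represented as a dict holding the noon image (a 2D list of
    pixel intensities) and my true sleepiness level; `estimate_sleepiness` is
    the perfect out-of-universe model f.
    """
    def X(u):
        # The "question": the intensity of the top-left pixel of the noon image,
        # read as "is my sleepiness level exactly this value?"
        return u["noon_image"][0][0]

    def Y(u):
        # The "answer": +1 iff the questioned value matches what the external
        # model f reads off the same image. All of the intelligence lives in f,
        # outside the universe, which is exactly the problem.
        return 1 if X(u) == estimate_sleepiness(u["noon_image"]) else -1

    return X, Y
```

With the reference function taken to be $\Gamma(u)$ = my true sleepiness level in $u$, and with $f$ perfect, the weak inference condition is satisfied, even though nothing inside the universe ever did anything other than record pixels.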

The main results that Wolpert derives from his framework are impossibility results concerning the limits of inference within physical systems. That his definitions provide necessary but, it seems to me, not sufficient conditions for what common sense would describe as "inference" is therefore not much of a criticism of the paper: the impossibility of a necessary condition for a thing implies the impossibility of the thing itself, and in general a mathematical result obtained from weaker assumptions is more valuable, since it can be applied to a greater variety of mathematical structures.

Our goal, on the other hand, is to positively identify deception in intelligent systems, and our main impediment to a useful definition is not false negatives but false positives, and so one more necessary condition for knowledge is not of such great value.

A knowledge computation barrier?

Many proposed definitions of knowledge fail to differentiate knowledge from information. Knowledge can be created from information by a purely computational process, so any definition of knowledge that is not sensitive to computation cannot differentiate information from knowledge. Since this has been the downfall of several definitions of knowledge, I will give it the following name:

(Knowledge computation barrier) Suppose that a definition of knowledge claims that knowledge is present within some system. Consider now the ways that this knowledge, wherever it is claimed to exist, might be transformed into different representations via computation. If the definition claims that knowledge is still present under all such transformations, then the definition fails to capture what knowledge is.

To see why this is true, consider a system in which data is being recorded somewhere. We would not, on its own, call this knowledge because the mere recording of data gives us no reason to expect the subsystem doing the recording to have outsized influence over the system as a whole. But now consider the same system modified so that a model is built from the data that is being recorded. This system differs only in that a certain lossy transformation of the recorded data is being applied, yet in many cases we would now identify knowledge as present within the subsystem. Model-building is a purely computational process, and it almost always reduces information content compared to raw data. Any definition of knowledge that is not sensitive to computational transformations, therefore, will either claim that knowledge exists in both the data-recording and model-building subsystems, or that knowledge exists in neither the data-recording nor model-building subsystems. Both are incorrect; therefore, an accurate definition of knowledge must be sensitive to computation.

Conclusion

I think Wolpert’s framework is worth examining closely. There is an elegance to it that seems helpful. It assumes no Cartesian boundaries between agents, and grounds everything in a clear notion of a shared objective reality. In this way it is highly aligned with the embedded agency ethos.

It may be worth quantifying inference in terms of the circuit complexity of the function required to recognize which answer is given in which universe. Perhaps we could describe the accumulation of knowledge as a reduction over time in the circuit complexity of the simplest function capable of discerning answers from the physical configuration of a system.

Comments

Quick thought: reading this I get a sense that some of our collective confusion here revolves around "knowledge" as a quantifiable noun rather than "knowing" as a verb, and if we give up on the idea that knowledge is first a quantifiable thing (rather than a convenient post hoc reification) we open up new avenues of understanding knowledge.

Yeah that resonates with me. I'd be interested in any more thoughts you have on this. Particularly anything about how we might recognize knowing in another entity or in a physical system.

I don't really have a whole picture that I think says more than what others have. I think there's something to knowing as the act of operationalizing information, by which I mean a capacity to act based on information.

To make this more concrete, consider a simple control system like a thermostat or a steam engine governor. These systems contain information in the physical interactions we abstract away to call "signal" that's sent to the "controller". If we had only signal there'd be no knowledge because that's information that is not used to act. The controller creates knowledge by having some response it "knows" to perform when it gets the signal.

This view then doesn't really distinguish knowledge from purpose in a cybernetic sense, and I think that seems reasonable at first blush. This lets us draw a hard line between "dead" information like words in a book and "live" information like words being read.

Of course this doesn't necessarily make all the distinctions we'd hope to make, since it draws no distinction between a thermostat and a human when it comes to knowledge. Personally I think that's correct. There's perhaps some interesting extra thing to say about the dynamism of these two systems (the thermostat is an adaptation executor only, the human is that and something capable of changing itself intentionally), but I think that's separate from the knowledge question.

Obviously this all hinges on a particular sort of deflationary approach to these terms to have them make sense with the weakest possible assumptions and covering the broadest classes of systems. Whether or not this sort of "knowledge" I'm proposing here is useful for much is another question.