Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Financial status: This is independent research, now supported by a grant. I welcome financial support.

Epistemic status: This is in-progress thinking.

This post is part of a sequence on the accumulation of knowledge. Our goal is to articulate what it means for knowledge to accumulate within a physical system.

The challenge is this: given a closed physical system, if I point to a region and tell you that knowledge is accumulating in this region, how would you test my claim? What are the physical characteristics of the accumulation of knowledge? What is it, exactly, about an artifact inscribed with instructions for building advanced technology that makes it so different from an ordinary rock, or from a video camera that has been travelling the cosmos recording data since the beginning of the universe? We are looking for a definition of knowledge at the level of physics.

The previous four posts have explored four possible accounts for what the accumulation of knowledge consists of. Three of the accounts viewed knowledge as a correspondence between map and territory, exploring different operationalizations of "correspondence". The fourth account viewed the accumulation of knowledge as an increasing sensitivity of actions to the environment. We found significant counter-examples to each of the four accounts, and I remain personally unsatisfied with any of the four accounts.

This post will briefly review literature on this topic.

Constructive definitions of knowledge

One could view probability theory as a definition of knowledge. Probability theory would say that knowledge is a mapping from claims about the world to numerical credences in a way that obeys the laws of probability theory, and that the accumulation of knowledge happens when we condition our probabilities on more and more evidence. Probability theory is not usually presented as an account of what knowledge is, but it could be.
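To make "conditioning on more and more evidence" concrete, here is a minimal sketch of repeated Bayesian updating over a finite set of hypotheses. The coin hypotheses, the function name `condition`, and the numbers are illustrative assumptions, not anything taken from the literature discussed here.

```python
from fractions import Fraction


def condition(prior, likelihood):
    """Return the posterior over hypotheses after one observation.

    prior:      dict mapping hypothesis -> credence (credences sum to 1)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical toy problem: is a coin fair, or biased 3/4 toward heads?
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

# "Accumulating knowledge" in this frame: condition on three heads in a row.
posterior = prior
for _ in range(3):
    posterior = condition(posterior, heads)

print(posterior)  # credence shifts toward "biased": 27/35 versus 8/35
```

Note that this sketch presupposes we already know which part of the system holds the credences and what they refer to, which is exactly what is unavailable when interpreting an already-constructed machine.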

However, probability theory is most useful when we are designing machines from scratch rather than when we are interpreting already-constructed machines such as those produced by large-scale search in machine learning. It might be possible to reverse-engineer a neural network and determine whether it encodes a program that approximately implements Bayesian inference, but it would take a lot of work to do that, and it would have to be done on a case-by-case basis. Probability theory is not really intended to give an account of what it means for a physical system to accumulate knowledge, just as an instruction manual for assembling some particular desktop computer is not on its own a very good definition of what computation is.

In the same way we could view all the various formal logical systems, as well as logical induction and other non-probability-theory systems of credences as constructive accounts of knowledge, although these, like probability theory, do not help us much with our present task.

Philosophical analysis of knowledge

There is an extensive philosophical literature concerning the analysis of knowledge. This area received a great deal of attention during the second half of the 20th century. The basic shape of this literature seems to be a rejection of the view that knowledge is "justified true belief", followed by counterexamples to the "justified true belief" paradigm, followed by many attempts to augment the "justified true belief" definition of knowledge, followed by further counterexamples refuting each proposal, followed by whole books being written about general-purpose counterexample construction methods, followed by a general pessimism about coming to any satisfying definition of knowledge. My main source on this literature has been the SEP article on the topic, which of course gives me only a limited perspective on this enormous body of work. I hope that others better-versed in this literature will correct any mistakes that I have made in this section.

The "justified true belief" position, which much of the 20th century literature was written to either refute or fix, is that a person knows a thing if they believe the thing, are justified in believing the thing, and the thing is in fact true.

The classic counterexample to this definition is the case where a person believes a thing due to sound reasoning but, unbeknownst to them, was fooled in some way by their perceptions, and yet the thing itself turns out to be true by pure luck:

Imagine that we are seeking water on a hot day. We suddenly see water, or so we think. In fact, we are not seeing water but a mirage, but when we reach the spot, we are lucky and find water right there under a rock. Can we say that we had genuine knowledge of water? The answer seems to be negative, for we were just lucky. (Dreyfus 1997: 292)[1]

Counter-examples of this form are known as Gettier cases. There are many, many proposed remedies in the form of a fourth condition to add to the "justified true belief" paradigm. Some of them are:

  • The belief must not have been formed on the basis of any false premise

  • The belief must have been formed in a way that is sensitive to the actual state of the world

  • The thing itself must be true in other nearby worlds where the same belief was formed

  • The belief must have been formed by a reliable cognitive process

  • There must be a causal connection from the way the world is to the belief

The follow-up counterexamples that rebounded off these proposals eventually led to the following situation:

After some decades of such iterations, some epistemologists began to doubt that progress was being made. In her 1994 paper, "The Inescapability of Gettier Problems", Linda Zagzebski suggested that no analysis sufficiently similar to the JTB analysis could ever avoid the problems highlighted by Gettier’s cases. More precisely, Zagzebski argued, any analysans of the form JTB+X, where X is a condition or list of conditions logically independent from justification, truth, and belief, would be susceptible to Gettier-style counterexamples. She offered what was in effect a recipe for constructing Gettier cases:

(1) Start with an example of a case where a subject has a justified false belief that also meets condition X.

(2) Modify the case so that the belief is true merely by luck.

Zagzebski herself proposed "the belief must not be true merely by luck" as a possible fourth condition for the existence of knowledge, but even this fell to further counterexamples.

Overall I found this literature extremely disappointing as an attempt to really get at what knowledge is:

  • The literature seems to be extremely concerned with what certain words mean, particularly the words "knowledge" and "belief". It’s all about what is meant when someone says "I know X" or "I believe X". That is a reasonable question to ask, but it seems to me like just one way to investigate the actual phenomenon of knowledge out there in the world, and the literature seems unreasonably fixated on this word-centric style of investigation.

  • The literature seems to be centered around various subjects and their cognition. The definitions are phrased in terms of a certain subject believing something or having good reason for believing something, and this notion that there are subjects that have beliefs seems to be taken as primary by all writers. I suspect that an unseen presupposition of the agent model is at work in sneaking in premises that create the confusion that everyone seems to agree exists in the field. An alternative to taking subjects as primary is to take an underlying mechanistic universe as primary, as this sequence has done. Now there are pros and cons to this mechanistic-universe approach in comparison to the subject-cognition approach. It is not that one approach is outright better than the other. But it is disappointing to see an entire field apparently stuck inside a single way of approaching the problem (the subject-cognition approach), with apparently no-one considering alternatives.

  • The literature spends a lot of time mapping out various possible views and the relationship between them. It is frequently written that "internalists believe this", "state internalism implies this", "access internalism implies that", and so on. It’s good to map out all these views but sometimes the mapping out of views seems to become primary and the direct investigation of what knowledge actually is seems to take a back seat.

Overall I have not found this literature helpful as a means to determine whether the entities we are training with machine learning understand more about the world than we expected, nor how we might construct effective goal-directed entities in a world without Cartesian boundaries.

Eliezer on knowledge

Much of Eliezer’s writing in The Sequences concerns how it is that we acquire knowledge about the world, and what reason we have to trust our knowledge. He suggests this as the fundamental question of rationality:

I sometimes go around saying that the fundamental question of rationality is Why do you believe what you believe?

However, most of Eliezer’s writings about knowledge are closer to the constructive frame than the mechanistic frame. That is, they are about how we can acquire knowledge ourselves and how we can build machines that acquire knowledge, not about how we can detect the accumulation of knowledge in systems for which we have no design blueprints.

One exception to this is Engines of Cognition, which does address the question from a mechanistic frame, and is a significant inspiration for the ideas explored in this sequence (as well as for my previous writings on optimization). In the essay, Eliezer demonstrates, eloquently as ever, that the laws of thermodynamics prohibit the accumulation of knowledge in the absence of physical evidence:

"Forming accurate beliefs requires a corresponding amount of evidence" is a very cogent truth both in human relations and in thermodynamics: if blind faith actually worked as a method of investigation, you could turn warm water into electricity and ice cubes. Just build a Maxwell's Demon that has blind faith in molecule velocities.


So unless you can tell me which specific step in your argument violates the laws of physics by giving you true knowledge of the unseen, don't expect me to believe that a big, elaborate clever argument can do it either.

In the language of the present sequence, what Eliezer points out is that information is a necessary condition for knowledge. I agree that information is a necessary condition for knowledge although I have argued in this sequence that it is not a sufficient condition.

Other writings

I suspect that this literature review has missed a lot of relevant writing both here on LessWrong and elsewhere. I would very much appreciate pointers to further resources.


Conclusion

This sequence has investigated the question of what it means for knowledge to accumulate within a region of a closed physical system. Our motivations were firstly to detect the accumulation of knowledge as a safety measure for entities produced by large-scale search, and secondly to investigate agency without imposing any agent/environment boundary.

The first three definitions we explored were variations on the theme of measuring mutual information between map and territory. We attempted to do this by looking at just a single configuration of the system, then by looking at a distribution over configurations, and then by looking at digital abstraction layers. The counterexamples given for each of these three proposals seem to show that none of these definitions are sufficient conditions for the accumulation of knowledge, though the second of these, and perhaps even the third, seems to be a necessary condition.
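For reference, the quantity underlying these three proposals can be sketched as the standard Shannon mutual information of a joint distribution over (map state, territory state) pairs. This is a minimal sketch under the assumption of a small discrete joint distribution; the earlier posts operationalize "map" and "territory" in more specific ways, and the two toy distributions below are hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information, in bits, of a discrete joint distribution.

    joint: dict mapping (map_state, territory_state) -> probability.
    """
    # Marginal distributions over map states and territory states.
    p_map, p_terr = {}, {}
    for (m, t), p in joint.items():
        p_map[m] = p_map.get(m, 0.0) + p
        p_terr[t] = p_terr.get(t, 0.0) + p
    # Sum p(m, t) * log2( p(m, t) / (p(m) * p(t)) ) over nonzero entries.
    return sum(
        p * math.log2(p / (p_map[m] * p_terr[t]))
        for (m, t), p in joint.items()
        if p > 0
    )


# Hypothetical one-bit map that always matches a one-bit territory.
perfect = {(0, 0): 0.5, (1, 1): 0.5}
# Hypothetical map that is statistically independent of the territory.
independent = {(m, t): 0.25 for m in (0, 1) for t in (0, 1)}

print(mutual_information(perfect))      # one full bit shared
print(mutual_information(independent))  # nothing shared
```

The computation says nothing about whether the shared bits are usable by anything, which is one way of seeing why high mutual information alone did not suffice in the counterexamples.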

We then explored a definition of knowledge accumulation as a coupling between map and territory that shows up as a sensitivity of the part of the system that has "knowledge" to the remainder of the system.

Overall, I am not sure that any of these definitions are really on the right track, but I am very enthusiastic about the question itself. I like the frame of "what would it mean at the level of physics for…" as a means to break out of the agent-centric frame that the philosophical literature on this subject often seems to be inside. I think that keeping in mind the practical goals of AI safety helps to keep the investigation from becoming a conceptual Rorschach test.

  1. Dreyfus, George B.J., 1997, Recognizing Reality: Dharmakirti’s Philosophy and its Tibetan Interpretations, Albany, NY: SUNY Press. ↩︎

3 comments

I will briefly give it a shot:

Operative definition of knowledge K about X in a localised region R of spacetime:

Number N of yes/no questions (information) which a blank observer O can confidently answer about X, by having access to R.



-Blank observer = no prior exposure to X. Obvious extension to observers which know something already about X.

-Knowledge makes sense only with respect to some entity X, and for a given observer O.

-Access to K in a given R may be very difficult, so an extension of this definition is enforcing a maximum effort E required to extract K. Max N obtained in this way is K.

-Equivalently, this can be defined in terms of probability distributions which are updated after every interaction of O with R.

-This definition requires having access to X, to verify that the content of R is sufficient to unambiguously answer N questions. As such, it's not useful for quantifying the accumulation of knowledge about things we don't know entirely. But this is to be expected; I'm pretty sure one can map this to the halting problem.

Anyway, in the future it may be handy for instance to quantify if a computer vision system (and which part of it) has knowledge of objects it is classifying, say an apple.

-To make the definition more usable, one can limit the pool of questions and see which fraction of those can be answered by having access to R.

-The number N of questions should be pruned into classes of questions, to avoid infinities. (e.g. does an apple weigh less than 10kg? Less than 10.1kg? Less than 10.2kg? ...)
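A rough illustration of the definition above, using a limited question pool and an effort cap, might look like the following. It is only a sketch under stated assumptions: the names `knowledge_score` and `read_label`, the apple and its label, and the unit of effort are all hypothetical, and the pruning of questions into classes is not modelled.

```python
def knowledge_score(truth, region, questions, answer, max_effort):
    """Count pooled yes/no questions about X answered correctly from R.

    truth:      dict question -> bool, ground truth checked against X itself
    region:     whatever the observer is allowed to inspect (R)
    questions:  the limited pool of yes/no questions
    answer:     function (region, question) -> (bool or None, effort spent)
    max_effort: the budget E; answers costing more than this are not counted
    """
    correct = 0
    for q in questions:
        guess, effort = answer(region, q)
        if guess is not None and effort <= max_effort and guess == truth[q]:
            correct += 1
    return correct, correct / len(questions)


apple = {"weight_kg": 0.2, "color": "red"}  # X (hypothetical)
label = {"weight_kg": 0.2}                  # R: records only the weight

questions = ["lighter than 10 kg", "lighter than 0.1 kg", "is red"]
truth = {
    "lighter than 10 kg": apple["weight_kg"] < 10,
    "lighter than 0.1 kg": apple["weight_kg"] < 0.1,
    "is red": apple["color"] == "red",
}


def read_label(region, question):
    """Hypothetical blank observer: answers only from what R records."""
    if question.startswith("lighter than") and "weight_kg" in region:
        limit = float(question.split()[2])
        return region["weight_kg"] < limit, 1  # one unit of effort to read
    return None, 0  # abstain: R says nothing about this question


n, fraction = knowledge_score(truth, label, questions, read_label, max_effort=1)
print(n, fraction)  # two of the three pooled questions are answerable from R
```

Lowering `max_effort` to 0 drops the count to zero, which mirrors the point that knowledge under this definition depends on the effort budget as well as on the contents of R.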


Regarding your attempts at:

-Mutual information between region and environment: Enforcing a max effort E implies that rocks have a small amount of knowledge, since it's very hard to reverse engineer them.

-Mutual information over digital abstraction layers: The camera cannot answer yes/no questions, so no knowledge. But a human with access to that camera certainly has more knowledge than one without.

-Precipitation of action: Knowledge is with respect to an observer. So no knowledge for the map alone.

Yeah nice, thank you for thinking about this and writing this comment, Lorenzo.

an extension of this definition is enforcing a maximum effort E required to extract K

I think this is really spot on. Suppose that I compare the knowledge in (1) a Chemistry textbook, (2) a set of journal papers from which one could, in principle, work out everything from the textbook, (3) the raw experimental data from which one could, in principle, work out everything from the journal papers, (4) the physical apparatus and materials from which one could, in principle, extract all the raw experimental data by actually performing experiments. I think that the number of yes/no questions that one can answer given access to (4) is greater than the number of yes/no questions that one can answer given access to (3), and so on for (2) and (1) also. But answering questions based on (4) requires more effort than (3), which requires more effort than (2), which requires more effort than (1).

We must also somehow quantify the usefulness or generality of the questions that we are answering. There are many yes/no questions that we can answer easily with access to (4), such as "what is the distance between this particular object and this other particular object?", or "how much does this particular object weigh?". But if we are attempting to make decisions in service of a goal, the kind of questions we want to answer are more like "what series of chemical reactions must I perform to create this particular molecule?" and here the textbook can give answers with much lower effort than the raw experimental data or the raw materials.

Would be very interested in your thoughts on how to define effort, and how to define this generality/usefulness thing.

I suspect a key piece which is missing from our definition of knowledge is a strong (enough) formalisation of the notion of computational work. In a lot of these cases the knowledge exists as a sort of crystallized computation.

The difference between a chemistry textbook and a rock is that building a predictive model of chemistry by reading a textbook requires much less computational effort than by studying the reactions in a rock.

A ship plotting a shoreline requires very little computational work to extract that map.

The computer with a camera requires very little work to extract the data as to how the room looked. Studying the computer case to get that information requires a lot of computational work.

I don't know how you'd check if a system was accumulating knowledge based on this, but perhaps doing a lot of computational work and storing the results as information might be evidence.