
I think the problem of the criterion is really, really important. So important that I'm willing to risk annoying people by bringing it up all the time. I think it matters because a failure to grok it is at the heart of why we fail to notice we're confused and struggle to become less confused. But because the problem seems removed from everyday reality when described in the abstract, I think a lot of people fail to notice when they're grappling with an instance of it. Thus I think it's worth taking some time to explore why I think the problem of the criterion matters so much by looking at some examples of how it shows up.

I'll start by reminding you what the problem of the criterion is, and then spend the remainder of the post diving into some examples of how it shows up and why it matters. The goal is to spend some time seeing the problem in action so we can learn to infer its pattern and recognize when we're running into it so that we don't wander around the world more confused than we need to be.

# Recap

First, a quick recap on the problem in case you're not willing to click the link. The problem of the criterion is a problem in epistemology that initially presents with the following form:

1. A statement is known if it satisfies some criterion for assessing truth.
2. The criterion for assessing truth must be known to use it.
3. Thus the criterion of truth is known if the criterion of truth is known.

This creates a line of circular reasoning that generalizes to the following form:

1. A is B if C.
2. C is B.
3. C is B if C.

As a result, it shows up all over the place as various sorts of problems in different fields. Examples include the problem of induction, the grounding problem, and the problem of the universal prior among others. However, because the problem of the criterion is how this class of circular justification problems shows up in epistemology, it ends up underlying all the others because an ability to know lurks behind all other endeavors.

The problem of the criterion cannot be solved (see the original post for the argument of this point), yet we respond to it every day by finding ways to function without well-grounded justification, using purpose to bridge the gap. No particular purpose can be given pride of place, though, not even truth, except after having already adopted some particular purpose. Thus our justifications for what we know are always contingent upon what we care about.

# How the Problem of the Criterion Shows Up

To get a better grip on the problem and why it matters, let's consider various cases where the root issue is the problem of the criterion. Feel free to jump around and read only the sections that seem most relevant to you.

## Grounding the Meaning of Words

Zack M Davis has written several posts about the meanings of words. It starts roughly here and you can find the sequels in the list of pingback links on the post. I'd summarize Zack's point as the following: words are a way of describing reality by grouping it into categories, and the categories we group things into are only partially constrained by reality as we observe it. Intentions also matter and have real consequences for how we group things together using particular words. See also Chris Leong's review of Wittgenstein's Philosophical Investigations for more on this topic.

John Wentworth has been following a different but related thread in his work on abstractions, and in particular in his work on finding natural abstractions. Although abstractions in general depend on fit to the questions being asked, John is hoping to find a way to discover natural abstractions that are approximately the same independent of the question being asked or even the entity doing the asking (so that, say, we might know if an AI is using the same abstractions as we humans are).

The problem of the criterion tells us that we should not be surprised to find some intent or purpose (like answering a question) impacts the abstractions or categories we use to describe the world. In fact, the problem of the criterion implies it must be this way, since whatever abstractions we use cannot be fully grounded in observed facts. Instead our choice of abstractions depends on some unjustified intent that shapes what is meaningful and useful.

Zack tackles this head on when he, for example, examines what it would mean to call a whale a fish. I'm unclear whether John has fully grasped the situation he's in: any natural abstractions he finds will only be natural up to the limit of the purpose those abstractions were expected to serve, even if it's a seemingly general one like answering questions in ways that predict reality. This doesn't mean he can't find them or that they won't be useful (I hope he does, and I think they will!). It only points out a key limit to the idea of natural abstractions: they are only natural insofar as we've fixed what it is we think it means to be "natural".

We'll see this pattern again shortly, but if you're particularly interested in issues with abstractions and AI the section on reinforcement learning below will likely be of interest to you.

## Failure to Align Oneself to a Goal and Do It

Nate Soares wrote a whole series of posts because lots of people he knew had a problem of excessive guilt over not doing what they thought was best. The problem seems to arise like this: a person reasons out what they think is best, they find they don't want to do it despite seemingly knowing they should do it, and then don't do it even harder because they feel guilty about not doing it. Compare also the literature on akrasia and scrupulosity. His proposed solution is, at a high level, to figure out what you actually care about and do that rather than whatever you thought you should care about.

We see something similar in Kaj Sotala's posts about multi-agent models of mind: different parts of the mind fail to agree with each other, resulting in self-distrust that leads to suboptimal action. See also Jan Kulveit on multi-agent predictive minds and alignment, Steve Byrnes on inner alignment in the brain, David Chapman on a crisis of meaning, and Alex Flint on three enigmas that lead to deep doubt that prevents us from fully trusting our own judgement.

Similar themes, different approaches.

I see these issues with commitment and guilt as special cases of a failure to grasp the true depth of the problem of the criterion. Many people, particularly rationalists and rationalist-like people, have an expectation that their reasons for doing things should be justified in well-grounded lines of reasoning. Unfortunately, the problem of the criterion implies they cannot ground their reasoning in anything other than unjustified purposes that they care about because they care about them, so they get stuck in a guilt cycle as they spiral around their inability to find the grounding they desire. Luckily, there is an alternative: accept that we're going to ground things in what we care deeply about and that we care about those things for unjustifiable reasons like "evolution just happened to shape us this way", and move on contingently with the understanding that what matters to us hinges on the partially shared, partially individualistic nature of what we care about.

For fans of Eliezer Yudkowsky's fictional story "Three Worlds Collide", the above arguments suggest a reading of the story as an example of how the unjustified and accidental way our values were shaped lies at the core of what we care about and how those things are extremely important to us.

Speaking of values...

In theory, rational agents cannot disagree. In reality, reasonable people disagree all the time. Yes, reasonable people, no matter how well trained in the art of rationality, are not perfect rational agents. Nevertheless, reasonable people still end up disagreeing a lot more than we might expect. What's going on?

CFAR has a technique called double crux. The idea is that when two people disagree, they keep digging down on that disagreement, looking for a statement such that resolving it one way would bring the first person to agree with the second, and resolving it the other way would bring the second to agree with the first. This statement ends up being the crux of their disagreement with each other.

Unfortunately, the double crux technique doesn't always work as we might hope. Sometimes there are only individual cruxes, no double cruxes, or people struggle to satisfy all the conditions that allow the technique to work. But a bigger problem is that sometimes people just disagree about things they can't reasonably resolve because they're matters of value (perhaps even matters of terminal values) rather than fact.

For example, perhaps the reason Alice disagrees with Bob about whether or not it's a good idea to eat vegetables has nothing to do with nutrition facts and entirely to do with differences in subjective taste experience. Alice can agree with all Bob's arguments about why eating vegetables is a good idea and still refuse to eat vegetables because she doesn't like how they taste enough to eat them anyway.

This example is a bit contrived, but it seriously looks like many political disagreements, once everyone agrees on the facts, may still be intractable because people just care about different stuff. Sure, those differences aren't necessarily fixed and values can change, but values typically shift not by changing minds but by changing hearts—that is, by appealing to emotion rather than logic and reason.

That people can agree about facts and still disagree because they have different values points straight to the problem of the criterion. Since systematic reasoning is grounded not in more systematic reasoning but in what we value, it follows naturally that reasoning is only sufficient to convince those who already agree with you on values. If you disagree with someone on values, you must first change what they care about if you ever hope to agree.

Agreeing on values is, at some level, the whole game when it comes to AI alignment, so we'll look at an instance of the problem of the criterion in AI next.

## No Free Lunch in Inverse Reinforcement Learning

Stuart Armstrong and Sören Mindermann's no free lunch theorem for inverse reinforcement learning says, in short, that we can't accurately learn the preferences of an agent without making normative assumptions. As a consequence, if we want to build AI that learns what humans want, we have to start out with more than zero assumptions about how humans decide what to do. This requires that we have some way to figure out those norms. Armstrong offers some direction on this, as do Abram Demski and I, and I think you can ultimately read a variety of alignment work as attempts to address the question "how do we figure out how to figure out what humans care about?".

This issue of needing to make normative assumptions to learn the preferences of an agent has the same form as the problem of the criterion: learning what an agent prefers requires already knowing something about what it prefers (or meta-prefers or meta-meta-prefers if you like). And since we humans are the ones trying to build this AI (or at least build AI or other systems that will build AI for us that learns preferences), whatever answers we come up with will be grounded in our knowledge and thus ultimately founded on what we care about.
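To make the shape of the issue concrete, here is a toy sketch in Python. It is not taken from Armstrong and Mindermann's paper; the action names, the reward tables, and the helper `consistent_explanations` are all made up for illustration. It shows only the basic idea that the same observed behavior can be explained by more than one pairing of "how the agent plans" and "what the agent wants", so behavior alone doesn't pin down preferences.

```python
from typing import Callable, Dict, List, Tuple

Action = str
Reward = Dict[Action, float]
Planner = Callable[[Reward], Action]

# Hypothetical two-action world, used only for illustration.
ACTIONS: List[Action] = ["eat_vegetables", "eat_cake"]


def rational(reward: Reward) -> Action:
    """Planner that picks the action with the highest reward."""
    return max(ACTIONS, key=lambda a: reward[a])


def anti_rational(reward: Reward) -> Action:
    """Planner that picks the action with the lowest reward."""
    return min(ACTIONS, key=lambda a: reward[a])


# Two made-up reward tables: one agent likes cake, the other hates it.
likes_cake: Reward = {"eat_vegetables": 0.0, "eat_cake": 1.0}
hates_cake: Reward = {"eat_vegetables": 0.0, "eat_cake": -1.0}

# Candidate explanations of an agent: a (planner, reward) pair.
CANDIDATES: Dict[str, Tuple[Planner, Reward]] = {
    "rational + likes cake": (rational, likes_cake),
    "anti-rational + hates cake": (anti_rational, hates_cake),
    "rational + hates cake": (rational, hates_cake),
}


def consistent_explanations(observed: Action) -> List[str]:
    """Return the names of all candidates that reproduce the observed action."""
    return [
        name
        for name, (planner, reward) in CANDIDATES.items()
        if planner(reward) == observed
    ]


if __name__ == "__main__":
    # Two candidates with opposite preferences both explain "eat_cake".
    print(consistent_explanations("eat_cake"))
```

Choosing between the two surviving explanations requires an extra assumption, such as "the agent is roughly rational", and that assumption comes from us rather than from the observations.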

I've previously—and somewhat more clumsily—addressed the above point for AI alignment in general in "Robustness to fundamental uncertainty in AGI alignment" if you're curious to dig into this more.

## Epistemic Humility, or Lack Thereof

Let's end by looking at an issue of epistemology, which is ultimately what the problem of the criterion is all about.

A founding insight of the rationality movement is that we tend to be wildly overconfident in our beliefs. For example,

• We are so overconfident in how quickly we can get work done that even when we know we are overconfident we still aren't able to correct for it. This phenomenon is so common it even has a name: Hofstadter's Law.
• People tend to be overly sure that they are better/worse than average on a wide variety of measures. It's called illusory superiority/inferiority.
• Calibration is surprisingly hard, especially at the tails, leading us to think we understand the world better than we actually do.
• And even when we do manage to notice we are confused, we often are still not noticing all of that confusion.

Overconfidence is another way of saying we lack epistemic humility. That is, we generally fail to recognize and account accurately for uncertainty about our beliefs and the frequency with which we make mistakes. As Eliezer might put it, we fail to be the lens that sees its own flaws.

Why might this be? Again, I see the source in a lack of appreciation for the problem of the criterion. The problem of the criterion shows us that our ultimate reasons for believing things are justified by our values, purposes, and concerns rather than facts since no fact can stand on its own independent of what we care about. But we fail to fully account for this and thus tend to anchor too strongly on our assumptions about how the world works. Consequently it's harder than it otherwise would be for us to notice that we may be framing a problem wrong, and we fail to accurately anticipate the likelihood of ontological crisis.

Said more plainly, we are often more wrong than we know even when we know we're wrong because we don't truly get that what we know is contingent rather than fixed.

# Why It Matters, Summarized

Each of these examples rests on the same fundamental problem: we think we know more than we really do. The problem of the criterion shows us exactly how and why this is happening: we have to make unjustified assumptions in order to have thoughts at all. Unfortunately, these assumptions are so pervasive and invisible to us that it's hard to come to grips with the fact they exist, let alone see them and account for them. Thus, the problem of the criterion rears its head everywhere because it's the fundamental problem in epistemology, and knowing of one kind or another is at the heart of all human endeavors.

As a result we can't really hope to be right: the problem of the criterion forever locks us out from obtaining perfect knowledge of reality. But we can aspire to be less wrong! We can move ourselves in that direction by learning to see when we're encountering the problem of the criterion. Then we can engage with the fundamental uncertainty it creates rather than pretend we're more certain than we have any right to be.

Thanks to Duncan Sabien for useful feedback via Less Wrong's feedback service.


Saying that something is true/useful/specific/good/interesting/hedonic is not very specific: there are particular senses of these qualifiers relevant in different contexts, each inducing normativity in its own way. Why is a particular sense of a particular qualifier relevant in a particular context? That is often unclear; you felt it appropriate to pursue with your human mind, perhaps incorrectly.

Clear explanations are an unusual thing; they are not at all always available. There's the physical explanation (things happen because of the laws of physics), which is occasionally relevant. For events that have to do with cognition, such as beliefs and hypotheses and ideas, the more interesting explanations are object level (succeeding in reaching for a particular thing in the world, or for a particular abstract construction), or else general principles for maintaining performance or improving design of a mind in some way.

For things that are not instrumentally convergent, and don't chase a clearly formulated objective, relevant considerations are a matter of preference, and the only explanation is some handwaving about how they should channel preference (even though that can't be carried out in greater generality on human level), which can't be unpacked in a form that's not irreducibly messy.

I've always been confused about what people find confusing re the problem of the criterion. If I get it right, you can state it as "you must know how to assess truth in order to know how to assess truth", or something like that. My confusion about confusion lies in squaring that with the idea of embedded agency, where we are a part of a partially predictable universe and contain an imperfect model of the universe. Therefore something like a criterion for assessing truth is inherent in the setup, otherwise we would not count as agents.

TAG

Everything that an embedded agent does, or needs to do, comes under usefulness. Inasmuch as truth goes beyond usefulness, arguments from agency don't work.

> The criterion for assessing truth must be known to use it.

This seems like the bad assumption to me. (This is quite a bit different from your explanation of the criterion in the linked article, so I'm commenting on this one separately.)

For example, take a foundationalist Bayesian epistemology, where Solomonoff induction is considered supreme. The justification of a degree-of-belief rests in (a) the observations so far, (b) the complexity of the hypotheses, and (c) the Bayesian math connecting the two to produce a posterior.

The observations (a) have to be known in order to be used, but the reasoner does not need to have any explicit knowledge of the criterion (b and c) in order to use it correctly. It can simply be built into the way you reason. So within this frame, the criterion does not have to itself be known in order to be used.

More generally, if one supposes that there is a good criterion, it seems like all that is necessary (in order to arrive at normatively correct beliefs) is to apply the criterion correctly. It seems there is no additional reason why one must know the criterion explicitly.

So, no circular reasoning arises.

(But note that I'm saying this more to see what reaction it gets, rather than as a proposed solution to the criterion problem. The criterion problem seems to be pretty vague, so I'm only claiming this as a possible response to the specific version here.)

I think this is just sloppy writing on my part. When I wrote this I was trying to iron out some points of confusion and disagreement I had been addressing in comments on previous posts. I think this 3-part framing in this post is probably misleading and should be ignored.

I was trying to find a crisp way to explain why I think the thing being pointed at by the problem of the criterion is everywhere and relevant to a large number of problems. I think this particular presentation is bad.

• A is B if C.
• C is B.
• C is B if C.

Just want to note that I don't get what this is trying to do formally. Maybe some formatting is missing? It seems like complete gobbledygook to me. Are the letters supposed to be sentences? Or set variables? Or what? "A is B" seems like it's trying to say something like predicate B applies to object A, or A is in set B, something like that. But then "C is B if C" makes no sense, because C would be an element, rather than a condition that can be true/false.

My review is really long so you might want to highlight the appropriate section.

It occurred to me that the statement "The problem of the criterion cannot be solved" is itself subject to the problem of the criterion: one has to assume and believe in a criterion of what counts as a solution and what does not. Therefore particularism is implicitly assumed.

Yes, at least insofar as we must use words and concepts.