Your thinking about how truth relates to AI shows good insight into what AI still lacks.
You might benefit a little from relating your questions to some of Tarski's writings on truth in mathematics, which show the necessity of an outside observer for truth to have adequate meaning. Per Tarski, "The grass is green" is true if and only if the grass is green. (Unstated, implied: as known by the observer of the proposition.)
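For reference, each instance of Tarski's schema (his Convention T) is usually written along these lines:

$$\text{“}p\text{” is true} \iff p$$

where the quoted sentence is named in the metalanguage and then used, unquoted, on the right-hand side.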
We all have a subjective viewpoint that locates us within our map of reality, and that viewpoint is what allows us to decide whether a statement is true: we judge it from our subjective context, which is always going to differ from its objective context within the related literature, LLM data, etc. Perhaps some kind of God's-eye view would be identically both subjective and objective, but we and our AI are certainly not capable of that. We humans can, however, flip between our subjective map and our analysis of presented data, and AI cannot do that.
Current AI lacks the ability to take any truly subjective stance (even though it can generate a simulation of one on request). For AI to perceive something in the concrete world as true in the sense above, it would have to place itself as a self within the broader universe and take a subjective stance for itself about the data in the LLM, which current LLMs cannot really do. The closest we have as of 2026 would be a human acting as an auditor and interpreter for the LLM.
Interesting. Do you think AI structurally can’t achieve that status, or is it just that it can’t get there at this moment? I tend to the view that consciousness is just a tool that agents with feedback loops, who need to model the behaviour of other agents, have evolved. I don’t see why AI can’t do the same.
I'm certain that LLMs alone are not going to be a way to reach subjective consciousness. I think it's still open whether or not some other tech could get there. Perhaps such AI tech could use an LLM as a language or memory module, which the greater AI would then contextualize with programming of its own.
Of course, I don't think that a subjective viewpoint is always needed or even helpful. What if the truth of "the grass is green" depended on whether the grass on the grounds of a particular data center was currently green, and we did not know that?
I think you’re right on both counts. I wonder if designers of AI will actively work to avoid creating consciousness (assuming they can work out how to do that) to avoid the issue you raise, as well as sidestepping wider ethical concerns. A conscious AI would be over-engineered for many tasks.
I’ve been thinking about this issue a while. Certainly since grad school. The classic definition of knowledge as justified true belief (JTB) has been known to be unsatisfactory for decades - sometimes a justified true belief is just a lucky guess. Attempts to patch JTB have generally been additive – knowledge should be justified true belief plus something.
What I want to explore is what happens if knowledge is simply justified belief. What are the implications, at least for bounded agents, if we drop the truth criterion from our definition of knowledge?
I’ll try to set out below why this is not as shocking as it sounds and why a JB model of knowledge is compatible with the idea of truth. In a future post I’ll set out some of the many advantages this way of thinking about knowledge has, but today I want to test the underlying principles.
Let’s start from an idea that LessWrong puts a lot of weight on - that our beliefs should be truth tracking. But what does truth tracking actually mean? I don’t think it means that we are literally tracking truth. Truth tracking means that we have good epistemic procedures.
Consider another metaphor we use a lot on LessWrong - that we want our maps to correspond to the territory. But this metaphor could be misleading. When it comes to knowledge we don’t actually have access to the truth. It may be that water is H2O. We believe that is the case, and we have lots of justification. But if you ask me to do more than this, all I can do is provide further justification.
To return to the map metaphor, what we are actually doing is checking the quality of the map, not the ground itself. We can ask ourselves questions like: when was this map last updated? What do we know about the person who made it? Have we looked for other maps and actively compared them? But we don’t actually get to look at the underlying landscape.
When someone (including me) says “I know that p,” what they are actually claiming is that their reasons for believing p have cleared whatever justificatory bar feels appropriate given the stakes, and that they have been open to defeaters and found none.
On an icy day
Chris is an experienced guide and has taken groups across the same lake every February for twenty years. The ice always freezes thick, and this year is no different because temperatures have been solidly sub-zero for weeks. Plenty of other people have crossed recently with no trouble; indeed, Chris has crossed the lake himself many times this year.
At 2:55 pm on a cold February afternoon Chris says to a fellow guide “I know the ice is safe to cross” and takes a group out onto the frozen lake. They make it across, as usual; a good time is had by all.
Now consider this.
Unbeknownst to Chris (or anyone else), the ice had formed in an unusual way and actually had a structural flaw. This flaw had also been there the previous year, but no-one noticed. Last year, however, the ice had melted at the end of the season, and the flaw was lost to history.
But this year – maybe someone stepped a little further to the left, maybe the group was a bit bigger, maybe they were just unlucky – half an hour later, on the return journey, the ice cracks and someone falls in.
What changed between 2:55 and 3:25 pm?
It wasn’t the truth of Chris’s comment that “I know the ice is safe to cross”. That hadn’t been true for a couple of years. What changed was the epistemic situation. At 3:24 pm Chris believed the ice was safe and had strong justification for thinking so. Chris had all the information that was reasonably available to a finite human being.
But the moment the live defeater became detectable, “I know the ice is safe to cross” stopped being a reasonable thing to say.
This shows how knowledge claims are sensitive to the live justificatory environment, including the presence or absence of tractable defeaters, rather than to some unchanging metaphysical fact about the ice.
Suppose there are two worlds which are epistemically indistinguishable at time t. In one, the ice contains a hidden structural flaw; in the other, it doesn’t. Chris has done the same checks, considered the same evidence, and behaved in the same way in both worlds. If we say he has knowledge in one world but not the other, solely because of a hidden fact he could not possibly access, then knowledge depends on something that plays no role in his epistemic situation. That makes it hard to see why factivity should be built into the definition of knowledge for bounded agents like us.
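Put schematically (my notation, nothing standard): let $E_w(t)$ be the total evidence accessible to Chris in world $w$ at time $t$. The two-worlds setup is

$$E_{w_1}(t) = E_{w_2}(t) \quad \text{but} \quad \text{safe}(w_1) \neq \text{safe}(w_2).$$

A factive knowledge operator must then assign knowledge in one world but not the other, so it is not a function of $E_w(t)$ alone: it varies with a fact that does no work inside the agent’s epistemic state.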
Is this just Bayesianism in disguise?
I don’t think so, because there is something else important going on here.
Bayesian updating is (I think) a way to tell us how a rational agent should revise degrees of belief. But Bayes doesn’t tell us when it’s appropriate to switch from “my credence is 0.X” to the simple speech act “I know.”
In everyday and in technical contexts, “I know” seems to function as a permission to rely - an invitation to act on a belief and to let other people act on it. When I know something, I am no longer caveating my belief or telling people they need to double-check my homework. Of course, you might still want to check my reasons. Indeed, I might want to check myself one more time.
That process feels like it depends on how high the practical stakes are, how many defeaters I’ve already ruled out, and how much bandwidth I have left to keep searching for more. But at some point I need to make a call: whether I can cross the ice, who shot JFK, or whether water really is two atoms of hydrogen for every atom of oxygen.
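To make that separation concrete, here is a toy sketch in Python. Everything in it - the likelihoods, the threshold function, the stakes scale - is an illustrative assumption of mine, not anything Bayes or the epistemology literature prescribes; the only point is that updating and asserting are two different mechanisms.

```python
# Toy model: Bayesian credence vs. the social act of saying "I know".
# All numbers and the threshold function are illustrative assumptions.

def posterior(prior: float, lr_true: float, lr_false: float) -> float:
    """One step of Bayes' rule: P(H | E) from P(H) and the two likelihoods."""
    numerator = lr_true * prior
    return numerator / (numerator + lr_false * (1 - prior))

def assertion_threshold(stakes: float) -> float:
    """Hypothetical mapping from practical stakes in [0, 1] to the credence
    needed before "I know" is a reasonable thing to say."""
    return 0.9 + 0.099 * stakes  # low stakes ~0.90, life-or-death ~0.999

def may_assert_knowledge(credence: float, stakes: float, live_defeaters: int) -> bool:
    """'I know' = credence clears the stakes-indexed bar AND no live defeaters remain."""
    return live_defeaters == 0 and credence >= assertion_threshold(stakes)

# Chris on the ice: twenty safe crossings, each treated as one piece of evidence
# that is much more likely if the ice is safe than if it is flawed.
credence = 0.5
for _ in range(20):
    credence = posterior(credence, lr_true=0.99, lr_false=0.5)

print(may_assert_knowledge(credence, stakes=0.8, live_defeaters=0))  # True  (2:55 pm)
print(may_assert_knowledge(credence, stakes=0.8, live_defeaters=1))  # False (3:25 pm)
```

All the Bayesian machinery lives in `posterior`; the speech act “I know” lives in a separate, stakes-indexed policy that gets revoked the moment a defeater goes live.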
I think knowledge is best understood as a normative or social threshold layered on top of graded justification. It is not a direct readout of posterior probability.
This is not an argument against Bayes; it is asking what extra work the concept of knowledge does for finite agents who must act under uncertainty and who can be rational and yet still be wrong.
Finite-time epistemology
Classic convergence theorems are limit results: so long as the true hypothesis is in your hypothesis space and you keep getting data and updating correctly forever, the posterior goes to 1 on the truth.
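Schematically (and suppressing the measure-theoretic conditions, which vary from theorem to theorem):

$$P\left(H^{*} \mid E_1, \dots, E_n\right) \to 1 \quad \text{as } n \to \infty,$$

almost surely, provided the true hypothesis $H^{*}$ gets non-zero prior probability.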
But real agents don’t live in the limit: we are bound by time, by deadlines, by the need to act without full information. And we make mistakes.
It is possible for a belief to be rationally updated on the best available evidence and to be defeater-resistant, yet still be false - and that despite the fact that we justifiably believed it was true.
If “knowledge” required actual access to truth as a necessary condition, then in real time we could almost never be confident that we know anything. We’d only be able to hand out knowledge certificates after the fact, once the long run has done its audit.
But that’s not how we (or alignment researchers, or engineers, or historians) actually use the word. “I know the reward model points this way” or “I know Lincoln was assassinated at Ford's Theatre” or “I know the earth goes round the sun” in practice means something closer to “my current justification is thick enough, relative to the downside risk, that I’m willing to steer hard on this belief until a defeater forces me to brake.”
If knowledge were truth-gated, then in alignment debates we should refuse to attribute knowledge to systems until we could verify ground-truth correspondence. But that is something we can never be sure we’ve done.
As a matter of common practice - but not parlance - it turns out we have learned to be satisfied with knowledge as justified belief, where truth is the attractor, not the gatekeeper.
Corrigibility is the virtue that matters at finite time
One of the healthiest things about LessWrong is the obsession with corrigibility. We are collectively committed to being willing to actually change our minds when new evidence arrives, even when it’s embarrassing or challenges a core belief.
But corrigibility only makes sense if we expect sometimes to act on beliefs that might later turn out to be false. We are saying that we don’t wait to be metaphysically certain. Instead, we act on the best justified model we have, and we stay ready to pivot when the world shows us we were wrong.
This is not an argument against truth. Truth still matters enormously because it kills bad hypotheses and rewards good ones. But at the moment of decision, our epistemic evaluation seems to live almost entirely at the level of justification, calibration, defeater-sensitivity, and stakes.
Stakes matter insofar as they affect what counts as adequate defeater search under a reasonable assessment of risk. An agent can misjudge stakes, and if that misjudgement is itself unreasonable, their justification is weakened. Moreover, the presence of other agents, especially agents facing higher stakes, can itself function as a potential defeater. Their concern is evidence that further search may be warranted. This expanded search may weaken or strengthen the belief, depending on what it reveals, and may push agents with different starting stakes toward convergence. But knowledge does not depend on hidden actual stakes any more than it depends on hidden truth. What matters is the justificatory landscape as reasonably accessible to the agent at time t.
In summary: whilst I’m not denying that truth defines calibration or expected utility, I am proposing that, at time t, truth adds no discriminating power between epistemically identical states. And because of this I'm willing to accept that knowledge can never be more than justified belief.
Some questions on the work a strict truth-condition does
If our knowledge is primarily a function of justification, then this throws up some interesting questions:
1. Can AI ever be said to believe anything? What is justified belief in that context?
2. Is the truth-condition mostly a retrospective audit that tells us which belief-forming processes were reliable over the long run?
3. Does truth mainly act as a selector that shapes which heuristics and priors survive cultural / memetic / evolutionary pressure?
I’m curious how other people here weigh this. How important is strict factivity to your picture of justified belief, and how much of our real epistemic life can we understand just in terms of defeaters, calibration, and stakes?