Your thinking about how truth relates to AI shows good insight into what AI still lacks.
You might benefit a little from relating your questions to some of Tarski's writings on truth in mathematics, which show the necessity of an outside observer for truth to have adequate meaning. Per Tarski, "The grass is green" is true if and only if the grass is green. (Unstated, implied: as known by the observer of the proposition.)
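For reference, each instance of Tarski's schema (his Convention T) is usually written along these lines:

$$\text{“}p\text{” is true} \iff p$$

where the quoted sentence is named in the metalanguage and then used, unquoted, on the right-hand side.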
We all have a subjective viewpoint that locates us within our map of reality, and that viewpoint is what allows us to decide whether a statement is true: we judge it from our subjective context, which is always going to differ from its objective context within the related literature, LLM data, etc. Perhaps some kind of God's-eye view would be identically both subjective and objective, but we and our AI are certainly not capable of that. We humans can, however, flip between our subjective map and our analysis of presented data, and AI cannot do that.
Current AI lacks the ability to take any truly subjective stance (even though it can generate a simulation of one on request). For AI to perceive something in the concrete world as true in the sense above, it would have to place itself as a self within the broader universe and take a subjective stance for itself about the data in the LLM, which current LLMs cannot really do. The closest we have as of 2026 would be a human acting as an auditor and interpreter for the LLM.
Interesting. Do you think AI structurally can’t achieve that status, or is it just that it can’t get there at this moment? I tend to the view that consciousness is just a tool that agents with feedback loops, who need to model the behaviour of other agents, have evolved. I don’t see why AI can’t do the same.
I'm certain that LLMs alone are not going to be a way to reach subjective consciousness. I think it's still open whether or not some other tech could get there. Perhaps such AI tech could use an LLM as a language or memory module, which the greater AI would then contextualize with programming of its own.
Of course, I don't think that a subjective viewpoint is always needed or even helpful. What if the truth of "the grass is green" depended on whether the grass on the grounds of a particular data center was currently green, and we did not know that?
I think you’re right on both counts. I wonder if designers of AI will actively work to avoid creating consciousness (assuming they can work out how to do that) to avoid the issue you raise, as well as sidestepping wider ethical concerns. A conscious AI would be over-engineered for many tasks.
I’ve been thinking about this issue a while. Certainly since grad school. The classic definition of knowledge as justified true belief (JTB) has been known to be unsatisfactory for decades - sometimes a justified true belief is just a lucky guess. Attempts to patch JTB have generally been additive – knowledge should be justified true belief plus something.
What I want to explore is what happens if knowledge is simply justified belief. What are the implications, at least for bounded agents, if we drop the truth criterion from our definition of knowledge?
I’ll try to set out below why this is not as shocking as it sounds and why a JB model of knowledge is compatible with the idea of truth. In a future post I’ll set out some of the many advantages this way of thinking about knowledge has, but today I want to test the underlying principles.
Let’s start from an idea that LessWrong puts a lot of weight on - that our beliefs should be truth tracking. But what does truth tracking actually mean? I don’t think it means that we are literally tracking truth. Truth tracking means that we have good epistemic procedures.
Consider another metaphor we use a lot on LessWrong - that we want our maps to correspond to the territory. But this metaphor could be misleading. When it comes to knowledge we don’t actually have access to the truth. It may be that water is H2O. We believe that is the case, and we have lots of justification. But if you ask me to do more than this, all I can do is provide further justification.
To return to the map metaphor, what we are actually doing is checking the quality of the map, not the ground itself. We can ask ourselves questions like: when was this map last updated? What do we know about the person who made it? Have we looked for other maps and actively compared them? But we don’t actually get to look at the underlying landscape.
When someone (including me) says “I know that p,” what they are actually claiming is that their reasons for believing p have cleared whatever justificatory bar feels appropriate given the stakes, and that they have been open to defeaters and found none.
On an icy day
Chris is an experienced guide and has taken groups across the same lake every February for twenty years. The ice always freezes thick, and this year is no different because temperatures have been solidly sub-zero for weeks. Plenty of other people have crossed recently with no trouble; indeed, Chris has crossed the lake himself many times this year.
At 2:55 pm on a cold February afternoon Chris says to a fellow guide “I know the ice is safe to cross” and takes a group out onto the frozen lake. They make it across, as usual; a good time is had by all.
Now consider this.
Unbeknownst to Chris (or anyone else), the ice had formed in an unusual way and actually had a structural flaw. This flaw had also been there the previous year, but no-one noticed. Last year, however, the ice had melted at the end of the season, and the flaw was lost to history.
But this year – maybe someone stepped a little further to the left, maybe the group was a bit bigger, maybe they were just unlucky – half an hour later, on the return journey, the ice cracks and someone falls in.
What changed between 2:55 and 3:25 pm?
It wasn’t the truth of Chris’s comment that “I know the ice is safe to cross”. That hadn’t been true for a couple of years. What changed was the epistemic situation. At 3:24 pm Chris believed the ice was safe and had strong justification for thinking so. Chris had all the information that was reasonably available to a finite human being.
But the moment the live defeater became detectable, “I know the ice is safe to cross” stopped being a reasonable thing to say.
This shows how knowledge claims are sensitive to the live justificatory environment, including the presence or absence of tractable defeaters, rather than to some unchanging metaphysical fact about the ice.
Suppose there are two worlds which are epistemically indistinguishable at time t. In one, the ice contains a hidden structural flaw; in the other, it doesn’t. Chris has done the same checks, considered the same evidence, and behaved in the same way in both worlds. If we say he has knowledge in one world but not the other, solely because of a hidden fact he could not possibly access, then knowledge depends on something that plays no role in his epistemic situation. That makes it hard to see why factivity should be built into the definition of knowledge for bounded agents like us.
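Put schematically (my notation, nothing standard): let $E_w(t)$ be the total evidence accessible to Chris in world $w$ at time $t$. The two-worlds setup is

$$E_{w_1}(t) = E_{w_2}(t) \quad \text{but} \quad \text{safe}(w_1) \neq \text{safe}(w_2).$$

A factive knowledge operator must then assign knowledge in one world but not the other, so it is not a function of $E_w(t)$ alone: it varies with a fact that does no work inside the agent’s epistemic state.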
Is this just Bayesianism in disguise?
I don’t think so, because there is something else important going on here.
Bayesian updating is (I think) a way to tell us how a rational agent should revise degrees of belief. But Bayes doesn’t tell us when it’s appropriate to switch from “my credence is 0.X” to the simple speech act “I know.”
In everyday and in technical contexts, “I know” seems to function as a permission to rely - an invitation to act on a belief and to let other people act on it. When I know something, I am no longer caveating my belief or telling people they need to double-check my homework. Of course, you might still want to check my reasons. Indeed, I might want to check myself one more time.
That process feels like it depends on how high the practical stakes are, how many defeaters I’ve already ruled out, and how much bandwidth I have left to keep searching for more. But at some point I need to make a call: whether I can cross the ice, who shot JFK, or whether water really is two atoms of hydrogen for every atom of oxygen.
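To make that separation concrete, here is a toy sketch in Python. Everything in it - the likelihoods, the threshold function, the stakes scale - is an illustrative assumption of mine, not anything Bayes or the epistemology literature prescribes; the only point is that updating and asserting are two different mechanisms.

```python
# Toy model: Bayesian credence vs. the social act of saying "I know".
# All numbers and the threshold function are illustrative assumptions.

def posterior(prior: float, lr_true: float, lr_false: float) -> float:
    """One step of Bayes' rule: P(H | E) from P(H) and the two likelihoods."""
    numerator = lr_true * prior
    return numerator / (numerator + lr_false * (1 - prior))

def assertion_threshold(stakes: float) -> float:
    """Hypothetical mapping from practical stakes in [0, 1] to the credence
    needed before "I know" is a reasonable thing to say."""
    return 0.9 + 0.099 * stakes  # low stakes ~0.90, life-or-death ~0.999

def may_assert_knowledge(credence: float, stakes: float, live_defeaters: int) -> bool:
    """'I know' = credence clears the stakes-indexed bar AND no live defeaters remain."""
    return live_defeaters == 0 and credence >= assertion_threshold(stakes)

# Chris on the ice: twenty safe crossings, each treated as one piece of evidence
# that is much more likely if the ice is safe than if it is flawed.
credence = 0.5
for _ in range(20):
    credence = posterior(credence, lr_true=0.99, lr_false=0.5)

print(may_assert_knowledge(credence, stakes=0.8, live_defeaters=0))  # True  (2:55 pm)
print(may_assert_knowledge(credence, stakes=0.8, live_defeaters=1))  # False (3:25 pm)
```

All the Bayesian machinery lives in `posterior`; the speech act “I know” lives in a separate, stakes-indexed policy that gets revoked the moment a defeater goes live.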
I think knowledge is best understood as a normative or social threshold layered on top of graded justification. It is not a direct readout of posterior probability.
This is not an argument against Bayes; it is asking what extra work the concept of knowledge does for finite agents who must act under uncertainty and who can be rational and yet still be wrong.
Finite-time epistemology
Classic convergence theorems are limit results: so long as the true hypothesis is in your hypothesis space and you keep getting data and updating correctly forever, the posterior goes to 1 on the truth.
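Schematically (and suppressing the measure-theoretic conditions, which vary from theorem to theorem):

$$P\left(H^{*} \mid E_1, \dots, E_n\right) \to 1 \quad \text{as } n \to \infty,$$

almost surely, provided the true hypothesis $H^{*}$ gets non-zero prior probability.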
But real agents don’t live in the limit: we are bound by time, by deadlines, by the need to act without full information. And we make mistakes.
It is possible for a belief to be rationally updated on the best available evidence and to be defeater-resistant, yet still be false - and that despite the fact that we justifiably believed it was true.
If “knowledge” required actual access to truth as a necessary condition, then in real time we could almost never be confident that we know anything. We’d only be able to hand out knowledge certificates after the fact, once the long run has done its audit.
But that’s not how we (or alignment researchers, or engineers, or historians) actually use the word. “I know the reward model points this way” or “I know Lincoln was assassinated at Ford's Theatre” or “I know the earth goes round the sun” in practice means something closer to “my current justification is thick enough, relative to the downside risk, that I’m willing to steer hard on this belief until a defeater forces me to brake.”
If knowledge were truth-gated, then in alignment debates we should refuse to attribute knowledge to systems until we could verify ground-truth correspondence. But that is something we can never be sure we’ve done.
As a matter of common practice - but not parlance - it turns out we have learned to be satisfied with knowledge as justified belief, where truth is the attractor, not the gatekeeper.
Corrigibility is the virtue that matters at finite time
One of the healthiest things about LessWrong is the obsession with corrigibility. We are collectively committed to being willing to actually change our minds when new evidence arrives, even when it’s embarrassing or challenges a core belief.
But corrigibility only makes sense if we expect sometimes to act on beliefs that might later turn out to be false. We are saying that we don’t wait to be metaphysically certain. Instead, we act on the best justified model we have, and we stay ready to pivot when the world shows us we were wrong.
This is not an argument against truth. Truth still matters enormously because it kills bad hypotheses and rewards good ones. But at the moment of decision, our epistemic evaluation seems to live almost entirely at the level of justification, calibration, defeater-sensitivity, and stakes.
Stakes matter insofar as they affect what counts as adequate defeater search under a reasonable assessment of risk. An agent can misjudge stakes, and if that misjudgement is itself unreasonable, their justification is weakened. Moreover, the presence of other agents, especially agents facing higher stakes, can itself function as a potential defeater. Their concern is evidence that further search may be warranted. This expanded search may weaken or strengthen the belief, depending on what it reveals, and may push agents with different starting stakes toward convergence. But knowledge does not depend on hidden actual stakes any more than it depends on hidden truth. What matters is the justificatory landscape as reasonably accessible to the agent at time t.
In summary: whilst I’m not denying that truth defines calibration or expected utility, I am proposing that, at time t, truth adds no discriminating power between epistemically identical states. And because of this I'm willing to accept that knowledge can never be more than justified belief.
Some questions on the work a strict truth-condition does
If our knowledge is primarily a function of justification, then this throws up some interesting questions:
1. Can AI ever be said to believe anything? What is justified belief in that context?
2. Is the truth-condition mostly a retrospective audit that tells us which belief-forming processes were reliable over the long run?
3. Does truth mainly act as a selector that shapes which heuristics and priors survive cultural / memetic / evolutionary pressure?
I’m curious how other people here weigh this. How important is strict factivity to your picture of justified belief, and how much of our real epistemic life can we understand just in terms of defeaters, calibration, and stakes?