One of the main reasons Large Language Models (LLMs) dominate contemporary generative-AI development is that natural language remains the most efficient encoding we possess of human knowledge and reasoning. When we train machines on language, we outsource part of our own cognitive framework. Consequently, the logical clarity of that framework sets the ceiling on the intelligence that can be built on it.
This realization invites a return to Ludwig Wittgenstein’s early work, the Tractatus Logico-Philosophicus. The Tractatus is an attempt to draw the limits of sense by mapping the logical structure of the real world onto the logical structure of language. For Wittgenstein, the world consists of facts, not things: facts are states of affairs in which objects stand in specific relations. A proposition has meaning only if it mirrors a possible state of affairs, that is, if its logical form pictures the structure of the reality it describes. Hence, meaning hinges not on truth but on testability: a sentence is meaningful only if its structure can be matched to reality and judged true or false.
From this standpoint, the elementary building blocks of language are simple, logically independent propositions. More complex claims arise through logical combinations that depict reality. When that logical architecture is missing, sentences collapse into nonsense, expressions immune to verification or falsification.
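The Tractatus itself supplies a method for displaying this architecture: the truth table, which Wittgenstein introduced to show how a compound proposition's truth depends entirely on the truth of its elementary parts. The minimal Python sketch below, using a hypothetical compound "p and not q", tabulates a proposition's truth value across every combination of elementary facts; a sentence for which no such table can be drawn fails the Tractarian test of sense.

```python
from itertools import product

# Elementary propositions are logically independent: every combination
# of their truth values is a possible "state of affairs".
elementary = ["p", "q"]

# A compound proposition is a truth-function of its elementary parts.
# Hypothetical example: "p and not q".
def compound(p: bool, q: bool) -> bool:
    return p and not q

# Tabulate the proposition's truth value in every possible world.
for p, q in product([True, False], repeat=len(elementary)):
    print(f"p={p!s:5} q={q!s:5} -> {compound(p, q)}")

# A sentence with no such table -- no determinate truth value under
# any assignment -- has no sense in the Tractarian picture.
```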
Why This Matters for AI: Modern LLMs do not reason over logical forms; they predict the next token (a word or word fragment) from statistical patterns. They have no built-in sense of whether the linguistic units they emit correspond to any possible facts. That indifference produces outputs that read fluently yet map onto no conceivable state of affairs, a phenomenon now labelled AI hallucinations.
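To make the contrast concrete, here is a minimal sketch of the decoding loop at the heart of an LLM. The vocabulary and probabilities are invented for illustration; the point is that sampling consults only learned statistics, never a model of possible facts.

```python
import random

# Toy illustration (hypothetical vocabulary and probabilities): the model
# records only which token tends to follow which context, not whether the
# resulting sentence pictures any possible state of affairs.
next_token_probs = {
    ("the", "cat"): {"sat": 0.6, "flew": 0.3, "evaporated": 0.1},
}

def sample_next(context: tuple[str, ...]) -> str:
    probs = next_token_probs[context]
    tokens, weights = zip(*probs.items())
    # Sampling is purely statistical: "flew" and "evaporated" are emitted
    # with nonzero probability regardless of their relation to reality.
    return random.choices(tokens, weights=weights, k=1)[0]

print("the cat", sample_next(("the", "cat")))
```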
AI hallucinations are often framed as a truth-monitoring problem that can be solved by aligning language models with external reality. Wittgenstein, however, reminds us of a prior requirement: before we can test the truth of a sentence, we must first secure its sense, its capacity to be true or false. In this light, AI's grammatically perfect but logically shapeless outputs are not merely unreliable; they are meaningless.
Filtering training data through a Tractarian lens would mean excluding or flagging sentences whose structure fails to picture a possible state of affairs, for instance, sentences that attempt to state ethical, aesthetic, mystical, or religious claims. Such utterances are not unimportant, yet their significance must be shown rather than said. Reducing them to propositional language yields only pseudo-statements, and feeding vast quantities of such pseudo-statements into LLMs can only amplify the problem.
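What might such a filter look like in practice? The sketch below is a toy illustration only: `looks_propositional` is a hypothetical classifier, stood in for here by a crude keyword heuristic, whereas a real Tractarian filter would have to test sentences for determinate logical form.

```python
# A minimal sketch of a Tractarian data filter. `looks_propositional` is
# hypothetical -- in practice it might be a parser or a model trained to
# detect truth-apt, empirically checkable sentences.
def looks_propositional(sentence: str) -> bool:
    # Placeholder heuristic: real criteria would test for logical form,
    # not surface features like these.
    vague_markers = ("the meaning of life", "ineffable", "transcendent")
    return not any(marker in sentence.lower() for marker in vague_markers)

corpus = [
    "Water boils at 100 degrees Celsius at sea level.",
    "The meaning of life dwells beyond all facts.",
]

training_set = [s for s in corpus if looks_propositional(s)]
flagged = [s for s in corpus if not looks_propositional(s)]
print("kept:", training_set)
print("flagged:", flagged)
```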
Going beyond hallucinations, we can imagine two divergent paths for future AI. In one, language models remain eloquent imitators, blending fact, fiction, and error, digitally echoing humanity's conceptual confusions. In the other, they become disciplined advisors whose sentences possess determinate logical form, rendering each output transparent and open to empirical or theoretical refutation. This second path would also honor Karl Popper's maxim that genuine knowledge grows through falsification: only testable statements can advance understanding.
Although the Tractatus is an early work by Wittgenstein, much AI theory has leaned on the later Wittgenstein and his idea of language-games, useful for modeling context, metaphor, and pragmatics. However, as AI scales to infrastructure-level importance, revisiting the early, rigorous Wittgenstein may be prudent if we want machines to serve as guardians of logic rather than generators of ever more sophisticated chatter. For this, their linguistic diet must be logically rigorous.
A workable compromise can be reached. Systems tasked with technical, scientific, or safety-critical decisions should be trained on material vetted for logical form and propositional clarity, language that meets the Tractarian standard of sense. Other systems, aimed at art, emotional support, or cultural expression, may freely explore the richer terrain of language-games. However, merging those two functions would place serious decisions in the hands of systems whose outputs, by definition, cannot be tested for truth or falsehood.
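In engineering terms, the compromise amounts to routing by task criticality. The sketch below is purely illustrative; the model names and domain list are assumptions, not an existing system.

```python
# A sketch of the proposed division of labor. The model names and the
# routing rule are illustrative assumptions, not an existing API.
SAFETY_CRITICAL = {"medical", "engineering", "finance"}

def route(task_domain: str) -> str:
    # Safety-critical queries go to a model trained on logically vetted,
    # testable material; expressive tasks go to an open language-game model.
    if task_domain in SAFETY_CRITICAL:
        return "tractarian-model"   # hypothetical strict model
    return "language-game-model"    # hypothetical expressive model

print(route("medical"))   # -> tractarian-model
print(route("poetry"))    # -> language-game-model
```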
Humans are already perfectly capable of paradox, ambiguity, and spiritual craving; we do not need our machines to double down on those traits. What we lack is relentless logical discipline, something AI can supply under the right conditions: by protecting the boundary between what can be meaningfully said and what can only be shown, and by letting that boundary shape how we train artificial intelligence to speak.