“Did it actually happen like that?” is the key question in understanding the past. To answer it, historians uncover a set of facts and then contextualize that information into an event with explanatory or narrative power.

Over the past few years I’ve been very interested in understanding the nature of “true history.” It seems more relevant every day, given the ever more connected nature of our world and the proliferation of AI tools that synthesize, generate, and review knowledge. But even though we have all of these big datasets and powerful new ways of understanding them, isn’t it interesting that we don’t have a more holistic way of looking at the actual nature of historical events?

I, unfortunately, don’t have an all-encompassing solution[1] to this question, but I do have a proposal: that we think about this in a quasi-Bayesian way at a universal scale. By weighting every computably simulatable past with a prefix-free universal Turing machine, we can let evidence narrow the algorithmic-probability mass until only the simplest still-plausible histories survive.[2] I am going to unpack and expand this idea, but first I want to provide a bit more background on the silly thought experiment that kicked off the question for me.

Consider the following: did a chicken cross the road 2,000 years ago? Assume the only record we have of the event is a brief fragment of text, scribbled down and then lost to history. A scholar who rediscovers the relic is immediately confronted with a host of questions: Who wrote this? Where was the road? What type of chicken was it? Did it actually happen? (And, importantly, why did the chicken cross the road?) These are very difficult, and in many cases unanswerable, questions, because we lack additional data to corroborate the information.

In comparison, if a chicken crossed the road today, say in San Francisco, we would likely have an extremely rich set of data to validate the event: social media posts about a chicken, Waymo sensor data showing the vehicle slowing, video from a store camera, etc. In other words, we can be much more certain that the event actually occurred, and assign much higher confidence to the details.[3]

Common to both of these events is the underlying information, which arises both from the observation and from the thing that happened itself (in whatever way the chicken may have crossed the road). This line of inquiry is basic, and it is usually trivialized as too mundane or left implicit in the historian’s process, but I think it is a mistake to skip over it too lightly.

Early historians such as Thucydides grappled with this problem nearly at the start of recorded history as we commonly think of it. He tries to establish the truth by stating that “...I have not ventured to speak from any chance information, nor according to any notion of my own; I have described nothing but what I either saw myself, or learned from others of whom I made the most careful and particular enquiry.”

In this essay I don’t want to attempt a comprehensive overview of the history of historiography, but I will mention that after Leopold von Ranke famously insisted on telling history “as it actually was,” the nature of objective truth was quickly thrown into debate from a number of angles: political (say, Marxian views of history), linguistic, statistical, and so on.

So, shifting to 2025, where does this leave us? We have better and more precise instruments to record and analyze reality, but conceptually there is still some missing theoretical stitching on the more mundane topic of truth in chronological events.

While it would be nice to know all physically possible histories, call this set \(\Omega_{\mathrm{phys}}\), it is not computable. So I think we should be content with a set \(\Omega_{\mathrm{sim}}\), which I define as every micro-history a computer could theoretically simulate. These simulatable histories coevolve with the state of scientific knowledge as it improves.

Before we check any evidence to see how plausible each candidate history \(\omega\) (a member of \(\Omega_{\mathrm{sim}}\)) is, we attach a prior to it via a prefix-free universal Turing machine \(U\). Because no valid program of \(U\) is a prefix of another, Kraft's[4] inequality keeps the probability mass from leaking and keeps Bayes intact:

\[
m_U(\omega) \;=\; \sum_{p\,:\,U(p)=\omega} 2^{-\ell(p)},
\qquad
\sum_{\omega \in \Omega_{\mathrm{sim}}} m_U(\omega) \;\le\; \sum_{p \in \operatorname{dom}(U)} 2^{-\ell(p)} \;\le\; 1.
\]

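To make the prefix-free bookkeeping concrete, here is a minimal Python sketch with a made-up toy program table (nothing like a real universal machine): it checks that no program is a prefix of another and sums the \(2^{-\ell(p)}\) weights into prior mass.

```python
# Toy illustration of the prefix-free prior (not a real universal machine):
# each candidate history is "replayed" by one or more binary programs, and no
# program is a prefix of another, so the 2^-length weights sum to at most 1.

toy_programs = {
    "chicken_crossed":     ["00", "0100"],   # two programs replay this history
    "chicken_stayed_home": ["011"],
    "no_chicken_at_all":   ["10"],
}

def is_prefix_free(programs):
    """Check that no program is a proper prefix of another."""
    flat = [p for progs in programs.values() for p in progs]
    return not any(a != b and b.startswith(a) for a in flat for b in flat)

def prior_mass(programs):
    """Algorithmic-probability-style weight: sum of 2^-len(p) over programs for each history."""
    return {w: sum(2 ** -len(p) for p in progs) for w, progs in programs.items()}

assert is_prefix_free(toy_programs)
m = prior_mass(toy_programs)
print(m)                      # {'chicken_crossed': 0.3125, ...}
print(sum(m.values()) <= 1)   # Kraft: total mass does not exceed 1, so Bayes stays valid
```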
Furthermore, define the Kolmogorov complexity \(K(\omega)\) as the length of the shortest \(U\)-program that replays a micro-history. Because every contributing program is at least \(K(\omega)\) bits long, we have:

\[
2^{-K(\omega)} \;\le\; m_U(\omega) \;\le\; c_U\, 2^{-K(\omega)}.
\]

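(For what it’s worth, the two directions of that bound come from different places: the lower bound is just the shortest program’s own contribution of \(2^{-K(\omega)}\), while the upper bound, with its machine-dependent constant \(c_U\), is Levin’s coding theorem, which is also what gives \(-\log_2 m_U(\omega) = K(\omega) + O(1)\).)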
So \(m_U(\omega)\) differs from \(2^{-K(\omega)}\) only by a fixed multiplicative constant. Shorter descriptions of the past are more plausible at the outset; the reasoning is that it’s sensible to approach this with what is basically a universal Occam’s razor. Evidence, like documents or video, arises from an observation map:

\[
s \;=\; G(\omega) \;=\; C\big(I(\omega)\big),
\]

with \(I\) the instruments we use to record information, governed by the current science, and \(C\) the semantic layer that turns raw signals into understandable narratives like “a ship’s log entry” or “a post on X.” And \(s\) denotes a very compressed observation of \(\omega\); I imagine it is often, but not necessarily, the case that \(K(s) \ll K(\omega)\).
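As a purely illustrative sketch (the micro-history fields, instrument, and semantic layer below are all invented), the composition \(G = C \circ I\) might look like this in code:

```python
# A micro-history is a (hypothetical) fine-grained event record; the observation map
# G = C ∘ I first passes it through instruments I (lossy, limited by current science),
# then through a semantic layer C that renders the signal as a human-readable narrative.

omega = {
    "t": "2025-06-01T14:03:22",
    "location": "Valencia St, San Francisco",
    "actor": "chicken",
    "trajectory": [(0.0, 0.0), (1.5, 0.2), (3.1, 0.1)],  # crossing the road
}

def I(micro_history):
    """Instrument layer: record only what today's sensors can capture (here: a camera ping)."""
    return {
        "camera_frame_at": micro_history["t"],
        "object_class": micro_history["actor"],
        "displacement_m": micro_history["trajectory"][-1][0],
    }

def C(signal):
    """Semantic layer: turn the raw signal into a narrative artifact, e.g. a social media post."""
    return (f"Saw a {signal['object_class']} cross about {signal['displacement_m']:.0f} m "
            f"of road at {signal['camera_frame_at']}")

def G(micro_history):
    return C(I(micro_history))

s = G(omega)   # the evidence s is far more compressed than omega itself
print(s)
```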

Shannon separated the channel from meaning,[5] but here we let both layers carry algorithmic cost (semantics has its own algorithmic cost). As a person redefines or reinterprets an artifact of data, they are really redefining the semantic layer \(C\). It’s true that these algorithmic costs are huge and not computable in practice, but upper bounds are fair game with modern compressors or LLMs.
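As one concrete way to get such an upper bound, any off-the-shelf compressor works; here is a small sketch using zlib as a crude stand-in (a stronger compressor or an LLM-based code would give a tighter bound):

```python
import zlib

def codelength_bits(text: str, level: int = 9) -> int:
    """Upper bound on the description length of `text` in bits, via zlib.
    Kolmogorov complexity is uncomputable, but any compressor's output length
    is a valid (if loose) upper bound."""
    return 8 * len(zlib.compress(text.encode("utf-8"), level))

artifact_a = "a chicken crossed the road " * 40  # highly redundant observation log
artifact_b = "chicken? road? unclear who, when, or where; the fragment is torn"

print(codelength_bits(artifact_a))   # much smaller than 8 * len(artifact_a)
print(codelength_bits(artifact_b))
# Plugging these bounds into 2**-codelength gives a (very rough) stand-in for the prior weight.
```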

So a general approach would be to say a historian starts by characterizing \(\Omega_{\mathrm{sim}}(s)\), the set of all simulatable pasts consistent with the evidence \(s\), and then updates it with Bayes:

\[
P(\omega \mid s) \;=\; \frac{P(s \mid \omega)\, m_U(\omega)}{\sum_{\omega' \in \Omega_{\mathrm{sim}}(s)} P(s \mid \omega')\, m_U(\omega')}.
\]

The normalizer \(P(s) = \sum_{\omega \in \Omega_{\mathrm{sim}}(s)} P(s \mid \omega)\, m_U(\omega)\) tends to shrink as evidence accumulates.
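To see the update mechanically, here is a minimal numerical sketch; the candidate histories, the zlib stand-in for \(m_U\), and the hand-specified likelihood are all invented for illustration:

```python
import math
import zlib

# Made-up candidate micro-histories (short descriptions standing in for full micro-histories)
# and a single piece of evidence s. The prior weight 2^-K is approximated with zlib, and the
# likelihood P(s | omega) is hand-specified -- both are loose stand-ins, not the real quantities.
candidates = [
    "a chicken crossed the road at noon",
    "a chicken was carried across the road in a crate",
    "there was no chicken; the post was a joke",
]
evidence = "post: just saw a chicken strut across Valencia St"

def codelength_bits(text: str) -> int:
    """Computable upper bound on description length in bits, via zlib (stands in for K)."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))

def log2_likelihood(s: str, omega: str) -> float:
    """Hand-specified stand-in for log2 P(s | omega): histories in which a chicken actually
    ends up across the road explain the post more cheaply than the joke hypothesis."""
    return -5.0 if "across the road" in omega or "crossed the road" in omega else -15.0

# Unnormalized log2 posterior mass: log2 m_U(omega) + log2 P(s | omega), with m_U ~ 2^-codelength.
log2_post = {w: -codelength_bits(w) + log2_likelihood(evidence, w) for w in candidates}

# Normalize in log space (log-sum-exp, base 2) to get P(omega | s) and the normalizer P(s).
mx = max(log2_post.values())
z = sum(2.0 ** (v - mx) for v in log2_post.values())
posterior = {w: 2.0 ** (v - mx) / z for w, v in log2_post.items()}
log2_P_s = mx + math.log2(z)   # log2 of the (approximate) normalizer P(s)

for w, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{p:.3f}  {w}")
```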

I came up with an “overall objectivity score” (the OOS) as a quick way of thinking about the negative log of this normalizer, which is just the standard surprisal of the data. The higher the number, the more tightly the data circumscribes the past:

\[
\mathrm{OOS}(s) \;=\; -\log_2 P(s) \;=\; -\log_2 \sum_{\omega \in \Omega_{\mathrm{sim}}(s)} P(s \mid \omega)\, m_U(\omega).
\]

This is gauge dependent because its absolute value makes sense only for fixed \(\Omega_{\mathrm{sim}}\), \(G\), and universal machine \(U\).[6]
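Tying this back to the toy sketch above: with those entirely made-up numbers, the OOS is just the negative of that normalizer’s log, so it comes out to a few hundred bits on that toy input, and within one fixed gauge more corroborating evidence pushes it higher.

```python
OOS = -log2_P_s   # continuing the toy sketch above; higher = data circumscribes the past more tightly
print(f"toy OOS ~ {OOS:.1f} bits")
```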

A big issue is that exact \(K\), \(m_U\), \(P(s \mid \omega)\), and the OOS are uncomputable,[7] but we could approximate them in a couple of ways: perhaps by replacing the shortest program with a minimum description length code (the shortest computable two-part code of model bits plus residual bits) for an upper bound on \(K(\omega)\),[8] or via Approximate Bayesian Computation for \(P(s\mid\omega)\).[9] In MDL terms, an explanatory reconstruction is less likely if its approximate codelength is consistently larger than a rival’s.
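Here is a hedged sketch of the ABC route; the forward simulator, the evidence summary, and the tolerance below are invented for illustration: simulate observations from each candidate \(\omega\) and estimate \(P(s \mid \omega)\) by how often the simulated evidence lands close enough to the real evidence.

```python
import random

# Toy Approximate Bayesian Computation: estimate P(s | omega) by simulation + acceptance.
# The "simulator" and distance threshold are invented stand-ins, not a real forward model.

def simulate_evidence(omega: dict, rng: random.Random) -> int:
    """Forward-simulate how many independent artifacts (posts, camera hits, sensor pings)
    a micro-history like omega would tend to generate today."""
    base = 6 if omega["chicken_crossed"] else 1
    return max(0, round(rng.gauss(base, 2)))

def abc_likelihood(omega: dict, observed_count: int, eps: int = 1,
                   n_sims: int = 20_000, seed: int = 0) -> float:
    """ABC estimate of P(s | omega): fraction of simulations within eps of the observed evidence."""
    rng = random.Random(seed)
    hits = sum(abs(simulate_evidence(omega, rng) - observed_count) <= eps
               for _ in range(n_sims))
    return hits / n_sims

observed = 5   # e.g. five independent artifacts mention the chicken
for omega in ({"chicken_crossed": True}, {"chicken_crossed": False}):
    print(omega, abc_likelihood(omega, observed))
```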

As technology improves we should expect the set of histories we can compute to expand, since we will have better sensors, simulators, and instruments with which to record things. As better computing and AI expand \(\Omega_{\mathrm{sim}}\), and as, say, better DNA analysis gives \(G\) a new data stream, richer evidence narrows the possible histories by shrinking \(\Omega_{\mathrm{sim}}(s)\).

All in all, I’m not sure whether this thinking is useful to anyone out there. I started with an expansive vision of creating some elegant, detailed way of evaluating history given all of the new data streams we have, but it appears that many AI providers and social media platforms are already incorporating some of these concepts.

Still, I hope this line of thinking is useful to someone out there, and I would love to hear if you have been thinking about these topics too![10]

  1. ^

    I am declaring an intellectual truce with myself for the moment, one I should have agreed to a while ago. I have tried a number of different ideas over the months, including more direct modeling and an axiomatization attempt.

  2. ^

    I am very influenced by the work of Gregory Chaitin, most recently via his wonderfully accessible "PHILOSOPHICAL MATHEMATICS, Infinity, Incompleteness, Irreducibility - A short, illustrated history of ideas" from 2023

  3. ^

    This assumes that we have enough shared semantic and linguistic overlap

  4. ^
  5. ^

    "“Frequently the messages have meaning; that is, they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem." https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf 

  6. ^

     I guess I will throw in a qualifier to say this holds under standard anti-malign assumptions, but I don't really think that is super relevant here; if you happen to be thinking about it, see this post: https://www.lesswrong.com/posts/Tr7tAyt5zZpdTwTQK/the-solomonoff-prior-is-malign

  7. ^
  8. ^
  9. ^
  10. ^

    I'm very willing to break my truce ;)
