Just throwing this out there as an idle thought when reading this post* - I wonder if the reason linguistic phylogenetics and biological phylogenetics differ is that the phylogeny and the biological traits of the underlying species are often correlated with each other. I don't know if that's true for linguistics, and indeed I'm not sure what traits you are capable of identifying in a text or language that would affect the phylogenetic reconstruction itself.
Credit - Minna Sundberg
Methodology unearths and explores regularities: structural properties of the world that enable our methods to (sometimes) snatch victory from the jaws of ever-present computational intractability.
The key intuition is that if different fields share the same regularity, then we can apply the methods from one to the others.
Yet in practice, domain-specific regularities play a significant role too, and they might alter some of the methods’ portability.
Take the Genealogical Regularity as a case study: it says that there are entities related genealogically, meaning that each entity comes from a previous entity, with some changes (for example mutations in evolution).
Wherever we find the Genealogical Regularity, we can apply some form of Comparative Method: by carefully comparing the similarities and the differences between different entities, we can infer their genealogical relations, and maybe even reconstruct a lost parent entity from which they all spring.
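To make the comparison step concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than any field's actual procedure: the four witnesses, their six aligned "sites", and the helper names `differences` and `pairwise_differences` are all made up. It only counts disagreements between every pair of entities, which is the raw material from which genealogical relations get inferred.

```python
from itertools import combinations

# Hypothetical witnesses: each string is one entity's version of the same
# six aligned "sites" (letters in a text, bases in a gene, sounds in a word).
witnesses = {
    "A": "GATTAC",
    "B": "GATTAG",
    "C": "CATGAG",
    "D": "CATGTG",
}


def differences(x: str, y: str) -> int:
    """Count the positions at which two aligned versions disagree."""
    if len(x) != len(y):
        raise ValueError("versions must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y))


def pairwise_differences(items: dict) -> dict:
    """Return the difference count for every unordered pair of entities."""
    return {
        (p, q): differences(items[p], items[q])
        for p, q in combinations(sorted(items), 2)
    }


table = pairwise_differences(witnesses)
# List the pairs from most similar to least similar.
for pair, count in sorted(table.items(), key=lambda item: item[1]):
    print(pair, count)
```

In this toy data, A and B differ at one site, as do C and D, while A and D differ at four, which suggests (and only suggests) that A and B form one family and C and D another.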
I expect most readers will pattern match to phylogenetics, and its famous evolutionary trees where the entities are whole species.
But in the spirit of comparing exploitations of the same regularity across fields, I want to introduce two other success stories of the Comparative Method: historical linguistics (which gave us the name Comparative Method) and textual criticism.
I’ve already discussed historical linguistics in a previous post: the reconstruction of past languages, and of family relationships between languages, from comparing existing languages and traces of dead languages. But I want to slow down and spend a few paragraphs on textual criticism, if only to pay my respects to the lives and the ingenuity poured into saving and salvaging old lore from the abyss of history.
When you read a nicely printed edition of an old text, say Meditations by Marcus Aurelius, where does the text come from? In this case the work is almost 2000 years old, and was likely written on wax tablets. Because such material is perishable, the original has most likely disintegrated.[1] And even if it had not, the original manuscript would most likely have been lost in some invasion or sacking.
And yet we can read Marcus’ words. Because they were copied by hand, over and over, across the centuries (until we reach the printing press). But as anyone who has had to copy even a small text by hand can report, it is very easy to make copying mistakes. And given the often brutal speed at which copies had to be made, and the sheer number of copies, we know for sure that almost all surviving manuscripts will differ, as they indeed do.
The problem which textual criticism must solve is thus: what is the “right” version of the text? Is it any specific manuscript, or something that must be reconstructed by combining different manuscripts?
This is where the Genealogical Regularity and the Comparative Method come into play: textual criticism infers the stemma, or the genealogical tree of manuscripts, and uses it to reconstruct as much as possible of the archetype, the parent manuscript from which all the extant manuscripts descend.[2]
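As a rough illustration of how a stemma can drive reconstruction, here is a deliberately simplified sketch. It assumes a hypothetical stemma with three branches that descend independently from the archetype, and texts already aligned word for word. Under those assumptions, a reading shared by a strict majority of branches is taken as the archetype's reading, and anything else is left undecided. The manuscripts, the function name `reconstruct`, and the "?" marker are all made up for the example; real editorial practice is far more careful than this.

```python
from collections import Counter

# Hypothetical stemma: three manuscripts assumed to descend independently
# from the lost archetype (an assumption made for illustration only).
branches = {
    "X": "the quick brown fox jumps".split(),
    "Y": "the quick brown dog jumps".split(),
    "Z": "the quiet brown fox jumps".split(),
}


def reconstruct(readings: dict) -> list:
    """Pick, word by word, the reading shared by a strict majority of branches.

    Positions with no strict majority are marked "?" rather than guessed.
    Assumes the texts are already aligned word for word.
    """
    archetype = []
    for column in zip(*readings.values()):
        word, count = Counter(column).most_common(1)[0]
        archetype.append(word if count > len(column) / 2 else "?")
    return archetype


print(" ".join(reconstruct(branches)))
```

Each branch here has one error of its own, but no two branches share the same error, so the majority rule recovers a text that none of the three manuscripts contains exactly. With only two branches that disagree, the same rule returns "?", which is the honest answer.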
Looping back to the initial topic of the post, phylogenetics, historical linguistics, and textual criticism all exploit the Genealogical Regularity through the Comparative Method. Yet the implementation of the method is not the same for all of these fields, and the differences are interesting.
For example, whereas phylogenetics works well with powerful statistical methods (computational phylogenetics), textual criticism and to an extent historical linguistics have not gotten the same results. Why is that so?
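The reason is that these three fields differ in how common alternative mechanisms of change are, including:
- Horizontal transmission, where an entity borrows from something other than its parent.
- Convergent change, where the same change appears independently in unrelated entities.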
In both cases, the change is not coming from either mutation or inheritance, and thus is merely noise in the reconstruction of the genealogical relations.
Phylogenetics is the most regular here, because by and large most species cannot borrow DNA and features from others[3], and convergent evolution is usually quite rare, much rarer than shared ancestry. So almost all similarities and differences in phylogenetics will be due to genealogical processes, allowing automated statistical inference: the aggregation of a lot of relevant partial information points to the right answer, or at least moves the needle toward it.
But in textual criticism for example, this is completely reversed: by default copyists make many convergent errors (typos, losing a line or an expression…), and so most of the variations provide only noise for textual criticism.
(Paolo Trovato, Everything You Always Wanted To Know About Lachmann’s Method, p.110,115)
I don’t have a good grasp of where historical linguistics stands here, but given the susceptibility of genealogical relations to the choice of corresponding words, I would expect it is closer to textual criticism than phylogenetics.
To summarize, despite sharing a high-level regularity, statistical methods do not transfer well from phylogenetics to textual criticism because of a missing lower-level regularity: that vertically inherited changes are the vast majority of changes.
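A toy example shows why this lower-level regularity matters for any method that aggregates similarities. The data below is entirely hypothetical: A and B descend from one ancestor, C and D from another, and each character is one site where a witness either keeps the archetype's reading ("0") or departs from it ("1"). The second dictionary adds six sites where B and C happen to make the same mistake independently. The helper names `differences` and `nearest` are likewise assumptions for the sketch.

```python
def differences(x: str, y: str) -> int:
    """Count the positions at which two aligned versions disagree."""
    return sum(a != b for a, b in zip(x, y))


def nearest(name: str, items: dict) -> str:
    """Return the other witness that differs least from `name`."""
    others = [key for key in items if key != name]
    return min(others, key=lambda key: differences(items[name], items[key]))


# Inherited changes only: A and B share two changes, as do C and D,
# and each witness adds one change of its own.
inherited = {
    "A": "11100000",
    "B": "11010000",
    "C": "00001110",
    "D": "00001101",
}

# Six extra sites where B and C independently make the same error.
convergent = {"A": "000000", "B": "111111", "C": "111111", "D": "000000"}
noisy = {key: inherited[key] + convergent[key] for key in inherited}

print(nearest("B", inherited))  # true sibling wins when inheritance dominates
print(nearest("B", noisy))      # convergent errors outvote the inherited ones
```

With inherited changes only, B's nearest witness is its true sibling A. Once the convergent errors outnumber the inherited ones, B looks closest to C, and piling on more data of the same kind only reinforces the wrong answer.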
Another difference in applicability of methods is what can be done with the reconstructed tree.
Here the best case scenario is actually textual criticism: what we are approximating is an actual manuscript which existed. It might not be the original, but it was a real artifact, probably closer to the original than anything we have access to.
Whereas in historical linguistics, reconstructed proto-languages are a sort of compressed abstraction of a language:
(James Clackson, Indo-European Linguistics: An Introduction, p.16)
What about phylogenetics? Well, here the attempts at reconstructing shared ancestors strike me as far more limited. And phylogenetics has no analogue of placing dead languages or known-but-lost manuscripts as nodes in the tree, because fossils are treated as merely additional leaves that stopped evolving when they died.
(David A. Baum and Stacey D. Smith, Tree Thinking, p.41)
Why this difference? My guess is a mix of two sub-regularities:
There are many other such examples of variations in methods explained by subtler and more implementation-level disparities in regularities.
To mention just one more, textual criticism can make far more precise inferences about how a change happened and why, because we have a much better understanding of the process of copying, and of the changes in dialect, education, and everything else that goes into the mistake-making process.
For example, a notion of difficulty of the variants can be used (with care) to infer which variant is more likely to be original:
(Paolo Trovato, Everything You Always Wanted To Know About Lachmann’s Method, p.118)
There are things of this kind in historical linguistics, though less subtle and complex, such as analogy. But nothing that I know of in phylogenetics.
In a way, all this variation and these subregularities impacting methods so much is a blow to my apology of methodology. For if we need to map so many subtleties to know whether methods transfer, is there any hope for practical applications of methodology?
I don’t know. But from an intellectual curiosity perspective, I feel the same way as when, during my PhD, one of my clever hypotheses was proven wrong by a more intricate phenomenon: excited, for the world proved more interesting than I had guessed.
[1] In some rare cases, even when the wax is gone, the writing can be retrieved. (Incidentally, this kind of stuff is one reason I really want to dig into and write about the black magic of epigraphy one day.)
[2] This parent manuscript can be, and likely is, different from the actual original. But it is the best that can be reconstructed given the available manuscripts.
[3] Much more common in microorganisms.