A Word to the Wise is Sufficient because the Wise Know So Many Words

lsusr

Collect Ontologies

An ontology is a way of bucketing reality. For example, Russians distinguish голубой from синий whereas Anglos bucket both into "blue". An ontology is an implicit set of priors about the world. If you never bucket people according to skin color then you will be bad at predicting who prefers chopsticks over forks. If you always bucket people according to skin color then you will miss out on human universals.

It takes an untrained neural network a long time to distinguish venomous snakes from nonvenomous snakes. It will take you much less time if you treat coral snakes and milk snakes as separate species, even though they look similar to each other.

Different ontologies are appropriate for different contexts. Zero-sum models of the world work great when planning adversarial strategies. Zero-sum models of the world are counterproductive when attempting to put together a family dinner. The more different ontologies you know, the faster you can process new data.

The more ontologies you know, the more information it takes you to distinguish between them. In practice, it doesn't matter because the entropy required to distinguish ontologies is so tiny^[1].

The (Cheap) Price

How many ontologies can a person learn? Words aren't ontologies but I think the number of words a person knows provides a reasonable Fermi estimate of the number of ontologies bouncing around our heads. A normal person knows perhaps 32,768 words. If we treat each word as a distinct binary ontology of equal prior probability then it takes 15 bits of entropy to figure out which ontology to use.

The entropy of English text is estimated at 2.3 bits per letter. It takes approximately 6 letters to figure out what word a person is trying to say. Technically, this is just a roundabout way of saying the average English word is about 6 letters long but my point is even with tens of thousands of ontologies to draw from, it takes a tiny bit of entropy (on the order of one word) to find the needle in the haystack. A word to the wise is literally sufficient.
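The arithmetic behind this Fermi estimate can be checked in a few lines. This is only a sketch using the post's own figures (32,768 words, 2.3 bits per letter); the variable names are illustrative.

```python
import math

VOCABULARY_SIZE = 32_768  # the post's estimate of how many words a person knows
BITS_PER_LETTER = 2.3     # the post's figure for the entropy of English text

# With a uniform prior, selecting one of N options costs log2(N) bits.
bits_to_select = math.log2(VOCABULARY_SIZE)

# Number of letters of English text needed to carry that many bits.
letters_needed = bits_to_select / BITS_PER_LETTER

print(f"bits to select one ontology: {bits_to_select:.0f}")
print(f"letters of text needed: {letters_needed:.1f}")
```

The result comes out at 15 bits and roughly six and a half letters, which is about one word.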

That's assuming all ontologies have equal prior probability. If we allow for unequal prior probability distributions then the median entropy cost is less than a single word. The more ontologies you learn, the smarter you become, because each ontology contains a heavy Bayesian prior you can apply to new situations you encounter. The extreme informatic efficiency of pre-learned ontologies is how you get good at small data.
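To see why unequal priors lower the cost, one can compare the Shannon entropy of a uniform distribution with that of a skewed one. The sketch below assumes a Zipf-like prior (probability proportional to 1/rank). That choice is an illustration and does not come from the post, which does not say what the real distribution looks like.

```python
import math

def entropy_bits(weights):
    """Shannon entropy, in bits, of the distribution given by unnormalized weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

N = 32_768  # same vocabulary-sized pool of ontologies as in the estimate above

# Equal priors: every ontology is equally likely.
uniform_entropy = entropy_bits([1.0] * N)

# Assumed skewed priors: weight proportional to 1/rank (Zipf-like, hypothetical).
zipf_entropy = entropy_bits([1.0 / rank for rank in range(1, N + 1)])

print(f"uniform prior: {uniform_entropy:.1f} bits")
print(f"Zipf-like prior: {zipf_entropy:.1f} bits")
```

Any prior that concentrates probability on a few commonly used ontologies has lower entropy than the uniform case, so the average number of bits needed to pick the right one falls below 15.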

[1] More precisely, distinguishing between ontologies takes little training data entropy relative to the entropy required to operate within an ontology with $n$ tunable parameters.

This is not really correct. An ontology is the system containing the buckets, and I would agree with your claim here that having lots of buckets is useful, but having lots of ontologies is having different ways of segmenting the same reality into sets of buckets. That doesn't help more rapidly pick out what someone else is saying, though - it instead illustrates why locating something in concept space doesn't work the way you describe.

For instance, knowing multiple color ontologies might be helpful in understanding how colors can be split up, so that in Russian, голубой is more akin to cyan, while синий is more blue, but in Hebrew, they distinguish between תכלת, light blue, and כחול, other blue colors. (They also have the transliterated word cyan, ציאן.) In this case, the benefit isn't more rapidly picking out what the other person is saying, but is instead understanding how they might be pointing to something different than what you assumed. It also lets you understand that "grue" and "bleen" might be less unnatural as categories than you would have assumed.

So having multiple ontologies actually increases ambiguity to a level that more correctly captures reality. Unfortunately, contra your claims, the amount of data it takes to correctly identify the ontology, rather than the category within a known ontology, is very large.

I really appreciate this post. In Chinese, the spoken pronouns for "he" and "she" are the same (they are distinguished in writing). It is common for Chinese ESL students to mix up the words "she" and "he" when speaking. I have been trying to understand this, and relate it to my (embarrassingly recent) understanding that probabilistic forecasts (which I now use ubiquitously) are a different "epistemology" than I used to have. This post is a very concrete exploration of the subject. Thank you!
