Namespace pollution and name collision are two great concepts from computer programming. The way they are handled in many academic environments seems quite naive to me.

Programs can get quite large, so naming things well is surprisingly important. Many of my code reviews are primarily about coming up with good names for things. In a large codebase, every mention of symbolicGenerator() refers to the exact same thing. If one part of the codebase has been using symbolicGenerator for a reasonable set of functions, and later another part comes along whose programmer realizes that symbolicGenerator is also the best name for that piece, they have to make a tough decision: either refactor the codebase to change all previous mentions of symbolicGenerator to an alternative name, or come up with an alternative name themselves. They can't have it both ways.
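One way out of this bind, in languages that support it, is qualified namespaces: two parts of a codebase can each keep their preferred name as long as callers qualify which one they mean. A minimal sketch (the module names and the symbolicGenerator functions here are illustrative, borrowing the name from the paragraph above, not from any real codebase):

```python
# Sketch: two "modules" each claim the name symbolicGenerator.
# Qualified access lets both coexist without a renaming refactor.
import types

geometry = types.SimpleNamespace(
    symbolicGenerator=lambda: "generates geometric symbols"
)
algebra = types.SimpleNamespace(
    symbolicGenerator=lambda: "generates algebraic symbols"
)

# Neither team has to give up the name; callers disambiguate.
print(geometry.symbolicGenerator())
print(algebra.symbolicGenerator())
```

The cost is shifted to call sites, which must carry the qualifier, but the collision itself disappears. Academic terminology has no equivalent mechanism: a term like "systems theory" lives in one global namespace.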

Naming therefore becomes a political process. Names touch many programmers who have different intuitions and preferences. A large renaming refactor in a section of the codebase that others depend on would often be met with hesitancy by that group.

This makes it all the more important that good names are used initially. As such, reviewers care a lot about names being pretty good: generic enough that their components can be expanded while the name remains meaningful, but specific enough to be useful for remembering. Names submitted via pull requests represent much of the human part of the interface/API; they're harder to change later on, so they obviously require extra work to get right the first time.

To be clear, a name collision is when two unrelated variables share the same name, and namespace pollution is when code is submitted in ways that are likely to create unnecessary naming conflicts later on.
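Both concepts show up concretely with wildcard imports, a classic source of namespace pollution. A short sketch (the logger function is a hypothetical example, not from any real project):

```python
# Namespace pollution: a wildcard import dumps dozens of names from
# math into the local scope, whether or not they are needed.
from math import *

# Name collision: this hypothetical logger silently shadows math's
# log() that the wildcard import just brought in.
def log(message):
    return f"LOG: {message}"

print(log(10))  # prints "LOG: 10", not the natural logarithm
```

A later reader who expects log(10) to compute a logarithm gets no error, just wrong behavior, which is what makes pollution costly: the conflict only surfaces long after the import was submitted.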


My impression is that in much of academia, there are few formal processes for groups of experts to agree on names for things. There are specific clusters with carefully thought-out terminology, particularly around very large sets of related terms; for instance, biological taxonomies, the metric system, and various aspects of medicine and biology.

But in many other areas, it seems like a free-for-all among the elite. My model of the process is something like, "Someone coming up with a new theory will propose a name for it and put it in their paper. If the paper is accepted (a decision typically made with details unrelated to the name in mind), and if others find the theory useful, they will generally call it by the name used in the proposal. In some cases a few researchers will come up with variations on the same idea, in which case one name gets selected through what future researchers decide to use, on an individual basis. Ideas are often named after the people who came up with them in some capacity; this makes a lot of sense to other experts who worked in those areas, but it's not at all obvious that this is optimal for everyone else."

The result is that naming happens almost accidentally, as the byproduct of a process that isn't paying particular attention to getting the names right.

When there's little or no naming process, actors are incentivized to choose bold names. They don't have to pay the cost of any namespace pollution they create. Two names that come to mind recently are "The Orthogonality Thesis" and "The Simulation Hypothesis." These are two rather specific things with very generic names. They come to mind because they are related to our field, but many academic topics seem similar. Information theory is mostly about encoding schemes, which are now not that important. Systems theory is typically about a subset of dynamical systems. But of course, it would be really awkward for anyone else with a more sensible "systems theory" to use that name for the new thing.

I feel like AI has had some noticeably bad examples; it's hard to look at all the existing naming and think it was the result of a systematic and robust naming approach. The table of contents of AI: A Modern Approach seems quite good to me; that seems very much a case of a few people refactoring things to come up with one high-level overview optimized for being such. But the individual parts are obviously messy: A* search, alpha-beta pruning, K-consistency, Gibbs sampling, Dempster-Shafer theory, etc.


One of my issues with LessWrong is the naming system. There's by now quite a bit of terminology to understand; the LessWrong wiki seems useful here. But from what I understand there's no strong process. People suggest names in their posts, and these either become popular or don't. There's rarely any refactoring.

ozziegooen (reply): Thanks, I didn't know. That matches what I expect from similar fields, though it is a bit disheartening. There's an entire field of library science and taxonomy, but they seem rather isolated to specific things.

ozziegooen (reply): Yep, I'd definitely agree that it's harder. That said, this doesn't mean it's not high-EV to improve on. One outcome could be that we should be more careful introducing names, since it is difficult to change them. Another would be to work on formal ways of changing them afterward, even though that is difficult (it would be worthwhile in some cases, I assume).

In a recent thread about changing the name of Solstice to Solstice Advent, Oliver Habryka estimated it would cost at least $100,000 to make that happen. This seems like a reasonable estimate to me, and a good lower bound for how much value you would need to get from a name change to make it worth it.

The idea of lowering this cost is quite appealing, but I'm not sure how to make a significant difference there.

I think it's also worth thinking about the counterfactual cost of discouraging naming things.

As an example, here's a post with an important concept that hasn't [...]

ozziegooen's Shortform, by ozziegooen, 31st Aug 2019 (127 comments)