Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal

by gwern, 8th Jan 2020



Reviving an old General Semantics proposal: borrowing from scientific notation and using subscripts like 'Gwern₂₀₂₀' for denoting sources (like citation, timing, or medium) might be a useful trick for clearer writing, compared to omitting such information or using standard cumbersome circumlocutions.

I don't believe the Sapir-Whorf hypothesis so beloved of 20th-century thinkers & SF, or that we can make ourselves much more rational by One Weird Linguistic Trick. There is no far transfer, and the benefits of improved vocabulary/notation are inherently domain-specific. You think the same thoughts in English as you do in Chinese.

But, like good typography, good linguistic conventions may be worth, all told, perhaps as much as 5% of whatever one values, and that's not nothing. (It's definitely worthwhile to do things like spellcheck your writings, after all, even though no amount of spellcheck can rescue a bad idea.)

I already use a few unusual conventions: attempting to use the Kesselman Estimative words to be more systematic about the strength of my claims; always linking fulltext in citations (currently upgrading to 'popups' which do not just link fulltext but present the abstract/excerpts/summary as well); and quote syntax highlighting (to distinguish literal quotes from things like paraphrases, dialogue, or rhetorical questions). I also employ a few more domain-specific tricks, like avoiding the word 'significance' in statistics contexts, automatically inflation-adjusting currencies (to avoid the trivial inconvenience of doing it by hand & so not doing it at all), or using research-specific checklists. Without straying into conlang territory, attempting to do everything in formal logic, or serious eccentricity, what else could be done?

One idea for more precise English writing which I think could be usefully revived is broader use of subscripts.

The subscripting idea is derived from General Semantics* (GS), which itself borrows it from standard scientific notation, like physics/statistics/mathematics/chemistry/programming: a superscript/subscript is an index distinguishing multiple versions of something, such as quantity, location, or time, eg xₜ vs xₜ₊₁. They're typically not seen outside of STEM contexts, aside from a few obscure uses like ruby/furigana glosses.

* I am considerably less impressed by other GS linguistic suggestions like E-Prime, but subscripting seems like it may be worth rescuing.

However, there are many places we could use subscripting to be clearer & more compact about which version we are referring to, using them as evidentials, and because it's clearer & more compact, we can afford to use it in more places without wasting space/effort/patience. Citations are a good use case. Why write "Friedenbach (2012)" if we can write "Friedenbach₂₀₁₂"? The latter is shorter, easier to read, less ambiguous (especially if we use it in parentheticals, see Friedenbach (2012)), and doesn't come in a dozen different slightly-varying house styles. And why restrict it to formal publications or written documents? Apply it to any quote, statement, or opinion where variables like time might be relevant. It is a single unified notation: whether something was thought, spoken, or written by me in 2020, it is written "Gwern₂₀₂₀". The evidential can be expanded as necessary: if it's a paper or essay, the '2020' can be a hyperlink, or if it's a 'personal communication', then there can be a bibliography entry stating as much, or if it's the author writing about their own beliefs/actions/statements in 2020, no further information is necessary (and it avoids awkward custom phraseology like "As I thought back in 2020 or so...."). In contrast, normal citation style cumbersomely uses a different format for each, or provides no guidance: how do you gracefully cite a paper written one year but whose author changed their mind 5 years later based on new results and who told you so 10 years after that?
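To make the "expanded as necessary" idea concrete, here is a minimal sketch in Python of how one might model such an evidential. Everything in it is an assumption for illustration: the `Evidential` class, its field names, the `bibliography` helper, and the example.com URL are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass(frozen=True)
class Evidential:
    """A source plus the subscript that disambiguates it (hypothetical model)."""
    source: str                 # who or what is cited, e.g. "Gwern"
    tag: str                    # the subscript: a year, a medium, etc.
    href: Optional[str] = None  # set when the tag should link to a document
    note: Optional[str] = None  # set when a bibliography entry is wanted

    def to_html(self) -> str:
        """Render as source<sub>tag</sub>, linking the tag if a URL is given."""
        tag = escape(self.tag)
        if self.href is not None:
            tag = f'<a href="{escape(self.href, quote=True)}">{tag}</a>'
        return f"{escape(self.source)}<sub>{tag}</sub>"


def bibliography(evidentials):
    """Collect entries only for evidentials that carry a note."""
    return [f"{e.source} {e.tag}: {e.note}" for e in evidentials if e.note]


# Illustrative values only; the URL and the note are placeholders.
own_belief = Evidential("Gwern", "2020")
paper = Evidential("Friedenbach", "2012",
                   href="https://example.com/friedenbach-2012")
told_me = Evidential("Bey", "2010", note="personal communication")

print(own_belief.to_html())
print(paper.to_html())
print(bibliography([own_belief, paper, told_me]))
```

The same three fields cover all three cases in the paragraph above: a bare subscript for the author's own views, a linked subscript for a paper, and a bibliography note for a personal communication.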

Because it's already used so much in technical writing, subscripting is reasonably familiar to anyone who took high school chemistry and can be quickly figured out from context by those who've forgotten, and it's well-supported by fonts and markup languages: it's x~t~ in Pandoc Markdown (but not Reddit/LW?), x<sub>t</sub> in HTML, x<subscript>t</subscript> in DocBook, x_t in TeX/LaTeX, x\ :sub:`t` in reStructuredText, etc. So subscripting can be used almost everywhere immediately, without needing to be a universal convention.
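As a rough illustration of how little is needed to emit these forms, here is a small Python sketch that renders one base word plus subscript in each markup listed above, and also as Unicode subscript characters. The `subscript` function and its `fmt` names are hypothetical, made up for this example. The Unicode branch is deliberately limited to digits, '+', and '-', since Unicode has no complete subscript alphabet.

```python
# Unicode has subscript forms for digits and a few symbols, not a full alphabet.
SUBSCRIPT_CHARS = "0123456789+-"
SUBSCRIPT_TABLE = str.maketrans(SUBSCRIPT_CHARS, "₀₁₂₃₄₅₆₇₈₉₊₋")


def subscript(base: str, sub: str, fmt: str) -> str:
    """Render `base` with subscript `sub` in the markup named by `fmt`."""
    if fmt == "pandoc":
        # Pandoc requires spaces inside ~...~ to be backslash-escaped.
        escaped = sub.replace(" ", "\\ ")
        return f"{base}~{escaped}~"
    if fmt == "html":
        return f"{base}<sub>{sub}</sub>"
    if fmt == "docbook":
        return f"{base}<subscript>{sub}</subscript>"
    if fmt == "tex":
        # Math-mode form; braces keep multi-character subscripts together.
        return f"{base}_{{{sub}}}"
    if fmt == "rst":
        # The backslash-space lets the role follow the word without a gap.
        return f"{base}\\ :sub:`{sub}`"
    if fmt == "unicode":
        unsupported = [c for c in sub if c not in SUBSCRIPT_CHARS]
        if unsupported:
            raise ValueError(f"no Unicode subscript for: {unsupported}")
        return base + sub.translate(SUBSCRIPT_TABLE)
    raise ValueError(f"unknown format: {fmt}")


for fmt in ("pandoc", "html", "docbook", "tex", "rst", "unicode"):
    print(fmt, subscript("Gwern", "2020", fmt))
```

For running prose in LaTeX, `\textsubscript{2020}` may be a better fit than the math-mode `_{2020}` form sketched here, since it keeps the surrounding text font.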

Example: here are 3 versions of a text; one stripped of citations and evidentials, one with them in long form, and one with subscripts:

  1. I went to Istanbul for a trip, and saw all the friendly street cats there, just as I'd read about in Abdul Bey; he quotes the local Hakim Abdul saying that the cats even look different from cats elsewhere (but after further thought, I'm not sure I agree with that there). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, she claimed the traffic was terribly oppressive and ruined the trip. (Oh really?)
  2. In 2010, I went to Istanbul for a trip, and saw all the friendly street cats there, just as I'd read about in Abdul Bey's 2000 Street Cats of Istanbul; he quotes the local Hakim Abdul in 1970 saying that the cats even look different from cats elsewhere (but after further thought as I write this now in 2020, I'm not sure I agree with Bey (2000)). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, on Facebook she claimed the traffic was terribly oppressive and ruined the trip. (Oh really?)
  3. I₂₀₁₀ went to Istanbul for a trip, and saw all the friendly street cats there, just as I'd read about in Abdul Bey₂₀₀₀ (Street Cats of Istanbul); he quotes the local Hakim Abdul₁₉₇₀ saying that the cats even look different from cats elsewhere (but after further thought₂₀₂₀, I'm not sure I agree with Bey₂₀₀₀). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, she claimed the traffic was terribly oppressive and ruined the trip. (Oh really?)

In the first version, suppressing the metadata leads to a confusing passage. What did Bey write? We don't learn when Abdul expressed his opinion, which is important because Istanbul, as a large fast-growing metropolis, may have changed greatly over the 40 years from quote to visit. When did the speaker become skeptical of the claim that Istanbul cats both act & look different? What might explain the wife's inconsistency, and which version should we put more weight on?

The second version answers all these questions, but at the cost of considerable prolixity, jamming in comma phrases to specify date or source. Few people would want to either write or read such a passage, and the fussiness has a distinctly pseudo-academic air. Unsurprisingly, few people will bother with this, any more than they will bother providing inflation-adjusted dollar amounts of something from a decade ago (even though the unadjusted figure is misleading by a good 15% or so, and compounding), checking a paywalled paper, or redoing calculations in Roman numerals.

The third version may look a little alien because of the subscripts, but it provides all the information of the second version plus a little more (by making explicit the implicit '2020'), in considerably less space (as we can delete the circumlocutions in favor of a single consistent subscript), and reads more pleasantly (the metadata is literally out of the way until we decide we need it).

Compare and contrast this easily-understood & compact subscripting approach with another possible notation for disambiguating, the "X!Y" notation (derived ultimately from UUCP bang notation, AFAICT), which is associated with online fandoms & fanfiction, and gives notation like "2020!gwern". This notation puts the metadata first, which is confusing yodaspeak (what does the '2020' refer to? it dangles until you read on); it makes it inline & full-sized, and then tacks on an additional character just to take up even more space; it's confusing and unusual to anyone who isn't familiar with it from online fanfiction already, and to those who are familiar, it is low-status and has bad connotations.

The major downside of subscripting, of course, is that it is novel and weird. It at least is not associated with fanfics the way "!" is, and is associated with science & technology, but I'm sure it will deter readers anyway. Does it do enough good to be worth using despite the considerable hit to weirdness points? That I don't know.
