It isn't obvious to me why this is better than e.g. what at the end you quote Fischer 2023 as doing, which (1) feels to me less like a special convention that might need explaining and (2) works fine without needing to be able to write subscripts and without running into gotchas related to how subscripts are implemented (e.g., if you do them with Unicode subscripts then I think searching for "80%" will not find an "80%" subscript, because those are different characters).
What advantage do you see to using subscripts that outweighs those factors?
e.g., if you do them with Unicode subscripts then I think searching for "80%" will not find an "80%" subscript, because those are different characters
Browsers unify many characters for search purposes (or strip them out), but it looks like Unicode sub/superscripts are sometimes but not always considered equivalent. You can test this out in your own browser by going to https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Superscripts_and_subscripts_block and doing C-f '2' or '3' or something. I get no hits in Firefox, but I do in Chromium. (And even if your browser does, what about all your other tools? Stuff like grep sure won't treat them as equivalent without a lot of work. Or they will get stripped out, or turn into mojibake, or...) So, not something you can count on.
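A minimal Python sketch of the search problem, using only the standard library. The helper names `naive_find` and `normalized_find` are made up for illustration. It shows that a plain substring search misses Unicode subscript digits, and that a tool has to opt in to something like NFKC normalization before the two forms compare equal:

```python
import unicodedata


def naive_find(haystack: str, needle: str) -> bool:
    """Plain substring search, roughly what grep-like tools do by default."""
    return needle in haystack


def normalized_find(haystack: str, needle: str) -> bool:
    """Substring search after NFKC folding, which maps subscript digits to ASCII digits."""
    fold = lambda s: unicodedata.normalize("NFKC", s)
    return fold(needle) in fold(haystack)


# Hypothetical example sentence with the credence written in Unicode subscript digits.
sentence = "It'll rain tomorrow\u2088\u2080"  # ends in subscript 8, subscript 0

print(naive_find(sentence, "80"))       # False: '8' and subscript '8' are different characters
print(normalized_find(sentence, "80"))  # True: NFKC turns the subscripts into '80'
```

Whether any given browser or editor does this kind of folding is up to that tool, which is the point: you can't count on it.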
Aside from weirdness like that, I also think that the Unicode sub/superscript characters tend to look jarring and out-of-place. I don't know if the fonts are bad, or they omit it & the fallback is bad, or if they are 'typographically correct' but we are so unfamiliar with 'proper' sub/superscript compared to HTML ones that they look wrong to us, or what. There are many places where Unicode works well for fancier typography, but between the omission of many letters*, breaking tools, and bad appearance, the Unicode sub/superscripts are a bad solution if you have anything better available.
I'd consider using them only if I was restricted to pure UTF-8 text, with nothing else. (For example, a link tooltip. Or maybe a machine-learning context where the model can't handle HTML formatting.)
* Which you'd want for... a lot of things. For example, you could write with Unicode subscripts 'Foo 2023a', but not 'Foo 2023b'. Because there's a subscript 'a' but not a subscript 'b' (or 'c', or 'd'). Yeah, I know. So if you absolutely insist on Unicode subscripts, now you need a new way to disambiguate, like 'Foo 2023-1' vs 'Foo 2023-2' or something.
I hadn't thought about the issue with searching, that's a pretty good counterargument. (I am not able to search for the probabilities in this document either, because the isn't searchable :-/)
Ultimately it comes down to an aesthetic preference for me: I will use these because they look kind of neat. But perhaps applying the reversal test to something like footnotes is interesting here: Imagine one was always writing "more specialized predators have bigger prey (see footnote 3)" instead of "more specialized predators have bigger prey³". The latter is more compact, but not searchable.
Obviously there are switching costs associated with this. But perhaps the compactness that's an advantage for footnotes is a similar advantage here; that's why I'm trying it out.
The latter is more compact, but not searchable.
That still has search problems! Consider: "see footnotes 3, 9, and 11–13". How do you search for any of those 4 footnotes? The natural language approach is inherently ambiguous for such a hypertext problem which requires some formal support.
(The real solution there is footnote backlinks, like we have on Gwern.net: you can search for all references - site-wide, too - to a footnote by simply going to the footnote in question. If you're not up to that, then a lightweight HTML approach would be to simply wrap each footnote number in a span and hide the text from display, but not search, so C-f 'footnote 3' would always hit the "prey³" construct.)
here's the non-quantified meaning in terms of wh-movement from right to left:
for conlanging, i like this set of principles:
so to quantify sentence S, i prefer ur suggestion "I think it'll rain tomorrow". the percentage is supposed to modify "I think" anyway, so it makes more sense to make them adjacent. it's just more work bc it's novel syntax, but that's temporary.
otoh, if we're specifying that subscripts are only used for credences anyway, there's no reason for us to invoke the redundant "I think" image. instead, write
it'll rain tomorrow
in fact, the whole circumfix operator is gratuitously verbose![1] just write:
rain tomorrow
so to quantify sentence S, i prefer ur suggestion "I think it'll rain tomorrow". the percentage is supposed to modify "I think" anyway, so it makes more sense to make them adjacent. it's just more work bc it's novel syntax, but that's temporary.
The principles you propose make a lot of sense! Dropping "I think" or "My best guess" is then for the best.
Also, the underset/underbraces stuff is promising but too much to spend weirdness points on.
I think the goal of making communicating (un-)certainties cost less bandwidth is a worthy one, and I quite like this proposal. I think I would have understood exactly what it meant without an explanation, but by explaining first, this post never gave me a chance to find out, which is a mild shame. Purely aesthetically, I would put a space between the last word of the sentence and the credence.
Hm, my aesthetics object to the space between the last word of the sentence and the credence as plenken—but then again, I'm also a fan of inordinately compact programming languages.
This is wonderful; feels much more friendly, practical, and conducive to ideal speech situations. If someone tries to attack me for a wrong probability, I can respond "I'm just talking but with additional clarity; no one is perfect."
cross-posted from niplav.github.io
Gwern has wondered about a use-case for subscripts in hypertext. While they have settled on a specific use-case, namely years for citations, I propose a different one: reporting explicit probabilities.
Explicitly giving probabilities in day-to-day English text is usually quite clunky: "I assign 35% to North Korea testing an intercontinental ballistic missile until the end of this year" reads far less smoothly than "I don't think North Korea will test an intercontinental ballistic missile this year".
And since subscripts are a solution in need of a problem, one can wonder how well those two fit together: Quite well, I claim.
In short, I propose to append probabilities in subscript after a statement using standard HTML subscript notation (or LaTeX as a fallback if it's available), with the probability possibly also being a link to a relevant forecasting platform with the same question:
This is almost as readable as the sentence without the probability.
There are some complications with negations in sentences or multiple statements. For the most part, I'll simply avoid such cases ("Doctor, it hurts when I do this!" "Don't do that, then."), but if I had to, I'd solve the first problem by declaring that the probability applies to the literal meaning of the previous sentence, including all negations; the problem with multiple statements is solved by delimiters.
As an example for the different kinds of negation: "The train won't come more than 5 minutes late<sub>90%</sub>" would (arguendo) mean the same thing as "I don't think the train will come more than 5 minutes late<sub>90%</sub>", which means the same as "The train will take more than 5 minutes to arrive<sub>10%</sub>", which is equivalent to "I assign 90% probability to the train arriving within the next 5 minutes".
With multiple statements, my favorite way of delimiting is currently half brackets: "I think ⸤it'll rain tomorrow⸥<sub>55%</sub>, but ⸤Tuesday is going to be sunny⸥<sub>80%</sub>, but I don't think ⸤your uncle is going to be happy about that⸥<sub>15%</sub>."
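To make the convention concrete, here is a rough Python sketch of how one might generate and read back this notation. The function names `with_credence` and `extract_credences` and the regular expression are my own assumptions for illustration, not part of any existing tool. The parser accepts the percentage either wrapped in `<sub>` tags or fused directly onto the closing half bracket:

```python
import re


def with_credence(statement: str, probability: float, delimit: bool = False) -> str:
    """Append a probability to a statement as an HTML subscript.

    With delimit=True the statement is wrapped in half brackets, which is
    the delimiter suggested above for sentences with several statements.
    """
    percent = round(probability * 100)
    body = f"\u2e24{statement}\u2e25" if delimit else statement
    return f"{body}<sub>{percent}%</sub>"


# A half-bracketed statement followed by a percentage, with optional <sub> tags.
CREDENCE = re.compile(
    r"\u2e24(?P<statement>[^\u2e24\u2e25]+)\u2e25"
    r"(?:<sub>)?(?P<percent>\d+(?:\.\d+)?)%(?:</sub>)?"
)


def extract_credences(text: str) -> list[tuple[str, float]]:
    """Return (statement, probability) pairs for every delimited statement."""
    return [
        (m["statement"], float(m["percent"]) / 100)
        for m in CREDENCE.finditer(text)
    ]


print(with_credence("It'll rain tomorrow", 0.55))
print(extract_credences(with_credence("Tuesday is going to be sunny", 0.8, delimit=True)))
```

Note that the sketch only handles the delimited form; an undelimited sentence gives no formal way to tell where the statement starts, which is part of why the delimiters are useful.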
The probabilities in this context aren't quite evidentials, but neither are they veridicals nor miratives; I propose the word "credal" for this category.
Enumerating Possible Notations
The exact place of insertion is subtle: In sentences with a single central statement, there are multiple locations one could place the probability.
This becomes trickier in sentences with multiple statements.
Since the people writing the text reporting probabilities are probably logically non-omniscient bounded agents, it might also be useful to report the time or effort one has spent on refining the reported probability: "I reckon humanity will survive the 21st century<sub>55%:20h</sub>", indicating that the speaker has reflected on this question for 20 hours to arrive at their current probability (something akin to reporting an "epistemic effort" for a piece of information). I fear that this notation is getting into cumbersome territory and won't be using it.
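For what it's worth, the extended label is still easy to handle mechanically. A small Python sketch, where the `Credence` class, the `parse_credence` function and the exact grammar (percentage, then an optional colon and a number of hours ending in "h") are assumptions I'm making for illustration:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credence:
    probability: float                    # e.g. 0.55 for "55%"
    effort_hours: Optional[float] = None  # e.g. 20.0 for ":20h", None if absent


# "<number>%" optionally followed by ":<number>h"
LABEL = re.compile(r"(\d+(?:\.\d+)?)%(?::(\d+(?:\.\d+)?)h)?")


def parse_credence(label: str) -> Credence:
    """Parse a subscript label such as '55%' or '55%:20h'."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a credence label: {label!r}")
    percent, hours = match.groups()
    return Credence(
        probability=float(percent) / 100,
        effort_hours=float(hours) if hours is not None else None,
    )


print(parse_credence("55%:20h"))
print(parse_credence("80%"))
```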
Notation Options and Difficulties
There are three available options: Either one's writing platform supports HTML, in which case one can use the <sub>18</sub> tags (giving 18%), or it supports LaTeX, which creates a slightly fancier-looking but also more fragile notation using _{18\%} (resulting in 18%), or one's platform directly supports subscripting, such as pandoc with ~18%~, but not Reddit Markdown (which does support superscript). More info about other platforms here.
Ideally one would simply use Unicode subscripts, which are available for all digits, but tragically not for the percentage sign '%' or a simple dot '.'. Perhaps a project for the future: After all, they did include a subscript '+'₊, a subscript '-'₋, equality sign '='₌ and parentheses '()'₍₎, but many subscript letters (b, c, d, f, g, q, w, y and z) are still missing…
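The gap is easy to see in a quick Python sketch. The function name `to_unicode_subscript` is hypothetical, and the table only covers the digits and the five symbols mentioned above (code points U+2080 to U+208E), deliberately leaving the subscript letters out:

```python
# Subscript digits occupy U+2080..U+2089; the five symbols follow directly after.
SUBSCRIPTS = {str(i): chr(0x2080 + i) for i in range(10)}
SUBSCRIPTS.update({
    "+": "\u208a",
    "-": "\u208b",
    "=": "\u208c",
    "(": "\u208d",
    ")": "\u208e",
})


def to_unicode_subscript(text: str) -> str:
    """Convert text to Unicode subscripts, refusing characters not in the table."""
    missing = sorted({ch for ch in text if ch not in SUBSCRIPTS})
    if missing:
        raise ValueError(f"no subscript form in this table for: {missing}")
    return "".join(SUBSCRIPTS[ch] for ch in text)


print(to_unicode_subscript("18"))  # works: digits are covered

try:
    to_unicode_subscript("18%")    # fails: there is no subscript percent sign
except ValueError as err:
    print(err)
```

So a credence like "18%" or "0.5" can't be written purely in Unicode subscripts, and one would have to fall back on an ordinary '%' or '.' next to the subscript digits.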
Applications
I've used this notation sparingly but increasingly; a good example of a first exploration is here.
Fischer 2023 uses a different notation:
The notation proposed here would change the text: