(I wrote this post for my own blog, and given the warm reception, I figured it would also be suitable for the LW audience. It contains some nicely formatted equations/tables in LaTeX, hence I've left it as a dropbox download.)

Logarithmic probabilities have appeared previously on LW here, here, and sporadically in the comments. The first is a link to an Eliezer post which covers essentially the same material. I believe this is a better introduction/description/guide to logarithmic probabilities than anything else that has appeared on LW thus far.

Introduction:

Our conventional way of expressing probabilities has always frustrated me. For example, it is very easy to make nonsensical statements like, “110% chance of working”. Also, it is not obvious that the difference between 50% and 50.01% is trivial compared to the difference between 99.98% and 99.99%. And the notation fails to accommodate the math when we want to say things like, “five times more likely”, because 50% * 5 overflows 100%.

Jacob and I have (re)discovered a mapping from probabilities to log-odds which addresses all of these issues. To boot, it accommodates Bayes’ theorem beautifully. For something so simple and fundamental, it certainly took a great deal of google searching/wikipedia surfing to discover that they are actually called “log-odds”, and that they were “discovered” in 1944, instead of the 1600s. Also, nobody seems to use log-odds, even though they are conceptually powerful. Thus, this primer serves to explain why we need log-odds, what they are, how to use them, and when to use them.
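To make the mapping concrete, here is a minimal sketch in Python, using base 10 (the function names are my own, not from the article):

```python
import math

def prob_to_logodds(p, base=10):
    """Map a probability in (0, 1) to log-odds: log_base(p / (1 - p))."""
    return math.log(p / (1 - p), base)

def logodds_to_prob(L, base=10):
    """Inverse mapping: recover the probability from its log-odds."""
    odds = base ** L
    return odds / (1 + odds)

# The mapping sends (0, 1) onto the whole real line, so "110%" is simply
# unrepresentable, and 50% sits at exactly 0.
print(prob_to_logodds(0.5))   # 0.0
print(logodds_to_prob(2.0))   # ≈ 0.990 (odds of 100:1)
```

Because the image is the whole real line, "five times more likely" (in odds) becomes an ordinary shift that can never overflow.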

Comments:

Log base ten may be more intuitive for conversion purposes. Then appending another 9 to the probability (0.9 → 0.99 → 0.999) corresponds to adding 1 to the log-odds.
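A quick numerical check of that claim (a sketch; base-10 log-odds of p are log10(p/(1-p))):

```python
import math

def logodds10(p):
    """Base-10 log-odds of a probability p in (0, 1)."""
    return math.log10(p / (1 - p))

# Each extra 9 in the probability adds roughly 1 to the base-10 log-odds:
for p in (0.9, 0.99, 0.999, 0.9999):
    print(p, logodds10(p))   # ≈ 0.954, 1.996, 3.000, 4.000
```

The first step (0.9 → 0.99) is off by about 0.04; the approximation tightens rapidly as p approaches 1.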

"Five times more likely" should overflow for probabilities greater than 0.2. This is because the terminology "times more likely" is usually used in the context of decision-making, so it manipulates the linear probabilities, because that's what goes into the expected utility.

Yeah, I was definitely thinking about that. The mathematician in me won out in the end.

It occurs to me that a lot of people have probably thought about this, and they have alternately used base 2, base e, and base 10. Unless we get the entire LW community to standardize on one base, we won't be able to coherently communicate with one another using log-probabilities, and therefore log-probabilities will stay relegated to the dustbin.

base 2 - advantages: we can talk about N bytes' worth of evidence.

base e - mathematician's base

base 10 - common layperson can understand it, advantages with the 9's and 0's.

Actually, I think you're right, log base 10 is probably better. If others agree, I'll rewrite the article in base 10.

What's the specific benefit of base e for log-odds, though? Base e has lots of special properties that make it useful in many areas of mathematics (e^x is its own derivative, de Moivre's formula, &c.), but is this one of them? (It could be; I don't know.)

To quote Jaynes, p. 91 of PT:TLoS:

So to answer your question, the only advantage of base e is that "ln" looks tidier than "log10".

Apart from being more intuitively understandable to humans, using base 10 also allows us to multiply by 10 and measure evidence in the familiar unit of decibels.
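The decibel convention is just 10·log10 of the odds; a sketch (my function name, nothing standard):

```python
import math

def decibels_of_evidence(p):
    """Evidence in decibels: 10 * log10(p / (1 - p))."""
    return 10 * math.log10(p / (1 - p))

print(decibels_of_evidence(0.5))    # 0 dB: even odds
print(decibels_of_evidence(0.99))   # ≈ 20 dB: odds of 99:1
```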

The natural unit of ratio, the neper (Np), is easier to interpret for small ratio contributions, near x = 0 where the derivative of exp(x) is ≈ 1:

-0.1Np = exp(-0.1) ∶ 1 ≈ 0.9 ∶ 1

This could make for an easy upgrade path to use of nepers or centinepers instead of percents in comparatives involving rates, which would reduce semantic confusion. "50% faster" can mean "gets 150% as far" (so .41Np faster, or 41 cNp, or perhaps 41Np%) or "takes 50% as much time" (so .69Np faster, or 69cNp, or 69Np%). That's an argument for using nepers as a standard base outside communications of probability.
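The two readings of "50% faster" in nepers, sketched in Python (the numbers match those in the paragraph above):

```python
import math

# Reading 1: "gets 150% as far in the same time" -> speed ratio 1.5
print(math.log(1.5))   # ≈ 0.405 Np, i.e. ~41 cNp

# Reading 2: "takes 50% as much time" -> speed ratio 2.0
print(math.log(2.0))   # ≈ 0.693 Np, i.e. ~69 cNp

# For small ratios, nepers nearly coincide with fractional change:
print(math.exp(-0.1))  # ≈ 0.905, so -0.1 Np is close to "10% less"
```

Expressed in nepers, the two readings (41 cNp vs. 69 cNp) are visibly different claims, which is the semantic confusion the comment is pointing at.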

(trivia: Nepers and radians are each other turned sideways, being respectively the real and imaginary parts of eigenvalues of linear differential equation systems.)

Wouldn't it be easier to talk about N bytes' worth of evidence in base 256?

Bits of evidence seems the more useful metric!

Article is rewritten in base 10, and I rewrote some of the explanation for Bayesian updates. Enjoy!

I would like to see the article in base 10.

I don't think this word means what you think it means.

(Also I didn't know you were on Less Wrong. I had previously plugged this summary of log-odds on my blog and was considering mentioning it here.)

Sorry for the necro -- the linked article is 404'd. I uploaded a backup here. I didn't find it on the author's site but did find a copy through Web Archive; still, maybe my link will save someone else the hassle.

Good work! You might mention that the reason why log-odds are awful for things like adding probabilities of two disjoint events is that there's not a nice formula for log(x+y). That's the price of turning multiplication into addition.

I find it interesting that you lack familiarity with log-odds. What field are you in? Statisticians will usually be familiar with them, as the logit is the canonical link function for the binomial distribution in generalized linear modeling. To cut (some of) the jargon: if I have a data set with binomial outcomes, and I wish to model my data as having normal errors and the predictors as having a linear effect on the outcome, I'd convert my data by using log-odds. So, for instance, if I was looking at age as a predictor for diabetes (which is a yes/no outcome)...

I have a very strong competition math background from high school, but my primary field is chemistry.

Of all the weird coincidences - I rediscovered this myself the week before last. (likewise inspired by previous LW discussion of log-odds, which seemed intuitively correct but not rigorously or symmetrically defined...)

What I failed to do, shamefully in view of your example, was to write everything up concisely and clearly to share with others. Thank you for being less short-sighted or less selfish.

It's a good article for learning about log-odds, but I disagree with some of the justification. Yes, it is easy to say something has a 110% chance of working, but a nonsensical lie like this is better than a plausible lie which may trick you into believing it.

It seems to me that this doesn't have any real advantage over odds ratios. If I want to do a Bayesian update, I multiply the odds by the relative likelihood. In the example in the article (1/10,000 chance of having the disease, 3% false positive, and 1% false negative), you just take 1:9999 and multiply it by 0.99/0.03 = 33:1 for each successful test. Then you have 33:9999 = 1:303, then 33:303 = 11:101, and finally 363:101 for the final test. To change back, you just take 363/(363+101) = 78.23%. The calculations are slower (two multiplications vs. one addition), but it's much easier and more intuitive to convert between them and traditional probabilities.
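The arithmetic in that example can be checked directly with exact rationals (a sketch; the 33:1 likelihood ratio is 0.99/0.03, as above):

```python
from fractions import Fraction

prior_odds = Fraction(1, 9999)   # 1:9999 odds of having the disease
lr = Fraction(99, 3)             # likelihood ratio 0.99/0.03 = 33:1 per positive test

posterior = prior_odds * lr**3   # three positive tests
print(posterior)                           # 363/101, i.e. odds of 363:101
print(float(posterior / (1 + posterior)))  # ≈ 0.7823, matching 78.23%
```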

What you've described is, in fact, exactly the same thing as log-odds; the two are simply separated by a logarithm/exponentiation. Thus, all the multiplications you describe are the counterpart of the additions I describe. I agree, we could work with odds ratios without taking the logarithm, but using logarithms has the benefit of linearizing the probability space. The distance between 1 L% and 5 L% is the same as the distance between 10 L% and 14 L%, but you wouldn't know it by looking at 2.72:1 and 150:1 versus 22,000:1 and 1,200,000:1.
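Those odds figures check out numerically (a sketch; "L%" here is the article's natural-log-odds unit, so L log-units correspond to odds of e^L : 1):

```python
import math

# Equal steps in log-odds are equal multiplicative steps in odds:
for L in (1, 5, 10, 14):
    print(L, round(math.exp(L)))
# 1 -> ~3, 5 -> ~148, 10 -> ~22026, 14 -> ~1202604
```

A step of 4 log-units multiplies the odds by e^4 ≈ 55 no matter where you start, which is what "linearizing" means here.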

Pick up Jaynes' Probability Theory and turn to the section on decibels of evidence, an even more convenient measure. Or for a summary see Eliezer's 0 And 1 Are Not Probabilities in the sequences.

(Downvoted; the OP already linked to that exact post.)