Book: AKA Shakespeare (an extended Bayesian investigation)


Beatrice and Claudia end up agreeing that the leading candidate is de Vere, with Ignotus second and Stratford a very distant third. Beatrice’s entries lead to a final probability of 10^(-13) (one chance in ten million million) that Shakespeare was the gentleman from Stratford-upon-Avon. Claudia’s entries lead to an even smaller probability.

I don't think I want to use this book to try to learn how to produce well-calibrated probability estimates.

Why not? The first time I cranked through an example of this nature, it was -extremely- educational. If you're beyond this level, that's fine, but in that case your message doesn't contribute anything to the conversation.

(Okay, I'll unpack the implication:) Assigning a probability of 10^(-3) would mean being really, really, really, really sure that the hypothesis is wrong. To be well-calibrated, you would have to be able to make ten thousand similar judgments with similar strengths of evidence and only be wrong about ten times, and if you can do that, you're *very* good at this sort of thing.

Assigning 10^(-13) -- i.e., suggesting that you're so good that you can do this and only be wrong one in ten million million times -- is just obviously wrong.
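As a rough illustration of the calibration arithmetic above, here is a minimal sketch. The function names are made up for this example; the only inputs are the two probabilities already mentioned in the thread.

```python
def expected_errors(probability_wrong: float, num_judgments: int) -> float:
    """Expected number of wrong calls for a perfectly calibrated judge."""
    return probability_wrong * num_judgments


def judgments_per_error(probability_wrong: float) -> float:
    """How many similar judgments you would make, on average, per mistake."""
    return 1.0 / probability_wrong


# 10^(-3): ten thousand similar judgments, about ten of them wrong.
errors_at_one_in_a_thousand = expected_errors(1e-3, 10_000)

# 10^(-13): about ten million million judgments per single mistake.
judgments_at_ten_to_minus_13 = judgments_per_error(1e-13)
```

The second number is the one to stare at: nobody gets to make that many independent judgments about historical authorship, so the claim cannot even be checked, let alone be well-calibrated.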

So I was implying that the fact that the book suggests that this kind of number can be a plausible outcome means that it isn't a very good place to learn the art of making Bayesian probability estimates. To learn to make well-calibrated estimates, I should try to learn from people who stand a snowball's chance in hell of making such estimates themselves.

For an example from someone who has a claim to actually being *good* at this sort of thing, see Gwern's Who wrote the Death Note script?.

Assigning 10^(-13) -- i.e., suggesting that you're so good that you can do this and only be wrong one in ten million million times -- is just obviously wrong.

And worse:

Claudia’s entries lead to an even smaller probability.

Yes... 10^-13 is incredibly absurd. It isn't consistent with our background knowledge: there aren't even that many books - Google estimates there's something like 400m, and there's only 2m being added per year, so even if each book had 1 unique author (no one published multiple books etc) there still wouldn't be that many authors and 1 error of identifying authors (for an error rate of 1/400m) would blow away the confidence interval for being able to do <10^-13. We don't have that level of confidence in *undisputed* authors being who we think they are, because they could turn out to be someone else (see: the entire ghostwriting industry for all of history)! Any method which produces such an extreme confidence has simply disproven itself.
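To make that back-of-the-envelope comparison concrete, here is a small sketch. The 400m figure is the rough book count quoted in the comment above, taken as given, and the one-author-per-book setup is the same deliberately generous assumption.

```python
# Rough figure quoted in the comment above, not independently checked.
BOOKS = 400_000_000

# Generous assumption from the comment: one unique author per book,
# and exactly one misattribution among all of them.
observed_error_rate = 1 / BOOKS

# The error rate implied by the book's final probability.
claimed_error_rate = 1e-13

# How many times larger a single known error is than the claimed rate.
shortfall_factor = observed_error_rate / claimed_error_rate
```

Under these assumptions a single misattributed book already corresponds to an error rate of 2.5 in a billion, roughly 25,000 times larger than 10^-13.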

Without reading the book, my guess is that all the differentials are systematically exaggerated upwards by perhaps an order of magnitude - for example, the emphasis put on education strikes me as playing on naive beliefs and overconfidence, and if anyone did a systematic sample of undisputed Elizabethan and pre-Elizabethan writers, education would be found to be far weaker than whatever odds the characters give - and each assumed to be independent & uncorrelated, without awareness of how this biases upwards the results. (I also did this in my essay but I specifically highlighted it as a serious issue and tried to counter it with low estimates; so I wound up with high but not absurdly high estimates that I found intuitively acceptable after adjusting down only a relatively small amount.)
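A minimal sketch of how this compounding works, with entirely hypothetical numbers (these are not the book's inputs): thirteen pieces of evidence, each treated as independent and each claimed to be 10:1 against a hypothesis, compared with the same thirteen at a more modest 3:1.

```python
import math


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Bayesian update in log-odds form, treating every ratio as independent."""
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# Hypothetical: 13 items of evidence, each claimed to be 10:1 against.
claimed = posterior_probability(0.5, [1 / 10] * 13)

# Same 13 items, but each only 3:1 against.
tempered = posterior_probability(0.5, [1 / 3] * 13)
```

With these made-up inputs, `claimed` comes out near 10^-13 while `tempered` is around 6 in ten million, so a modest exaggeration per item moves the final answer by more than six orders of magnitude. Correlated evidence makes matters worse, since multiplying the ratios counts the same information more than once.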

Which of course is not to say that the book couldn't be educational and interesting, but it should definitely be approached with an adversarial attitude of 'this is wrong; let me see what I can learn from it and how it went wrong for my own analyses'.

The education thing is actually an old issue. When the idea that Shakespeare didn't write the plays first started coming up in the 19th century, it was heavily based on the education argument, which was possibly, to some extent, a proxy for British class issues: people in the nobility and upper classes did not like the idea that he wrote the plays. That aspect is still heavily present in a lot of the arguments about this.

Yes, I've heard the claim made that a man with 'small Latin and smaller Greek' (or however that went) could not have written Shakespeare's plays; having read them, I don't find the claim at all compelling, but my assumption is that by this point in the controversy, *someone* has compiled a representative selection of authors and estimated their education which would allow a direct empirical estimate of what the true correlation is.

Thanks for the clarification. I generally consider that kind of 'failure in the analysis process' to be a second order effect, something to be taught after the audience is familiar and comfortable with handling the numbers at all. While a little bit of knowledge is dangerous, it's a phase everyone must pass through and is unavoidable.

a claim to actually being *good* at this sort of thing

If I'm reading the chart on that page correctly, Gwern is *extremely* well calibrated. Is the accuracy row for each confidence column telling us what fraction of predictions Gwern assigned a given confidence to have been right? He's got 50% - 44%, 60% - 64%, 70% - 71%, 80% - 83%, 90% - 92%, and 100% - 96%. That's incredible!

Is the accuracy row for each confidence column telling us what fraction of predictions Gwern assigned a given confidence to have been right?

Yes, something like that. I forget the exact details of how it bins.
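For anyone who wants to build the same kind of table from their own predictions, here is a minimal sketch. The binning rule (round the stated confidence to the nearest 10%) is an assumption made for illustration, not necessarily how the chart on that page bins, and the sample data is invented.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Map each confidence bucket (in percent) to the fraction judged correct.

    `predictions` is an iterable of (stated_confidence, was_correct) pairs,
    with stated_confidence between 0 and 1.
    Assumed binning: round the stated confidence to the nearest 10%.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in predictions:
        bucket = round(confidence * 10) * 10
        buckets[bucket].append(was_correct)
    return {
        bucket: sum(outcomes) / len(outcomes)
        for bucket, outcomes in sorted(buckets.items())
    }


# Invented sample data: three predictions at 90%, two at 60%.
sample = [(0.9, True), (0.9, True), (0.9, False), (0.6, True), (0.6, False)]
table = calibration_table(sample)
```

A well-calibrated record shows each bucket's accuracy close to its label; with only a handful of predictions per bucket, as in this sample, the fractions are too noisy to mean much.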

That's incredible!

Thank you. That's years of practice and some useful heuristics at work there.

Well-calibrated means that your certainty matches your odds of correctness. Do we really think that Beatrice can make ten trillion statements of this form and have only one of them be wrong? Even if she uses "Bayesian" methods? Or, if you prefer the wagering approach -- do you really think she would bet at those odds?

The point is that these are extremely small probabilities even in more rigorous areas. The idea that one can get probabilities on the order of 10^-13 strongly suggests that something is going wrong here, probably with a few helpings of motivated cognition.

I applaud the authors for putting their argument in a Bayesian form. At the very least, it forces them to make their argument explicit. Also, in theory it lets one describe disagreement in terms of values of numbers (priors and conditional probabilities). However, I suspect that this is rarely going to happen in historical disagreements and the qualitative structure will capture most of the disagreement. (In principle, anyone should be able to supply numbers for someone else's structure, but I'm skeptical that it is a useful comparison.)

Disclaimer: I have not read this book. I'm posting it in the expectation that others may enjoy it as much as I'm sure I would if I had time to read it myself. This looks interesting as an extended worked example of Bayesian reasoning (the "scientific approach" of the title).

Edited to add: There are many signs in the above block of text that this book is not up to Lesswrong standards. As gwern suggests, reading it should be done with an adversarial attitude.

I propose some more useful goals than finding someone for whom we can cheer loudly as a properly qualified member of our tribe: find worked examples that let you practice your art; find structured activities that will actually lead you to practice your art; try to critically assess arguments that use the tools we think powerful, then discuss your criticism on a forum like Lesswrong where your errors are likely to be discovered and your insights are likely to be rewarded (with tasty karma).