Case Study: the Death Note Script and Bayes


Nicely done. Since this was presumably partly intended as a Bayes tutorial, it might benefit from an explanation of the role your assumption of conditional independence plays in your calculations, and how much more complicated this would have been without that assumption.

Speaking of this, I personally would have liked a back-of-the-envelope calculation on how much of an effect the independence assumption has on your results, maybe by differentiating between "highly competent fake" and "normal fake" hypotheses and continuing to assume independence.

> how much more complicated this would have been without that assumption.

I'll add a footnote mentioning it.

> Speaking of this, I personally would have liked a back-of-the-envelope calculation on how much of an effect the independence assumption has on your results, maybe by differentiating between "highly competent fake" and "normal fake" hypotheses and continuing to assume independence.

I'm not sure what that calculation would look like. I don't think I've ever tried conditionals before.

I would have thought more than a footnote would have been helpful. To avoid lazy other-optimizing, I've written some content below which you may use/adapt/modify as you see fit.

The odds form of Bayes' theorem is this:

P(a|b)/P(~a|b) = P(a)/P(~a) x P(b|a)/P(b|~a)

In English, the ratio of the posterior probabilities (the *posterior odds* of *a*) equals the product of the ratio of the prior probabilities and the likelihood ratio.
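To make the odds form concrete, here is a minimal Python sketch. The function names and the input numbers are hypothetical, chosen only for illustration. It computes a posterior the usual way and checks that converting it to odds gives exactly the prior odds times the likelihood ratio; `Fraction` keeps the comparison exact.

```python
from fractions import Fraction


def posterior_direct(p_a, p_b_given_a, p_b_given_not_a):
    """P(a|b) from the usual form of Bayes' theorem."""
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a  # law of total probability
    return p_a * p_b_given_a / p_b


def posterior_odds(p_a, p_b_given_a, p_b_given_not_a):
    """P(a|b)/P(~a|b) = prior odds * likelihood ratio."""
    prior_odds = p_a / (1 - p_a)
    likelihood_ratio = p_b_given_a / p_b_given_not_a
    return prior_odds * likelihood_ratio


# Hypothetical inputs, for illustration only.
p_a, p_b_given_a, p_b_given_not_a = Fraction(3, 10), Fraction(8, 10), Fraction(2, 10)
post = posterior_direct(p_a, p_b_given_a, p_b_given_not_a)

# Turning the direct posterior into odds agrees exactly with the odds form.
assert post / (1 - post) == posterior_odds(p_a, p_b_given_a, p_b_given_not_a)
print(post, posterior_odds(p_a, p_b_given_a, p_b_given_not_a))
```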

What we are interested in is the likelihood ratio p(e|is-real)/p(e|is-not-real), where *e* is all external and internal evidence we have about the DN script.

*e* is equivalent to the conjunction of each of the 13 individual pieces of evidence, which I'll refer to as *e1* through *e13*:

*e* = *e1* & *e2* & ... & *e13*

So the likelihood ratio we're after can be written like this:

p(e|is-real)/p(e|is-not-real) = p(e1&e2&...&e13|is-real)/p(e1&e2&...&e13|is-not-real)

I abbreviate p(b|is-real)/p(b|is-not-real) as LR(b), and p(b|is-real&c)/p(b|is-not-real&c) as LR(b|c).

By the chain rule of probability, the above is equivalent to

LR(e) = LR(e1) * LR(e2|e1) * LR(e3|e1&e2) * LR(e4|e1&e2&e3) * ... * LR(e13|e1&e2&...&e12)
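A small check of that factorization, assuming a made-up joint distribution over two binary pieces of evidence under each hypothesis. The numbers and the helper names (`JOINT`, `p`, `lr`) are illustrative, not taken from the article. The chain-rule product reproduces LR(e1 & e2) exactly; the final line shows that simply multiplying LR(e1) * LR(e2) need not.

```python
from fractions import Fraction

# Hypothetical joint distributions of two binary pieces of evidence (e1, e2)
# under each hypothesis; illustrative numbers only, not taken from the article.
JOINT = {
    "is-real": {(1, 1): Fraction(6, 10), (1, 0): Fraction(2, 10),
                (0, 1): Fraction(1, 10), (0, 0): Fraction(1, 10)},
    "is-not-real": {(1, 1): Fraction(2, 10), (1, 0): Fraction(1, 10),
                    (0, 1): Fraction(1, 10), (0, 0): Fraction(6, 10)},
}


def p(h, e1=None, e2=None):
    """p(e1 & e2 | h); any argument left as None is summed out."""
    return sum(pr for (a, b), pr in JOINT[h].items()
               if (e1 is None or a == e1) and (e2 is None or b == e2))


def lr(target, given=None):
    """LR(target | given) = p(target | is-real & given) / p(target | is-not-real & given)."""
    given = given or {}

    def conditional(h):
        return p(h, **target, **given) / p(h, **given)

    return conditional("is-real") / conditional("is-not-real")


lr_joint = lr({"e1": 1, "e2": 1})          # LR(e1 & e2)
lr_e1 = lr({"e1": 1})                      # LR(e1)
lr_e2_given_e1 = lr({"e2": 1}, {"e1": 1})  # LR(e2|e1)
lr_e2 = lr({"e2": 1})                      # LR(e2)

assert lr_joint == lr_e1 * lr_e2_given_e1  # chain rule: holds exactly
print(lr_joint, lr_e1 * lr_e2)             # the independence shortcut differs here
```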

(The ordering is arbitrary.)

Now comes the point where the assumption of conditional independence simplifies things greatly. The assumption is that the "impact" of each piece of evidence (i.e. the likelihood ratio associated with it) does not vary based on what other evidence we already have. That is, for any piece of evidence *ei*, its likelihood ratio is the same no matter what other evidence you add to the right-hand side of the conditional:

LR(ei|c) = LR(ei) for any conjunction *c* of other pieces of evidence

Under this assumption, the expression for LR(e) reduces to a simple product:

LR(e) = LR(e1) * LR(e2) * LR(e3) * ... * LR(e13)
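Under that assumption the whole update is just prior odds times a product of likelihood ratios, converted back to a probability at the end. A minimal sketch; the helper name `posterior_probability` and the inputs are hypothetical. Because multiplication commutes, the order of the ratios does not matter, which matches the note that the ordering is arbitrary.

```python
from fractions import Fraction
from math import prod


def posterior_probability(prior_prob, likelihood_ratios):
    """Odds-form Bayes assuming conditional independence:
    posterior odds = prior odds * LR(e1) * LR(e2) * ... * LR(en)."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability


# Hypothetical prior of 1/10 and three made-up likelihood ratios.
print(posterior_probability(Fraction(1, 10), [2, 5, 3]))  # 10/13
```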

However, the conditional independence assumption is likely to have a substantial impact on the value LR(e) takes, because most pieces of evidence are expected to correlate positively with one another rather than being independent. For example, if you know that the script is a 20,000-word Hollywood plot and that the stylometric analysis seems to check out, then if you are dealing with a fake script (*is-not-real*), it is an extremely elaborate fake, and (e.g.) the PDF metadata are almost certain to "check out" too, so they provide much weaker evidence for *is-real* than the calculation assuming conditional independence suggests. By contrast, the evidence of legal takedowns seems unaffected by this concern, as even a competent faker would hardly be expected to create the evidence of takedowns.

[The suggested back-of-the-envelope calculation could go along the lines of the last paragraph, or, as I said in the grandparent, you might get rid of most of the problematic correlations by considering 2-3 hypotheses about the faker's level of skill and motivation (via a likelihood vector instead of a ratio). My own guess is that stylometrics pretty much screens off all other internal evidence as well as dating and (most of) credit, but leaves takedown unaffected.]
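A back-of-the-envelope version of that suggestion, as a minimal sketch: *is-not-real* is split into two hypothetical faker types ("competent" and "sloppy"), and within each hypothesis the pieces of evidence are still treated as independent. Every probability below is invented purely for illustration, not an estimate for the actual script, and the names are hypothetical. Because the faker type is unknown, the internal checks become correlated under *is-not-real*, and the joint likelihood ratio comes out smaller than the naive product.

```python
from fractions import Fraction
from math import prod

# Every probability here is hypothetical, chosen only to illustrate the effect.
P_REAL = {"internal": Fraction(9, 10), "takedown": Fraction(1, 2)}
# Faker type -> (share among fakes, per-item probability of the evidence).
FAKERS = {
    "competent": (Fraction(1, 2), {"internal": Fraction(8, 10), "takedown": Fraction(1, 20)}),
    "sloppy": (Fraction(1, 2), {"internal": Fraction(2, 10), "takedown": Fraction(1, 20)}),
}
# Three internal checks that "check out", plus the takedowns.
EVIDENCE = ["internal", "internal", "internal", "takedown"]


def p_all(item_probs):
    """p(all evidence | hypothesis), items independent *within* one hypothesis."""
    return prod(item_probs[e] for e in EVIDENCE)


def p_item_given_fake(e):
    """Marginal p(e | is-not-real), averaging over faker types."""
    return sum(share * probs[e] for share, probs in FAKERS.values())


# Shortcut: treat every item as independent given is-not-real.
naive_lr = prod(P_REAL[e] / p_item_given_fake(e) for e in EVIDENCE)
# Mix over faker types first (a likelihood vector), then take the ratio.
joint_lr = p_all(P_REAL) / sum(share * p_all(probs) for share, probs in FAKERS.values())

print(float(naive_lr), float(joint_lr))
```

With these made-up numbers the naive product gives a likelihood ratio of about 58, while the mixture gives about 28: once three internal checks pass, a fake is probably a competent one, so each additional internal check is worth less. The takedown probability is the same for both faker types, so that piece of evidence keeps its full weight.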

Note to self: consider testing the obvious conspiracy theory here.

OK, I think the correct probability here is 1/57. According to OEIS (it cites Stanley as a reference; I haven't taken the time to try to understand why this would be the case), the number of unordered binary trees on a set of n+1 labelled leaves is given by 1*3*...*(2n-1). If we want to count how many of these have two particular leaves directly next to each other, well, we're essentially merging them into one super-leaf; thus we want the same thing on one fewer leaf. Hence the number we want is (1*3*...*55)/(1*3*...*57)=1/57. More generally, if we had n leaves, we'd have 1/(2n-3).

**Edit**: OK, not going to write out the whole thing here unless someone really wants, but for those skeptical of the above formula, you can prove it with exponential generating functions.
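For small n, the formula can be checked by brute force. The sketch below (helper names are hypothetical) enumerates every unordered rooted binary tree on n labelled leaves, treating all such trees as equally likely as the comment does, and counts how often leaves 0 and 1 are siblings. It should reproduce 1*3*...*(2n-3) trees and a sibling probability of 1/(2n-3), and then evaluates the 30-leaf ratio directly.

```python
from fractions import Fraction
from itertools import combinations


def trees(leaves):
    """Yield each unordered rooted binary tree whose leaf set is `leaves`, once.

    A leaf is an int; an internal node is a 2-tuple (left, right)."""
    if len(leaves) == 1:
        yield leaves[0]
        return
    first, rest = leaves[0], leaves[1:]
    # Keep `first` in the left subtree so each unordered split is generated once;
    # k is how many other leaves join it there (the right side must stay non-empty).
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            right = tuple(x for x in rest if x not in extra)
            for left_tree in trees((first,) + extra):
                for right_tree in trees(right):
                    yield (left_tree, right_tree)


def are_siblings(tree, a, b):
    """True if leaves a and b are the two children of the same node."""
    if not isinstance(tree, tuple):
        return False
    if {tree[0], tree[1]} == {a, b}:
        return True
    return are_siblings(tree[0], a, b) or are_siblings(tree[1], a, b)


def odd_double_factorial(m):
    """1 * 3 * 5 * ... * m for odd m (1 when m <= 1)."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


for n in range(3, 8):
    all_trees = list(trees(tuple(range(n))))
    hits = sum(are_siblings(t, 0, 1) for t in all_trees)
    # Expect len(all_trees) == 1*3*...*(2n-3) and a probability of 1/(2n-3).
    print(n, len(all_trees), Fraction(hits, len(all_trees)))

# The 30 items in the article: (1*3*...*55) / (1*3*...*57).
print(Fraction(odd_double_factorial(55), odd_double_factorial(57)))
```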

But you don't add the different probabilities for where the first item can be. No matter where the first item in the pair occurs, there is a 1/29 chance the second item will be next to it.

Another way of thinking about it: for any given item, there are 29 other items. Only one of these can be paired with the first, and all these events are equally likely. The probability has to be 1/29 and not 1/15, because 29 copies of 1/15 don't add up to 1.

Actually, the probability is slightly lower, because some items are not leaves at all. If we take the tree in the article as representative, then we expect roughly 10 pairs among the 30 items, which gives a probability of 2/87: with probability 2/3, the first item ends up as half of a pair, and with probability 1/29, the second item ends up as the other half of that same pair.

In the movie subtree, we have 12 items, so the probability of being paired is 2/33 rather than 1/6.

Edit: Laplace-adjusting the pair rate (roughly 10 pairs among 30 items, i.e. 1/3 per item), we get (10+1)/(30+2) = 11/32 as an estimate instead, so a random item is in a pair with probability 11/16, and the final answer becomes 1/16. Note that because of the reasonably large sample size, this doesn't make a huge difference.
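The arithmetic in this comment can be reproduced with exact fractions. In this sketch the "about 10 pairs among 30 items" figure is the comment's reading of the tree, reading 11/32 as the rule-of-succession estimate (10+1)/(30+2) of the pair rate is an inference from the numbers, and `pairing_probability` is a hypothetical helper.

```python
from fractions import Fraction


def pairing_probability(pair_rate, n_items):
    """P(two specific items form a pair), following the comment's reasoning:
    P(first item is in some pair) * P(its partner is the specific second item).

    pair_rate is pairs per item, so 2 * pair_rate is the chance an item is in a pair;
    the partner is assumed uniform over the other n_items - 1 items."""
    return 2 * pair_rate * Fraction(1, n_items - 1)


rate = Fraction(10, 30)                  # ~10 pairs among the 30 items in the tree
laplace_rate = Fraction(10 + 1, 30 + 2)  # rule of succession: 11/32

print(pairing_probability(rate, 30))          # whole tree: 2/87
print(pairing_probability(rate, 12))          # movie subtree: 2/33
print(pairing_probability(laplace_rate, 12))  # movie subtree, Laplace-adjusted: 1/16
```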

> there is a 1/29 chance the second item will be next to it.

'Next to it', perhaps, but wouldn't that other alternative be putting it on an entirely different branch, and so less similar, since it's not in the same cluster? `movie-fearandloathing` may be 'next to' `fanfiction-remiscent-afterthought-threecharacters` in the clustering, but not nearly as similar to it as `movie-1492conquestparadise`... so I think that analysis is less right than my own simple one.

Not all items have another item paired with them, which is where the correction factor of 2/3 comes from.

Ah, I see. I'm not sure how I should deal with the non-pairing or multiple-node groups; I didn't take them into account in advance, and anything based on observing the tree that *was* generated feels ad hoc. So if the odds of the pairing under random chance are overestimated, that means the strength of the pairing is being underestimated, right, and the likelihood ratio is weaker than it 'should' be? I'm fine with leaving that alone: as I said, when possible I tried to make conclusions as weak as possible.

If you took 30 people, and randomly put them into 15 pairs, then the probability that Person A would be paired with Person Z is 1/29. Person A is equally likely to be paired with any of the 29 other people.

If you took 15 women & 15 men, and randomly put them into 15 woman-man pairs, then the probability that Woman A would be paired with Man Z is 1/15. Woman A is equally likely to be paired with any of the 15 men.

The stylometrics analysis resembles the former situation, with p=1/29. The script could've been paired with any of the 29 other items.
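A quick Monte Carlo sketch of the two pairing schemes above, using hypothetical function names and 100,000 trials with a fixed seed. Shuffling the 30 people and reading them off two at a time yields a uniformly random set of 15 pairs; shuffling the men yields a uniformly random woman-man matching. The two estimates should land near 1/29 and 1/15 respectively.

```python
import random


def paired_in_group(n_people, rng):
    """Randomly split n_people (even) into pairs; is person 0 paired with person 1?"""
    order = list(range(n_people))
    rng.shuffle(order)  # a uniform shuffle, read off two at a time, is a uniform pairing
    i = order.index(0)
    return order[i ^ 1] == 1  # positions (0, 1), (2, 3), ... form the pairs


def paired_across_groups(n_each, rng):
    """Randomly match n_each women to n_each men; is woman 0 matched with man 0?"""
    men = list(range(n_each))
    rng.shuffle(men)  # woman k is matched with men[k]
    return men[0] == 0


def estimate(trial, trials=100_000, seed=0):
    """Monte Carlo frequency with which `trial` returns True."""
    rng = random.Random(seed)
    return sum(trial(rng) for _ in range(trials)) / trials


print(estimate(lambda rng: paired_in_group(30, rng)), 1 / 29)
print(estimate(lambda rng: paired_across_groups(15, rng)), 1 / 15)
```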

One thing that struck me: using Bayes separately on all those pieces of evidence assumes independence, but it seems that, conditioning on it being a fake, lots of the observations used as evidence all correlate with the faker being generally competent and fastidious; e.g. the sort of person who would get the address right is more likely to also get the authorship, formatting, PDF software and timezone right.

Yes, that was an error; I actually made a counterbalancing error there, where I flipped two arguments in the last two... My own ineptitude never ceases to impress me sometimes. (It's a good thing that was a hypothetical section that wasn't used in the full chain of posterior/prior calculations, because I'd've hated to have to redo them all. Again.)

> We finish with high confidence in the script's authenticity

> If you're already familiar with this particular leaked 2009 live-action script, please write down your current best guess as to how likely it is to be authentic.

Unless someone already tried to come up with an explicit probability, this ordering will bias the results. Ask people for their guesses before you tell them what you have already written on the subject.

HN submission: http://news.ycombinator.com/item?id=5010846 >30 comments; hit #1 on the front page.

I disagree: Bayes is a big part of Less Wrong, and this is an excellent worked out example of how one could try to apply it in practice. If my pretty-poorly-written, qualitative-claims-only Applied Bayes' Theorem: Reading People got promoted, so should this.

Look, this is certainly an interesting post, and I enjoyed reading it. But that is not a sufficient criterion for a post being in Main. Compare this to the other recent posts in Main, and you will see a big stylistic difference. A worked out example of using Bayes is very interesting and insightful, but it is not anything "new". To use an analogy, if the other posts in Main are the content of a textbook, this is one of the worked-out sample exercises to show you how the exercises in the book are actually done. That is no less valuable, but it is simply not the same class, and a distinction is necessary.

I've never seen this distinction before, and I don't think my essay is remotely like the usual fare of Discussion.

EDIT: especially if something like http://lesswrong.com/lw/g7y/morality_is_awesome/ gets *3*x the net upvotes...

I think it does. Bayes gets mentioned a lot around here, but there are not that many clear and accessible examples on how to go and analyze a real question; I recently read *Proving History*, despite no particular interest in the topic (Jesus' historicity), just to get a better idea of how people do it in practice.

"Who wrote the *Death Note* script?"

If you're already familiar with this particular leaked 2009 live-action script, please write down your current best guess as to how likely it is to be authentic.

This is intended to be easy to understand and essentially beginner-level for Bayes's theorem and Fermi estimates, like my other *Death Note* essay (information theory, crypto) or my console insurance page (efficient markets, positive psychology, expected value).

Be sure to check out the controversial twist ending!

(I'm sorry to post just a link, but I briefly thought about writing it and all the math in the LW edit box and decided that cutting my wrists sounded both quicker and more enjoyable. Unfortunately, there seems to be a math problem in the Google Chrome/Chromium browser where fractions simply don't render, apparently due to not enabling WebKit's MathML code; if fractions don't render for you, well, I know the math works well in my Iceweasel and it seems to work well in other Firefoxes.)