I finished creating the 2012 edition of the Best of Rationality Quotes collection. (Here is last year's.)
Best of Rationality Quotes 2012 (500kB page, 434 quotes)
and Best of Rationality Quotes 2009-2012 (1200kB page, 1140 quotes)
The page was built by a short script (source code here) from all the LW Rationality Quotes threads so far. (We have had such a thread every month since April 2009.) The script collects all comments with a karma score of 10 or more and sorts them by score. Only top-level comments are collected, not replies.
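The selection rule is simple enough to sketch. This is not parse.py. It assumes the comments have already been scraped into a hypothetical `Comment` record whose `parent_id` is `None` for top-level comments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    # Hypothetical record for one scraped comment; not the format parse.py uses.
    author: str
    text: str
    score: int
    parent_id: Optional[str]  # None means the comment is top-level


def best_quotes(comments: List[Comment], min_score: int = 10) -> List[Comment]:
    """Keep top-level comments scoring at least min_score, highest score first."""
    top_level = [c for c in comments if c.parent_id is None and c.score >= min_score]
    # sorted() is stable, so equal scores keep their original thread order.
    return sorted(top_level, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    thread = [
        Comment("alice", "quote one", 12, None),
        Comment("bob", "a reply", 30, "c1"),
        Comment("carol", "quote two", 9, None),
        Comment("dave", "quote three", 25, None),
    ]
    for c in best_quotes(thread):
        print(c.score, c.author, c.text)
```

The reply and the 9-karma comment are dropped, and the remaining two come out highest score first.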
As is now usual, I provide various statistics and top-lists based on the data. (Source code for these is also at the above link, see the README.) I added these as comments to the post:
- Top quote contributors by total karma score collected
- Top quote contributors by karma score collected in 2012
- Top quote contributors by statistical significance level (See this comment for a description of this metric.)
- Top original authors by number of quotes
- Top original authors by total karma score collected
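The count and total-karma lists boil down to grouping and tallying. The sketch below assumes each quote is a dict with hypothetical keys `contributor`, `author`, `year` and `score`. It does not reproduce the statistical significance metric, which is described in the linked comment.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Tuple

Quote = Dict[str, Any]  # hypothetical keys: "contributor", "author", "year", "score"


def top_by_total_karma(quotes: Iterable[Quote], key: Callable[[Quote], str],
                       limit: int = 10) -> List[Tuple[str, int]]:
    """Sum the karma of the quotes grouped by key(quote), largest totals first."""
    totals: Counter = Counter()
    for q in quotes:
        totals[key(q)] += q["score"]
    return totals.most_common(limit)


def top_by_count(quotes: Iterable[Quote], key: Callable[[Quote], str],
                 limit: int = 10) -> List[Tuple[str, int]]:
    """Count the quotes grouped by key(quote), most frequent first."""
    return Counter(key(q) for q in quotes).most_common(limit)


if __name__ == "__main__":
    quotes = [
        {"contributor": "alice", "author": "X", "year": 2012, "score": 40},
        {"contributor": "bob", "author": "X", "year": 2011, "score": 15},
        {"contributor": "alice", "author": "Y", "year": 2011, "score": 12},
    ]
    by_contributor = lambda q: q["contributor"]
    print(top_by_total_karma(quotes, by_contributor))
    print(top_by_total_karma([q for q in quotes if q["year"] == 2012], by_contributor))
    print(top_by_count(quotes, lambda q: q["author"]))
```

`Counter.most_common` breaks ties by first appearance. As noted below, grouping on the raw author string does not disambiguate authors from mentions.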
Top original authors by number of quotes. (Note that authors and mentions are not disambiguated.)
Top original authors by karma collected.
Top quote contributors by total karma score collected:
Top quote contributors by statistical significance level:
Top quote contributors by karma score collected in 2012:
The link to this year's best-of collection gives me a 404 Not Found error. The 2009-2012 collection is accessible, however.
Fixed, thanks!
Thank you for the data and the quote collections! However, the source code link points to the directory holding the HTML files for the 2012 and 2009-2012 "Best of" collections rather than to a .zip or .gz of the source code itself, and it seems to load a default page showing an unstyled HTML version of the 2012 collection.
I removed the broken index.html, sorry. Now you can see the whole (messy) directory. The README is actually a list of commands with some comments; the source code consists of parse.py and convolution.py.
I reformatted "Best of Rationality Quotes 2009-2012" into a fortune file, which is available here.
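The fortune format is plain text with entries separated by lines that contain only `%`. Running `strfile` on the result builds the index that `fortune` reads. Here is a minimal sketch, assuming the quotes are available as (text, author) pairs. The tab-indented `-- author` line is a common convention, not part of the format. A quote containing a line that is just `%` would need escaping, and this sketch does not handle that.

```python
from typing import Iterable, Optional, Tuple


def to_fortune(quotes: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Render (text, author) pairs as a fortune file: each entry ends with a line holding only '%'."""
    chunks = []
    for text, author in quotes:
        body = text.strip()
        if author:
            # Attribution on its own indented line; a convention, not a requirement.
            body += "\n\t-- " + author
        chunks.append(body + "\n%\n")
    return "".join(chunks)


if __name__ == "__main__":
    sample = [("Example quote.", "Example Author"), ("Unattributed quote.", None)]
    # Write this to a file and run `strfile <file>` to build the .dat index.
    print(to_fortune(sample), end="")
```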
Neat. Does quote karma follow something like an exponential distribution?
I tried some things in R. While the distribution looks exponential, none of the code or fitting functions gave good results on the highest-karma quotes, I guess because all the other thousand or so quotes look so linear. Of course, I could have just messed up in any of the following:
Open http://people.mokk.bme.hu/~daniel/rationality_quotes_2012/rq.html in Firefox, select all (C-a), then:
It is roughly exponential in the range between 3 and 60 karma.
You can find the raw data here.
Edit: I didn't spot gwern's more careful analysis; I am still digesting it. gwern, you should use the above link, since it contains the quotes below 10 karma, too.
The extra data doesn't seem to make much difference:
Eyeballing it, it looks like the previous fit crosses at around 40.
The fit looks much better:
I am afraid I don't understand your methodology. What is a rank-versus-value plot supposed to look like for an exponentially distributed sample?
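For reference, a standard order-statistics result gives the idealized answer. Suppose n values are i.i.d. Exp(λ), and x_(r) denotes the r-th largest, so rank 1 is the top quote. Then the expected value at rank r is a tail sum of the harmonic series, which is close to a logarithm of rank. Plotted against log rank, the values should fall roughly on a straight line with slope -1/λ. Plotted against raw rank, they trace a logarithmic curve. This ignores the karma cutoffs, the integer-valued scores, and the poor approximation at very small r.

```latex
S(x) = P(X > x) = e^{-\lambda x},
\qquad
\mathbb{E}\left[x_{(r)}\right] = \frac{1}{\lambda}\sum_{k=r}^{n}\frac{1}{k}
\approx \frac{1}{\lambda}\left(\ln n - \ln r\right)
```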
How else would you do it?
When I stated that the middle is roughly exponential, this was the graph that I was looking at:
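d <- density(karma)  # kernel density estimate of the karma scores
plot(log(d$y) ~ d$x)  # log density vs. karma: an exponential shows up as a straight line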
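A rough stdlib-only Python analogue of that check, with a histogram swapped in for R's kernel density estimate. If the scores are exponential with rate λ over a range, the log bin counts there fall on a line with slope about -λ. `karma` is assumed to be a plain list of scores. The unweighted least-squares fit gives sparse high-karma bins as much weight as dense ones, so treat the slope as a rough indicator only.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def log_bin_counts(karma: Iterable[float], lo: float, hi: float,
                   width: float = 1) -> List[Tuple[float, float]]:
    """Histogram the scores in [lo, hi) and return (bin start, log count) for non-empty bins."""
    counts = Counter(int((k - lo) // width) for k in karma if lo <= k < hi)
    return [(lo + b * width, math.log(n)) for b, n in sorted(counts.items())]


def ols_slope(points: List[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx
```

For the range mentioned earlier, that would be `ols_slope(log_bin_counts(karma, 3, 61))`.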
I don't do this for a living, so I am not at all sure, but if I really, really had to make this formal, I would probably use maximum likelihood to fit an exponential distribution on the relevant interval and then apply a Kolmogorov-Smirnov test. It's what shminux said, except that there is probably no closed-form solution because the cutoffs complicate things. And at least one of the cutoffs is really necessary, because below 3 the distribution is obviously not exponential.
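A stdlib-only Python sketch of that recipe, offered as an illustration rather than the analysis anyone in the thread actually ran. It assumes the interval is [a, b] and that the scores have been shifted by a, so the fit is on [0, c] with c = b - a. The likelihood equation has no closed form, so it is solved by bisection. The plain KS p-value would be too optimistic here for two reasons: λ is estimated from the same data, and integer karma produces ties that violate the test's continuity assumption. A parametric bootstrap is one common way to calibrate it.

```python
import math
from typing import List


def fit_truncated_exponential(ys: List[float], c: float) -> float:
    """Maximum-likelihood rate for an exponential restricted to [0, c].

    Setting the log-likelihood derivative to zero gives
        mean(ys) = 1/lam - c / (exp(lam * c) - 1),
    which has no closed-form solution, so it is solved by bisection.
    """
    mean = sum(ys) / len(ys)
    if not 0 < mean < c / 2:
        raise ValueError("need 0 < mean < c/2 for a decreasing exponential fit")

    def truncated_mean(lam: float) -> float:
        # Mean of Exp(lam) conditioned on being <= c; decreases from c/2 towards 0 as lam grows.
        lc = lam * c
        tail = 0.0 if lc > 700 else c / math.expm1(lc)  # skip exp() when it would overflow
        return 1.0 / lam - tail

    lo, hi = 1e-6 / c, 1.0
    while truncated_mean(hi) > mean:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if truncated_mean(mid) > mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ks_statistic(ys: List[float], lam: float, c: float) -> float:
    """Kolmogorov-Smirnov distance between the sample and the fitted truncated CDF."""
    norm = -math.expm1(-lam * c)  # 1 - exp(-lam * c)
    ys = sorted(ys)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = -math.expm1(-lam * y) / norm  # fitted CDF at y
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

With the scores in a list `karma` (a hypothetical name), the 3-to-60 window discussed above would be `ys = [k - 3 for k in karma if 3 <= k <= 60]` with `c = 57`.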
I expected something like this or the section thereafter.