Best of Rationality Quotes 2009/2010

Best of Rationality Quotes 2009/2010 (Warning: 750kB page, 774 quotes)

The year's last Rationality Quotes thread has calmed down, so now it is a good time to update my Best of Rationality Quotes page, and write a top post about it. (The original version was introduced in the June 2010 Open Thread.)

The page was built by a short script (source code here) from all the LW Rationality Quotes threads so far. (We have had such a thread every month since April 2009.) The script collects all comments with karma score 4 or more, and sorts them by score.
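Here is a minimal Python sketch of the collection rule just described: keep comments with karma 4 or more, then sort them by score. The `Comment` record and its field names are hypothetical, and the actual script may represent comments differently.

```python
from typing import NamedTuple


class Comment(NamedTuple):
    # Hypothetical record for one scraped comment; field names are illustrative.
    author: str
    score: int
    text: str


def best_quotes(comments, min_score=4):
    """Keep comments with karma of at least min_score, highest score first."""
    kept = [c for c in comments if c.score >= min_score]
    # sorted() is stable, so comments with equal scores keep their input order.
    return sorted(kept, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    sample = [
        Comment("alice", 3, "below the threshold, dropped"),
        Comment("bob", 12, "kept, ranked first"),
        Comment("carol", 4, "kept, exactly at the threshold"),
    ]
    for c in best_quotes(sample):
        print(c.score, c.author)  # prints "12 bob", then "4 carol"
```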

There is a minor complication: The obvious idea is to consider only top-level comments, that is, comments that are not replies to other comments. Unfortunately, good quotes are sometimes replies to other quotes. Of course, even more often, replies are not quotes. This is a precision-recall trade-off. Originally I went for recall, because I liked many replied quotes such as this. But as JGWeissman noted in a comment below, to build the precise version, only a trivial modification of my script is needed. So I built it, and I preferred it to the noisy version after all. So now at the top of this post we have the filtered version, and here is the original version with even more good quotes, but also with many non-quotes:

Best of Rationality Quotes 2009/2010, including replied comments (Warning: 1.3MB page, 1358 quotes)


UPDATE: I changed the links and rewrote the above when I decided to filter replied comments.

UPDATE 2: Added a comment listing the personal quote collection pages of top quote contributors.

UPDATE 3: Responding to various requests by commenters, added several top-lists:


Unfortunately, some of the collected comments are not actual quotes but highly voted replies to quotes; the script can't tell the difference.

It looks like all reply comments are contained (indirectly) by a div with a class name of "child", and top level comments are not. So your script could filter replies by ignoring anything contained in a "child" div.
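The suggested filter can be sketched with Python's standard `html.parser`. Only the "child" class comes from the comment above. The `<div class="comment" id="...">` wrapper for each comment is an assumption made for illustration, and the real page markup may differ.

```python
from html.parser import HTMLParser


class TopLevelCommentFinder(HTMLParser):
    """Collect ids of comment divs that are not nested inside a "child" div.

    Assumption (hypothetical markup): each comment is <div class="comment" id="...">.
    """

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one entry per open <div>: does it have class "child"?
        self.top_level_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # A comment is a reply if any enclosing div, at any depth, is a "child" div.
        if "comment" in classes and not any(self.div_stack):
            self.top_level_ids.append(attr.get("id", ""))
        self.div_stack.append("child" in classes)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


if __name__ == "__main__":
    page = """
    <div class="comment" id="q1">A quote</div>
    <div class="child">
      <div class="comment" id="r1">A reply</div>
      <div><div class="comment" id="r2">A nested reply</div></div>
    </div>
    <div class="comment" id="q2">Another quote</div>
    """
    finder = TopLevelCommentFinder()
    finder.feed(page)
    print(finder.top_level_ids)  # ['q1', 'q2']
```

Because the stack tracks every open div, a reply stays excluded even when it is wrapped in further plain divs inside the "child" container. Badly unbalanced HTML would confuse this simple stack.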

I investigated this, but it turns out many good quotes are replies. I preferred false positives to false negatives.

I quickly implemented your suggestion, had a look, and decided that I prefer the cleaner version after all. Very many good quotes are lost this way, but we don't have to think of that while browsing the list. :) I rewrote the article accordingly.

While you have the software open... :-)

Top average score? (total / number of quotes)

Top people quoted?

Top people quoted?

To do this properly, you would need named entity recognition and normalization. I just collected the most frequent capitalized words, threw away the ones recognized by my morphological analyzer, and did a small amount of manual postprocessing. Note that Bacon, Wells and Hawking are missing from the list because my morphological analyzer recognizes them as ordinary words.

  • 16 Russell
  • 12 Nietzsche
  • 12 Feynman
  • 11 Pratchett
  • 10 Einstein
  • 9 Chesterton
  • 9 Asimov
  • 8 Taleb
  • 8 Scott
  • 8 Johnson
  • 8 Heinlein
  • 8 Dennett
  • 7 Wilson
  • 7 Voltaire
  • 7 Dawkins
  • 6 Thoreau
  • 6 Rochefoucauld
  • 6 Neumann
  • 6 Marx
  • 6 Gould
  • 6 Dijkstra
  • 6 Binmore
  • 5 Jaynes
  • 5 Huxley
  • 5 Galileo
  • 5 Egan
  • 5 Descartes
  • 5 Darwin
  • 5 Buffett
  • 5 Ayn
  • 5 Aristotle
  • 4 Yudkowsky
  • 4 Wittgenstein
  • 4 Wilde
  • 4 Thompson
  • 4 Suzumiya
  • 4 Simpson
  • 4 Schopenhauer
  • 4 Sagan
  • 4 Rommel
  • 4 Rollins
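Here is a rough sketch of the heuristic described above, assuming each quote is available as a (text, karma) pair. The `dictionary` set is a hypothetical stand-in for the morphological analyzer. Any capitalized word it recognizes is discarded, which is how names like Bacon or Wells drop out. Counting each name at most once per quote is an assumption. The same tallies also give the karma-ordered list.

```python
import re
from collections import Counter


def tally_names(quotes, dictionary):
    """Count candidate names in (text, karma) pairs and the karma each collects.

    `dictionary` is a hypothetical stand-in for a morphological analyzer:
    capitalized words it recognizes as ordinary words are discarded.
    """
    counts, karma = Counter(), Counter()
    for text, score in quotes:
        # Capitalized words, each counted at most once per quote (an assumption).
        words = set(re.findall(r"\b[A-Z][a-z]+\b", text))
        for name in words:
            if name.lower() in dictionary:
                continue
            counts[name] += 1
            karma[name] += score
    return counts, karma


if __name__ == "__main__":
    quotes = [
        ("Some text. -- Bertrand Russell", 20),
        ("More text. -- Francis Bacon", 9),
        ("Other text. -- Russell", 5),
    ]
    dictionary = {"some", "more", "other", "text", "bacon"}
    counts, karma = tally_names(quotes, dictionary)
    for name, n in counts.most_common():
        print(n, name, karma[name])
```

In this toy run, Russell is counted twice with 25 karma and Bacon disappears. First names such as Bertrand survive, which is one reason a manual postprocessing step is still needed.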

You definitely used up all your wishes. :) The above list reordered by total karma collected:

  • 158 Russell
  • 109 Pratchett
  • 106 Asimov
  • 101 Dennett
  • 100 Chesterton
  • 82 Buffett
  • 81 Egan
  • 79 Nietzsche
  • 77 Feynman
  • 72 Voltaire
  • 66 Scott
  • 66 Neumann
  • 66 Descartes
  • 61 Heinlein
  • 59 Dijkstra
  • 58 Marx
  • 57 Aristotle
  • 52 Darwin
  • 49 Galileo
  • 48 Einstein
  • 46 Taleb
  • 46 Binmore
  • 45 Johnson
  • 43 Jaynes
  • 42 Rollins
  • 39 Sagan
  • 34 Wilde
  • 34 Dawkins
  • 28 Gould
  • 25 Wilson
  • 25 Rochefoucauld
  • 23 Huxley
  • 22 Ayn
  • 15 Simpson
  • 13 Wittgenstein
  • 13 Schopenhauer
  • 12 Yudkowsky
  • 12 Thoreau
  • 11 Thompson
  • 8 Suzumiya
  • 2 Rommel

OK. Who quoted Yudkowsky? Hopefully it was quotes from elsewhere. :)

Hacker News, for one - I don't know where the other eight points may be from.

Edit: Six more points from Methods of Rationality

You definitely used up all your wishes. :)

Yeah, I wasn't precise enough on that second wish. Oh well, World Peace will have to wait.

While you have the software open... :-)

The source code is open, too. :) Anyway:

Top average score:

  • 54 in 1: michaelkeenan
  • 23 in 1: Vlad
  • 22.6667 in 3: Tesseract
  • 22 in 1: DaveInNYC
  • 20 in 1: CSmith
  • 19.5 in 2: knb
  • 19 in 1: Marcello
  • 18.8 in 5: Unnamed
  • 18.3333 in 3: Kyre
  • 18.25 in 4: sketerpot
  • 18 in 1: cata
  • 17 in 1: MarcTheEngineer
  • 16 in 3: Hariant
  • 16 in 1: Tyrrell_McAllister
  • 16 in 1: CaptainOblivious2
  • 15.5294 in 17: Yvain
  • 15.5 in 4: Lightwave
  • 15 in 1: teageegeepea
  • 15 in 1: Patrick
  • 15 in 1: Nisan
  • 15 in 1: loqi
  • 15 in 1: Automaton
  • 14 in 3: MichaelHoward
  • 14 in 3: jaimeastorga2000
  • 14 in 1: torekp
  • 14 in 1: sparrowsfall
  • 14 in 1: Sniffnoy
  • 14 in 1: Shalmanese
  • 14 in 1: Kobayashi
  • 14 in 1: bogus
  • 13.5 in 4: komponisto
  • 13.5 in 4: Apprentice
  • 13.5 in 2: JamesAndrix
  • 13.2857 in 21: RobinZ
  • 13.1481 in 27: MichaelGR
  • 13 in 2: BenAlbahari
  • 13 in 1: KatjaGrace
  • 13 in 1: josht
  • 12.8571 in 7: Kutta
  • 12.5714 in 7: wuwei
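One way to produce "average in count" rows like those above is sketched here. It assumes the input is a list of (poster, karma) pairs. Python's `g` format (at most six significant digits, trailing zeros dropped) happens to match the number style in the list. The thread does not say whether the original script formatted numbers this way.

```python
from collections import defaultdict


def average_by_poster(quotes):
    """Return (mean, count, poster) rows from (poster, karma) pairs, best mean first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for poster, score in quotes:
        totals[poster] += score
        counts[poster] += 1
    rows = [(totals[p] / counts[p], counts[p], p) for p in totals]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def format_row(mean, count, poster):
    # "g" keeps at most six significant digits and drops trailing zeros.
    return f"{mean:g} in {count}: {poster}"


if __name__ == "__main__":
    # Made-up individual scores for two hypothetical posters.
    sample = [("a", 30), ("a", 20), ("a", 18), ("b", 21), ("b", 18)]
    for row in average_by_poster(sample):
        print(format_row(*row))  # "22.6667 in 3: a", then "19.5 in 2: b"
```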

This reflects particularly well on Yvain, Robin and Michael, all of whom managed to be both prolific and reliable in providing value with their quotes. I'm trying to think of a suitable metric by which I can formalise my intuitive evaluation.

I consider quotes with 0 votes to be a net negative contribution. They also raise the chance that the poster's other quotes are faux-wisdom: quotes that appear deep to a casual reader at first glance but wouldn't stand up to scrutiny by someone paying close attention to their actual meaning. So I would rate comments posted via an 'accuracy by volume' approach even lower than their average suggests, because that approach signals a greater degree of superficiality bias.

Those considerations aside, volume does provide some extra value. When asking "Which contributor's page should I read to absorb the greatest improvement in quotey wisdom?", I may be better off with "16 in 5" than "22 in 2". On the other hand, reading a "5 in 50" page may make me net sillier as I unconsciously absorb nonsense. Perhaps the ranking I'm looking for could be something as trivial as "Sum - Count * 4".
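Here is the proposed "Sum - Count * 4" ranking applied to the three hypothetical pages mentioned above. The function name is illustrative.

```python
def volume_score(total, count, penalty=4):
    """The "Sum - Count * 4" heuristic: total karma minus a fixed penalty per quote."""
    return total - penalty * count


if __name__ == "__main__":
    # The three hypothetical pages from the paragraph above, as (average, count).
    for mean, count in [(16, 5), (22, 2), (5, 50)]:
        print(f"{mean} in {count}:", volume_score(mean * count, count))
    # prints 60, 36 and 50 respectively
```

With a penalty of 4, the "5 in 50" page still scores 50, which puts it ahead of "22 in 2" at 36. The ranking is therefore sensitive to the choice of penalty constant.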

I think a good metric is this: Assuming we independently draw from the observed distribution of achieved karma scores, what is the probability that someone gets at least as much karma as Yvain when she posts as many quotes as Yvain? You can calculate this by iterated convolution. The assumption of total independence heavily favors Yvain, but I am fine with that.
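A sketch of the iterated-convolution calculation in plain Python follows. It assumes karma scores are integers and that each of a poster's quotes is an independent draw, with replacement, from the pool of all observed scores. The function names are illustrative and not taken from the author's script.

```python
from collections import Counter


def sum_distribution(scores, n):
    """Distribution {total: probability} of the sum of n independent draws from scores."""
    single = {s: c / len(scores) for s, c in Counter(scores).items()}
    dist = {0: 1.0}  # zero draws sum to 0 with certainty
    for _ in range(n):
        # One convolution step: add a fresh draw to every partial total.
        nxt = {}
        for total, p in dist.items():
            for s, q in single.items():
                nxt[total + s] = nxt.get(total + s, 0.0) + p * q
        dist = nxt
    return dist


def p_value(scores, n, observed_total):
    """P(n random draws from scores reach at least observed_total)."""
    dist = sum_distribution(scores, n)
    return sum(p for total, p in dist.items() if total >= observed_total)


if __name__ == "__main__":
    # Toy pool: half the quotes scored 4, half scored 8.
    print(p_value([4, 8], 3, 20))  # 0.5
```

In the toy pool, three draws total 12, 16, 20 or 24 with probabilities 1/8, 3/8, 3/8 and 1/8. Reaching 20 or more therefore has probability 0.5.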

I loaded the actual observed distribution, and calculated this score:

  • 0.00008 (12.48 in 54): Rain
  • 0.00066 (15.53 in 17): Yvain
  • 0.00128 (13.15 in 27): MichaelGR
  • 0.00174 (54.00 in 1): michaelkeenan
  • 0.00312 (13.29 in 21): RobinZ
  • 0.00766 (22.67 in 3): Tesseract
  • 0.00836 (18.80 in 5): Unnamed
  • 0.01499 (18.25 in 4): sketerpot
  • 0.02368 (10.15 in 47): Eliezer_Yudkowsky
  • 0.02473 (18.33 in 3): Kyre
  • 0.03460 (19.50 in 2): knb
  • 0.03831 (15.50 in 4): Lightwave
  • 0.04265 (23.00 in 1): Vlad
  • 0.04817 (16.00 in 3): Hariant
  • 0.05266 (12.86 in 7): Kutta
  • 0.05396 (22.00 in 1): DaveInNYC
  • 0.06051 (12.57 in 7): wuwei
  • 0.06789 (20.00 in 1): CSmith
  • 0.07663 (13.50 in 4): Apprentice
  • 0.07663 (13.50 in 4): komponisto
  • 0.08094 (19.00 in 1): Marcello
  • 0.08622 (14.00 in 3): jaimeastorga2000
  • 0.08622 (14.00 in 3): MichaelHoward
  • 0.09554 (11.38 in 8): billswift
  • 0.10009 (18.00 in 1): cata
  • 0.11401 (17.00 in 1): MarcTheEngineer
  • 0.12449 (8.77 in 81): RichardKennaway
  • 0.12763 (12.00 in 4): SilasBarta
  • 0.13055 (16.00 in 1): CaptainOblivious2
  • 0.13055 (16.00 in 1): Tyrrell_McAllister
  • 0.13092 (13.50 in 2): JamesAndrix
  • 0.13828 (12.33 in 3): Randaly
  • 0.14534 (15.00 in 1): Automaton
  • 0.14534 (15.00 in 1): loqi
  • 0.14534 (15.00 in 1): Nisan
  • 0.14534 (15.00 in 1): Patrick
  • 0.14534 (15.00 in 1): teageegeepea
  • 0.14695 (13.00 in 2): BenAlbahari
  • 0.15183 (10.83 in 6): DSimon

I don't quite understand the methodology - how do you determine the karma distribution for each poster? And how is the list sorted?

I am afraid I don't understand either of your questions. I work with the karma distribution only in the quotes domain. It doesn't have to be determined, I collected all the data myself. The list is sorted by p-value.

We have the total list of quotes, with scores and posters. We know that Kutta scored 90 points from 7 quotes. Our null hypothesis is that he randomly selected 7 quotes from the total set of 1138 quotes. The p-value is the probability that he could achieve at least 90 points by this process. If his actual method yields better scores than random drawing, the p-value will be low.
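The same null hypothesis can also be checked by simulation. Draw the poster's number of quotes at random from the pool many times, and count how often the total reaches the observed karma. For Kutta, 90 points from 7 quotes is the 12.8571 average shown earlier. This Monte Carlo sketch is an alternative to the exact iterated convolution mentioned above, not the method used for the list. The function name and defaults are assumptions.

```python
import random


def simulated_p_value(scores, n, observed_total, trials=100_000, seed=0):
    """Monte Carlo estimate of P(n quotes drawn at random reach observed_total karma).

    Draws are uniform and with replacement, matching the independence assumption.
    """
    rng = random.Random(seed)
    hits = sum(
        sum(rng.choices(scores, k=n)) >= observed_total for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    # Toy pool: half the quotes scored 4, half scored 8; the exact answer is 0.5.
    print(simulated_p_value([4, 8], 3, 20))
```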

I have a very low opinion of classical frequentist statistics, but it seemed very suitable for this task. I am sure there is already a name for this method I reinvented. Of course, the null hypothesis is ridiculous, so we shouldn't assign much meaning to these numbers. It is just one of the many ways we can solve this ranking task.

Okay, that makes sense - the number is the probability that they could have picked up as many points as they did by picking randomly from the set of all quotes. I understand now.

That's brilliant. I like the theory and the ranking matches about what my intuitive manual ranking would have been too.

If I were to venture a suggestion: statistical significance may be relevant to your valuation of high-average high-number posters like Yvain, MichaelGR, and myself over higher-average low-number posters like michaelkeenan. If poorly-selected quotes nevertheless have a small but significant probability of being highly ranked (but a simultaneous large probability of being low-ranked) and most quoters select poorly, someone with only one high-rated quote is not much likelier to be a good selector of quotes than not. In contrast, someone with many quotes, most of which are highly regarded, could be expected to be unusually discerning, as the probability of this result by chance is low.

I garnered much more karma than I thought I did from the quotes; must be all the low-ranked ones since I don't have all that many highly ranked quotes.

Reading the top quotes, I found What is Wrong With Our Thoughts by David Stove again, quoted by EY. At some point I read that, but forgot the metadata and was unable to find it again when I went looking for it later.

In that chapter, written in 1991, David Stove calls for a nosology of thought, that is, a classification of the many ways human thought can go wrong. It seems to me that Less Wrong's sequences contain just such a classification, and that explaining what's wrong with his many examples about the number three should no longer be so difficult.

Do you need the sequences, or is it enough to say that the beliefs espoused in the three examples do not pay rent? It seems to me that what is missing from the vast epistemological debate throughout human history is the idea that truth only means anything in relation to goals.

That's an interesting idea. It might make a good page on the Less Wrong Wiki, consisting of just his "three" examples, to which everyone could add links to LW articles explaining what's wrong with each.

Personally though, I'm still stumped by most of them. :)

Such a page could also be useful for pointing out areas which haven't been discussed here yet. For instance, there is surely much to say about "The proposition that 3 is the fifth root of 243 is a tautology, just like 'An oculist is an eye-doctor.'", but I can't think of any LW posts talking about it.

The Best of Rationality Quotes 2009/2010 thread has calmed down, so now it is a good time to write a comment about how observation has changed the observed. People looking at the quote collection went on up- and downvoting comments. 173 upvotes and 11 downvotes were passed out in the last week. (Not counting votes for December 2010 quotes, which are still in flux.) Here are the comments with the greatest number of new upvotes:

The bottom of the list is less interesting: none of the quotes got more than one downvote.

Source for my quotes, by general scoring:

  • Slashdot
  • Personal quotes archive (years of internet reading)
  • iGoogle Quotes of the Day
  • Movies and TV
  • Book of Wisdom (quote book)

Unfortunately, I ran out of backlog material a couple of months ago, and I find less than one good quote a month. I figure that's a very poor return on hours and hours of internet use, and suggest that others not attempt to follow my example unless they already have a quotes archive.

I do regret not having saved many of the good quotes from books I'm sure I've read...

With 1361 quotes your "Best of" isn't very selective.

It's not, but I find that many of the quotes lower down on the list are very good themselves. Being upvoted a lot is not just a function of being appealing to LW members, but also being posted at a time and place where many members will take notice, and I think some of these quotes were shortchanged in that regard.

When I just saw short segments of the sorted list, without karma scores, I was unable to guess whether the segment was from the top or the bottom of the list.

Being upvoted a lot is not just a function of being appealing to LW members, but also being posted at a time and place where many members will take notice, and I think some of these quotes were shortchanged in that regard.

I had the same observation. I found myself opening rather a lot of permalinks in tabs so that I could vote a whole lot up and more than a few down as well.

We probably also lost a lot of value when it comes to quotes in replies. This one by logi is at 29 but probably is not scraped.

I made a quick attempt, but conversion to the wiki format is nontrivial, and I wasn't sure it was worth the effort.

With the right tool, nothing is impossible! I believe Pandoc (available in a Debian/Ubuntu near you) should be able to handle the whole Markdown->MediaWiki conversion. For example, it converts this:

 > foo bar


 <blockquote>foo bar
 -[ baz]

| With the right tool, nothing is impossible!

-- gwern

I like it! Provided the caveat is kept in mind that sometimes getting the right tool is itself impossible...

Which is why I hastened to mention the precompiled and widely available binaries.

(OP is on Windows? Well, there's a limit to how far I will bend over backwards to accommodate someone using the wrong tool...)

While Windows has its share of flaws, I can't help but wonder if a system in which it's noteworthy to have "precompiled and widely available binaries" (so the USER doesn't have to compile the app before he uses it) isn't just as wrong, only in different ways.

The thing many *nix fans overlook is that most people just want to USE a computer, and one that's good enough is, well, good enough. From that perspective, *nix isn't so much a tool as it is a toy: something from which people derive more entertainment than utility.

The precompiled note is because Pandoc is written in Haskell for ghc; the Haskell toolchain is far from being universally available (as, say, C toolchains using gcc are) and is somewhat immature on Mac and Windows. The Pandoc source is available, of course, but one may legitimately not wish to install everything necessary to compile the source oneself. (If Pandoc were written in C, then I might not bother specifying that there are trustworthy binaries available.) There's a reason not everyone uses source-based distros like Gentoo or Arch.

Thanks for the tip! You don't have to bend over backwards: OP is on Ubuntu. I implemented JGWeissman's suggestion, which made the issue kind of moot, as the filtered version does not really need manual postprocessing.

Why not add some quotes to Wikiquote? (It's also easier to update.)

Is Wikiquote worth working on? I work on Wikipedia articles because I know that they are incredibly popular and will be read by hundreds of thousands in the long run, but Wikiquote has always been a backwater. (And it's not like quotes need much updating.)

Look at their statistics:

Their most popular article gets 1,994 hits a day. I can get that many hits in a month with an article on an anime, or sometimes 1k hits a day with a relatively obscure programming language.

Some not especially prominent Wikipedia articles are within a factor, or an order of magnitude, of Wikiquote's most popular article. That's really, really sad. (And if you ever search for quotes, you will rarely see Wikiquote come up, either.)

Thanks for putting this together, I'm definitely saving it for later perusal!

Apparently, I contributed 27 quotes this year.

Apparently, I contributed 27 quotes this year.

Here they are: MichaelGR's quotes.

This is not just for you, I built this for all the top 40 quote karma scorers.

MichaelGR isn't working for me - all I'm getting is a page full of names interspersed with numbers.

Edit: The same problem seems to afflict all the individual files for me

Sorry, I messed up all of them. I will correct it soon.

EDIT: Fixed.

I found 18 from me in the lists - one in a reply to another poster and one which duplicated a quote EY had posted in a top-level post.