Best of Rationality Quotes 2009/2010

DanielVarga

LessWrong, 18th Dec 2010

Best of Rationality Quotes 2009/2010 (Warning: 750kB page, 774 quotes)

The year's last Rationality Quotes thread has calmed down, so now it is a good time to update my Best of Rationality Quotes page, and write a top post about it. (The original version was introduced in the June 2010 Open Thread.)

The page was built by a short script (source code here) from all the LW Rationality Quotes threads so far. (We have had such a thread every month since April 2009.) The script collects all comments with a karma score of 4 or more and sorts them by score.
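The script's source is linked rather than shown, so what follows is only a minimal Python sketch of the selection step described above. It assumes the thread comments have already been scraped into records. The `Comment` type and the `best_of` function are hypothetical names, not the author's code.

```python
from dataclasses import dataclass


# Hypothetical record for one scraped comment; the real script's data model is not shown.
@dataclass
class Comment:
    author: str
    score: int
    text: str


MIN_SCORE = 4  # karma threshold stated in the post


def best_of(comments, min_score=MIN_SCORE):
    """Keep comments scoring at least min_score, highest score first."""
    kept = [c for c in comments if c.score >= min_score]
    # sorted() stays stable with reverse=True, so ties keep their original thread order.
    return sorted(kept, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    sample = [
        Comment("alice", 12, "quote A"),
        Comment("bob", 3, "quote B"),
        Comment("carol", 4, "quote C"),
        Comment("dave", 27, "quote D"),
    ]
    for c in best_of(sample):
        print(c.score, c.author)
```

With the toy data, bob's 3-point comment falls below the cutoff, and the rest print in the order dave, alice, carol.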

There is a minor complication. The obvious idea is to consider only top-level comments, that is, comments that are not replies to other comments. Unfortunately, good quotes are sometimes posted as replies to other quotes. Of course, even more often, replies are not quotes. This is a precision-recall trade-off. Originally I went for recall, because I liked many of the quotes posted as replies, such as this one. But as JGWeissman noted in a comment below, building the precise version takes only a trivial modification of my script. So I built it, and I preferred it to the noisy version after all. So the link at the top of this post now points to the filtered version, and here is the original version, with even more good quotes but also many non-quotes:

Best of Rationality Quotes 2009/2010, including replied comments (Warning: 1.3MB page, 1358 quotes)

UPDATE: I changed the links and rewrote the above when I decided to filter out replies.

UPDATE 2: Added a comment listing the personal quote collection pages of top quote contributors.

UPDATE 3: Responding to various requests by commenters, added several top-lists:


JGWeissman

Unfortunately, some of the collected comments are not actual quotes but highly voted replies to quotes; the script can't tell the difference.

It looks like all reply comments are contained (indirectly) in a div with the class name "child", and top-level comments are not. So your script could filter out replies by ignoring anything contained in a "child" div.
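Here is a minimal sketch of that filter using Python's standard-library `html.parser`. Only the "child" class comes from the comment above. The assumption that each comment is a `<div class="comment" id="...">` element is hypothetical, so the real page may need different selectors. The parser keeps a stack of open divs and counts how many "child" divs enclose the current position. A comment counts as top-level only when that count is zero.

```python
from html.parser import HTMLParser


class TopLevelCommentFinder(HTMLParser):
    """Collect ids of comment divs that are not nested inside a div with class "child".

    Assumption (hypothetical markup): each comment is <div class="comment" id="...">.
    """

    def __init__(self):
        super().__init__()
        self.div_stack = []   # one bool per open <div>: is it a "child" div?
        self.child_depth = 0  # number of currently open "child" divs
        self.top_level_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        is_child = "child" in classes
        self.div_stack.append(is_child)
        # Check containment before counting this div itself.
        if "comment" in classes and self.child_depth == 0:
            self.top_level_ids.append(attr_map.get("id"))
        if is_child:
            self.child_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            if self.div_stack.pop():
                self.child_depth -= 1


if __name__ == "__main__":
    page = """
    <div class="comment" id="c1">top-level quote
      <div class="child">
        <div class="comment" id="c2">reply</div>
      </div>
    </div>
    <div class="comment" id="c3">another quote</div>
    """
    finder = TopLevelCommentFinder()
    finder.feed(page)
    print(finder.top_level_ids)  # ['c1', 'c3']
```

The reply `c2` is skipped because one "child" div is open when it starts.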

DanielVarga

I investigated this, but it turns out many good quotes are replies. I preferred false positives to false negatives.

DanielVarga

I quickly implemented your suggestion, had a look, and decided that I prefer the cleaner version after all. Many, many good quotes are lost this way, but we don't have to think of that while browsing the list. :) I rewrote the article accordingly.

DanielVarga

Top quote contributors by total karma score collected:

MichaelHoward

While you have the software open... :-)

Top average score? (total / number of quotes)

Top people quoted?

DanielVarga

Top people quoted?

To do this properly, you have to do named entity recognition and normalization. I just collected the most frequent capitalized words, threw away the ones recognized by my morphological analyzer, and did a small amount of manual postprocessing. Note that my morphological analyzer recognizes Bacon, Wells and Hawking as ordinary words, so they are missing from this list.

16 Russell
12 Nietzsche
12 Feynman
11 Pratchett
10 Einstein
9 Chesterton
9 Asimov
8 Taleb
8 Scott
8 Johnson
8 Heinlein
8 Dennett
7 Wilson
7 Voltaire
7 Dawkins
6 Thoreau
6 Rochefoucauld
6 Neumann
6 Marx
6 Gould
6 Dijkstra
6 Binmore
5 Jaynes
5 Huxley
5 Galileo
5 Egan
5 Descartes
5 Darwin
5 Buffett
5 Ayn
5 Aristotle
4 Yudkowsky
4 Wittgenstein
4 Wilde
4 Thompson
4 Suzumiya
4 Simpson
4 Schopenhauer
4 Sagan
4 Rommel
4 Rollins
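The name-counting heuristic described above can be sketched in Python. This is only an illustration, since the real script and morphological analyzer are not shown. A small hypothetical set, `KNOWN_WORDS`, stands in for the analyzer, and any capitalized word it recognizes is discarded. That is why a surname that is also an ordinary word, like Bacon, drops out.

```python
import re
from collections import Counter

# Hypothetical stand-in for the morphological analyzer: lowercase ordinary words.
# Capitalized words found here are discarded (so Bacon, Wells, Hawking drop out).
KNOWN_WORDS = {"the", "bacon", "wells", "hawking"}

# A capitalized word: one uppercase letter followed by lowercase letters.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")


def candidate_names(quotes, top=40):
    """Return the `top` most frequent capitalized words the dictionary does not know."""
    counts = Counter()
    for text in quotes:
        for word in CAPITALIZED.findall(text):
            if word.lower() not in KNOWN_WORDS:
                counts[word] += 1  # counts every occurrence, not once per quote
    return counts.most_common(top)


if __name__ == "__main__":
    toy_quotes = [
        "a quote about doubt -- Bertrand Russell",
        "a quote about logic -- Russell",
        "a quote about science -- Francis Bacon",
    ]
    for word, n in candidate_names(toy_quotes):
        print(n, word)
```

First names such as Bertrand and Francis survive the filter in this toy run. That is the kind of noise the manual postprocessing step has to remove.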

MichaelHoward

Cool :)

Top people quoted by total karma?

DanielVarga

You definitely used up all your wishes. :) The above list reordered by total karma collected:

158 Russell
109 Pratchett
106 Asimov
101 Dennett
100 Chesterton
82 Buffett
81 Egan
79 Nietzsche
77 Feynman
72 Voltaire
66 Scott
66 Neumann
66 Descartes
61 Heinlein
59 Dijkstra
58 Marx
57 Aristotle
52 Darwin
49 Galileo
48 Einstein
46 Taleb
46 Binmore
45 Johnson
43 Jaynes
42 Rollins
39 Sagan
34 Wilde
34 Dawkins
28 Gould
25 Wilson
25 Rochefoucauld
23 Huxley
22 Ayn
15 Simpson
13 Wittgenstein
13 Schopenhauer
12 Yudkowsky
12 Thoreau
11 Thompson
8 Suzumiya
2 Rommel

wedrifid

OK. Who quoted Yudkowsky? Hopefully the quotes were from elsewhere. :)

RobinZ

Hacker News, for one - I don't know where the other eight points may be from.

Edit: Six more points from Methods of Rationality

MichaelHoward

You definitely used up all your wishes. :)

Yeah, I wasn't precise enough on that second wish. Oh well, World Peace will have to wait.

DanielVarga

While you have the software open... :-)

The source code is open, too. :) Anyway:

Top average score:

54 in 1: michaelkeenan
23 in 1: Vlad
22.6667 in 3: Tesseract
22 in 1: DaveInNYC
20 in 1: CSmith
19.5 in 2: knb
19 in 1: Marcello
18.8 in 5: Unnamed
18.3333 in 3: Kyre
18.25 in 4: sketerpot
18 in 1: cata
17 in 1: MarcTheEngineer
16 in 3: Hariant
16 in 1: Tyrrell_McAllister
16 in 1: CaptainOblivious2
15.5294 in 17: Yvain
15.5 in 4: Lightwave
15 in 1: teageegeepea
15 in 1: Patrick
15 in 1: Nisan
15 in 1: loqi
15 in 1: Automaton
14 in 3: MichaelHoward
14 in 3: jaimeastorga2000
14 in 1: torekp
14 in 1: sparrowsfall
14 in 1: Sniffnoy
14 in 1: Shalmanese
14 in 1: Kobayashi
14 in 1: bogus
13.5 in 4: komponisto
13.5 in 4: Apprentice
13.5 in 2: JamesAndrix
13.2857 in 21: RobinZ
13.1481 in 27: MichaelGR
13 in 2: BenAlbahari
13 in 1: KatjaGrace
13 in 1: josht
12.8571 in 7: Kutta
12.5714 in 7: wuwei
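Each "average in count" line above is simply a contributor's total karma divided by their number of quotes. Below is a hypothetical Python sketch of that aggregation with made-up scores. The `{:g}` format happens to reproduce the number style of the list (for example, 68/3 prints as 22.6667), but the real script's formatting and tie-breaking are not known.

```python
from collections import defaultdict


def average_scores(quotes):
    """quotes: iterable of (author, score) pairs. Returns (avg, count, author) rows, best first."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for author, score in quotes:
        totals[author] += score
        counts[author] += 1
    rows = [(totals[a] / counts[a], counts[a], a) for a in totals]
    # Highest average first; among equal averages, more quotes first.
    rows.sort(key=lambda r: (-r[0], -r[1]))
    return rows


def format_row(avg, count, author):
    # "{:g}" prints up to 6 significant digits and drops a trailing ".0".
    return f"{avg:g} in {count}: {author}"


if __name__ == "__main__":
    toy = [("alice", 30), ("alice", 20), ("alice", 18), ("bob", 23)]
    for row in average_scores(toy):
        print(format_row(*row))
```

With the toy data this prints "23 in 1: bob" above "22.6667 in 3: alice". A single lucky quote is enough to top such a list, which is the issue the replies below take up.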

wedrifid

This reflects particularly well on Yvain, Robin and Michael, all of whom managed to be both prolific and reliable in providing value with their quotes. I'm trying to think of a suitable metric by which I can formalise my intuitive evaluation.

I consider quotes with 0 votes to be a net negative contribution, and they also raise the chance that the poster's other quotes are faux-wisdom: quotes that appear deep to a casual reader at first glance but wouldn't stand up to scrutiny by someone paying close attention to the actual meaning. So I would rate comments posted via an 'accuracy by volume' approach as even worse than their average suggests, because that approach signals a greater degree of superficiality bias.

Those considerations aside, volume does provide some added value. In considering the question "Which contributor's page should I read in order to absorb the greatest improvement in quotey wisdom?" I may be better off with "16 in 5" than "22 in 2". On the other hand, reading a "5 in 50" page may make me net sillier as I unconsciously absorb nonsense. Perhaps the ranking I'm looking for could be something as trivial as "Sum - Count * 4".
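Taken literally, the proposed ranking works out to n(s̄ - 4) for someone with n quotes averaging s̄ points, because Sum = n·s̄. Applied to the three hypothetical pages mentioned above:

```latex
\text{Sum} - 4\,\text{Count} = n\bar{s} - 4n = n(\bar{s} - 4)

\text{``16 in 5''}:\; 5(16 - 4) = 60 \qquad
\text{``22 in 2''}:\; 2(22 - 4) = 36 \qquad
\text{``5 in 50''}:\; 50(5 - 4) = 50
```

So "16 in 5" does beat "22 in 2" under this metric. The "5 in 50" page still comes out positive, however, since each quote contributes its score minus 4.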

DanielVarga

I think a good metric is this: Assuming we independently draw from the observed distribution of achieved karma scores, what is the probability that someone gets at least as much karma as Yvain when she posts as many quotes as Yvain? You can calculate this by iterated convolution. The assumption of total independence heavily favors Yvain, but I am fine with that.

I loaded the actual observed distribution, and calculated this score:

0.00008 (12.48 in 54): Rain
0.00066 (15.53 in 17): Yvain
0.00128 (13.15 in 27): MichaelGR
0.00174 (54.00 in 1): michaelkeenan
0.00312 (13.29 in 21): RobinZ
0.00766 (22.67 in 3): Tesseract
0.00836 (18.80 in 5): Unnamed
0.01499 (18.25 in 4): sketerpot
0.02368 (10.15 in 47): Eliezer_Yudkowsky
0.02473 (18.33 in 3): Kyre
0.03460 (19.50 in 2): knb
0.03831 (15.50 in 4): Lightwave
0.04265 (23.00 in 1): Vlad
0.04817 (16.00 in 3): Hariant
0.05266 (12.86 in 7): Kutta
0.05396 (22.00 in 1): DaveInNYC
0.06051 (12.57 in 7): wuwei
0.06789 (20.00 in 1): CSmith
0.07663 (13.50 in 4): Apprentice
0.07663 (13.50 in 4): komponisto
0.08094 (19.00 in 1): Marcello
0.08622 (14.00 in 3): jaimeastorga2000
0.08622 (14.00 in 3): MichaelHoward
0.09554 (11.38 in 8): billswift
0.10009 (18.00 in 1): cata
0.11401 (17.00 in 1): MarcTheEngineer
0.12449 (8.77 in 81): RichardKennaway
0.12763 (12.00 in 4): SilasBarta
0.13055 (16.00 in 1): CaptainOblivious2
0.13055 (16.00 in 1): Tyrrell_McAllister
0.13092 (13.50 in 2): JamesAndrix
0.13828 (12.33 in 3): Randaly
0.14534 (15.00 in 1): Automaton
0.14534 (15.00 in 1): loqi
0.14534 (15.00 in 1): Nisan
0.14534 (15.00 in 1): Patrick
0.14534 (15.00 in 1): teageegeepea
0.14695 (13.00 in 2): BenAlbahari
0.15183 (10.83 in 6): DSimon

RobinZ

I don't quite understand the methodology - how do you determine the karma distribution for each poster? And how is the list sorted?

DanielVarga

I am afraid I don't understand either of your questions. I work with the karma distribution only in the quotes domain. It doesn't have to be determined, I collected all the data myself. The list is sorted by p-value.

We have the total list of quotes, with scores and posters. We know that Kutta scored 90 points from 7 quotes. Our null hypothesis is that he randomly selected 7 quotes from the total set of 1138 quotes. The p-value is the probability that he could achieve at least 90 points by this process. If his actual method yields better scores than random drawing, then the p-value will be low.
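The "iterated convolution" from the earlier comment can be sketched in Python as follows. Each quote's score is treated as an independent draw from the empirical distribution of all collected scores, i.e. sampling with replacement, which is what makes convolution applicable. The distribution of an n-quote total is then the n-fold convolution of the single-quote distribution, and the p-value is the probability mass at or above the observed total. The function names and toy scores are hypothetical.

```python
from collections import Counter


def score_distribution(all_scores):
    """Empirical probability of each karma score across all collected quotes."""
    counts = Counter(all_scores)
    total = len(all_scores)
    return {s: c / total for s, c in counts.items()}


def convolve(d1, d2):
    """Distribution of X + Y for independent X ~ d1 and Y ~ d2."""
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def p_value(all_scores, n_quotes, observed_total):
    """P(sum of n_quotes independent draws >= observed_total), drawing with replacement."""
    single = score_distribution(all_scores)
    dist = {0: 1.0}  # the sum of zero draws is 0 with certainty
    for _ in range(n_quotes):
        dist = convolve(dist, single)
    return sum(p for s, p in dist.items() if s >= observed_total)


if __name__ == "__main__":
    # Toy data: half the quotes score 4 and half score 10.
    print(p_value([4, 4, 10, 10], 2, 14))  # 0.75
```

For the real ranking, one would call something like `p_value(all_scores, 7, 90)` for Kutta and sort contributors by the result, smallest first.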

I have a very low opinion of classical frequentist statistics, but it seemed very suitable for this task. I am sure there is already a name for this method I reinvented. Of course, the null hypothesis is ridiculous, so we shouldn't assign much meaning to these numbers. It is just one of many ways we can solve this ranking task.

RobinZ

Okay, that makes sense - the number is the probability that they could have picked up as many points as they did by picking randomly from the set of all quotes. I understand now.

wedrifid

That's brilliant. I like the theory and the ranking matches about what my intuitive manual ranking would have been too.

RobinZ

If I were to venture a suggestion: statistical significance may be relevant to your valuation of high-average high-number posters like Yvain, MichaelGR, and myself over higher-average low-number posters like michaelkeenan. If poorly-selected quotes nevertheless have a small but significant probability of being highly ranked (but a simultaneous large probability of being low-ranked) and most quoters select poorly, someone with only one high-rated quote is not much likelier to be a good selector of quotes than not. In contrast, someone with many quotes, most of which are highly regarded, could be expected to be unusually discerning, as the probability of this result by chance is low.

[anonymous]

While you have the software open... :-)

Top average score? (total / number of quotes) Top people quoted?

gwern

I garnered much more karma from the quotes than I thought I had; it must be all the low-ranked ones, since I don't have all that many highly ranked quotes.

jimrandomh

Reading the top quotes, I came across David Stove's What is Wrong With Our Thoughts again, quoted by EY. I had read it at some point but forgot the metadata, and I was unable to find it when I went looking for it later.

In that chapter, written in 1991, David Stove calls for a nosology of thought, that is, a classification of the many ways human thought can go wrong. It seems to me that Less Wrong's sequences contain just such a classification, and that explaining what's wrong with his many examples about "three" should not be so difficult anymore.

nazgulnarsil

Do you need the sequences, or is it enough to say that the beliefs espoused in the three examples do not pay rent? It seems to me that what is missing from the vast epistemological debate throughout human history is the idea that truth only means anything in relation to goals.

[anonymous]

Stove died in 1994. But his article would be a useful start on such a nosology.

Pfft

That's an interesting idea -- it might make a good page on the Less Wrong Wiki, consisting just of his "three" examples and everyone can add links to LW articles explaining the wrongness.

Personally though, I'm still stumped by most of them. :)

Such a page could also be useful for pointing out areas which haven't been discussed here yet. For instance, there is surely much to say about "The proposition that 3 is the fifth root of 243 is a tautology, just like 'An oculist is an eye-doctor.'", but I can't think of any LW posts talking about it.

billswift

With 1361 quotes your "Best of" isn't very selective.

Vaniver

The sorting helps quite a bit.

Desrtopa

It's not, but I find that many of the quotes lower down on the list are very good themselves. Being upvoted a lot is not just a function of being appealing to LW members, but also being posted at a time and place where many members will take notice, and I think some of these quotes were shortchanged in that regard.

DanielVarga

When I just saw short segments of the sorted list, without karma scores, I was unable to guess whether the segment was from the top or the bottom of the list.

wedrifid

Being upvoted a lot is not just a function of being appealing to LW members, but also being posted at a time and place where many members will take notice, and I think some of these quotes were shortchanged in that regard.

I had the same observation. I found myself opening rather a lot of permalinks in tabs so that I could vote a whole lot up, and more than a few down as well.

We probably also lost a lot of value when it comes to quotes in replies. This one by logi is at 29 but probably was not scraped.

DanielVarga

The Best of Rationality Quotes 2009/2010 thread has calmed down, so now it is a good time to write a comment about how observation has changed the observed. People looking at the quote collection went on to upvote and downvote comments: 173 upvotes and 11 downvotes were handed out in the last week. (This does not count votes for December 2010 quotes, which are still in flux.) Here are the comments with the greatest number of new upvotes:

The bottom of the list is less interesting: none of the quotes got more than one downvote.

RobinZ

*upvotes by Eliezer_Yudkowsky*

Rain

Sources of my quotes, roughly ordered by how well they scored:

Slashdot
Personal quotes archive (years of internet reading)
iGoogle Quotes of the Day
Movies and TV
Book of Wisdom (quote book)

Unfortunately, I ran out of backlog material a couple of months ago, and now I find fewer than one good quote a month. I figure that's a very poor return on hours and hours of internet use, and I suggest that others not attempt to follow my example unless they already have a quotes archive.

I do regret not having saved many of the good quotes from books I'm sure I've read...

gwern

made a quick attempt, but conversion to the wiki format is nontrivial, and I wasn't sure it is worth the effort.

With the right tool, nothing is impossible! I believe Pandoc (available in a Debian/Ubuntu near you) should be able to handle the whole Markdown->MediaWiki conversion. For example, it converts this:

 > foo bar
 -[baz](http://www.google.com)

into this:

 <blockquote>foo bar
 </blockquote>
 -[http://www.google.com baz]

DSimon

| With the right tool, nothing is impossible!

-- gwern

I like it! Provided the caveat is kept in mind that sometimes getting the right tool is itself impossible...

gwern

That's why I hastened to mention the precompiled and widely available binaries.

(OP is on Windows? Well, there's a limit to how far I will bend over backwards to accommodate someone using the wrong tool...)

CaptainOblivious2

While Windows has its share of flaws, I can't help but wonder whether a system in which it's noteworthy to have "precompiled and widely available binaries" (so the USER doesn't have to compile the app before using it) isn't just as wrong, only in different ways.

The thing many *nix fans overlook is that most people just want to USE a computer, and one that's good enough is, well, good enough. From that perspective, *nix isn't so much a tool as a toy: something from which people derive more entertainment than utility.

gwern

The precompiled note is because Pandoc is written in Haskell for ghc; the Haskell toolchain is far from being universally available (as, say, C toolchains using gcc are) and is somewhat immature on Mac and Windows. The Pandoc source is available, of course, but one may legitimately not wish to install everything necessary to compile the source oneself. (If Pandoc were written in C, then I might not bother specifying that there are trustworthy binaries available.) There's a reason not everyone uses source-based distros like Gentoo or Arch.

DanielVarga

Thanks for the tip! You don't have to bend over backwards; OP is on Ubuntu. I implemented JGWeissman's suggestion, which made the issue kind of moot, as the filtered version does not really need manual postprocessing.

Nemo_bis

Why not add some quotes to Wikiquote? (It's also easier to update.) http://www.wikiquote.org/

gwern

Is Wikiquote worth working on? I work on Wikipedia articles because I know that they are incredibly popular and will be read by hundreds of thousands in the long run, but Wikiquote has always been a backwater. (And it's not like quotes need much updating.)

Look at their statistics: http://wikistics.falsikon.de/latest/wikiquote/en/

Their most popular article gets 1,994 hits a day. I can get that many hits in a month with an article on an anime: http://stats.grok.se/en/201011/Neon_Genesis_Evangelion_%28TV%29 Or I can sometimes get 1k hits a day with a relatively obscure programming language: http://stats.grok.se/en/201011/Haskell%20%28programming%20language%29

Some not especially prominent Wikipedia articles are within a small factor or an order of magnitude of Wikiquote's most popular article. That's really, really sad. (And if you ever search for quotes, you will rarely see Wikiquote come up, either.)