Here's something to pick our collective spirits up:

According to Google's infallible algorithms, 20% of the content on LessWrong.com falls within the 'Advanced' reading level. For comparison, another well-known bastion of intelligence on the internets, Hacker News, has only 4% of its content in that category.

Strangely, inserting a space before the name of the site in the query tends to reduce the amount of content that falls in the highest bucket, but I am told that highly trained Google engineers are interrogating the bug in a dimly lit room as we speak, and expect it to crack soon.

[anonymous] · 13y · 140

Why would that pick anyone's spirits up? Surely in an ideal world, where you want to actually communicate, you want the reading level to be the lowest possible one that would get the idea across? Making something actively difficult to read is a good way to confine your ideas to an in-group...

Jack · 13y · 30

Yes. In an ideal world everything interesting and important would be comprehensible to a ten-year-old. But since we don't live in an ideal world, and many interesting and important ideas require difficult concepts and complex vocabulary, we can be pleased with this evidence that we are unusual in our ability and propensity to talk about important, difficult ideas.

[anonymous] · 13y · 60

It's not evidence of any such thing. Read Orwell's 'Politics and the English Language' (http://www.mtholyoke.edu/acad/intrel/orwell46.htm); every example of bad writing he gives there would show up as 'advanced'.

Jack · 13y · 50

I have no idea how Google's algorithms work. If they're counting syllables per word or evaluating vocabulary, then ranking as advanced is evidence for both the claim that we use too much jargon and the claim that we talk about difficult and complex ideas. But that isn't the only evidence to consider. We've both read much of Less Wrong and can evaluate the difficulty and complexity of the ideas we discuss here. Do you not think we talk about difficult and complex ideas here? If so, what makes you think the 'advanced' rating is a product of poor writing rather than of attempts to grapple with complexity?

I bet my use of the word 'algorithm' in my first sentence increases our rating. Would you like to suggest another word to replace it?

'method', or maybe 'system'

Nobody's making anything actively difficult to read, nor did I advocate working to increase the reading level.

However, one-shot unexpected measurements of proxies do carry useful information. In this case, this is weak objective evidence that the conversation on LW is of especially high quality compared to other esteemed communities. That is all.

[anonymous] · 13y · 150

If anything, it's weak evidence that the conversation is of poor quality. For example, doing the same search for TrueOrigin.org, a creationist site, shows that it gets 70% 'advanced', 29% 'intermediate', and 0% 'basic'.

'Advanced' reading level is almost always a pretty good proxy for obfuscation, rather than for intelligence.

Most reading-level metrics are calculated with something like 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words).* Others involve paragraph length and whatnot.

Besides being an amalgamation of funky constants tuned to get the answers they wanted (100 is easy; around 0 is pitched at college-educated folks), it rates run-on sentences full of polysyllabic words as the most 'advanced'.
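For the curious, here's a minimal sketch of that formula in Python. The syllable counter is a crude vowel-run heuristic (my own assumption; real implementations use dictionaries or better rules), but it's enough to see how the constants interact:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of vowels. Good enough to see the
    # formula's behavior, not for production use.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(flesch_reading_ease("Short words win. They read fast."))  # ~119: very easy
print(flesch_reading_ease(
    "Polysyllabic terminology invariably diminishes "
    "comprehensibility."))                                       # ~-221: 'advanced'
```

Note that both example scores fall outside the nominal 0-100 range; the formula is unbounded at the extremes.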

I think that most of the time, short, well-phrased sentences are more understandable.

Long sentences of big words seem to be reminiscent of the incomprehensible journal article that takes effort to understand the language of, or the papers that kids in school throw together without regard for conveying an understanding of the subject, let alone editing for clarity.

*http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test

Your last sentence there has a readability score of -105.87. Your overall score may be higher because your other sentences aren't over twenty words, but most of the penalty came from your syllable count.

Perhaps you could follow your own advice, and use shorter words?

I agree that these metrics are bad at finding depth and good at finding obfuscation.

Dang, I was hoping to get away with only two-syllable words.

Also, upvoted for giving a link to a readability test.

Perhaps you could follow your own advice, and use shorter words?

The comprehensibility problem in the last sentence seems to be the grammar!

[anonymous] · 13y · 60

I thought it was for humour, actually: demonstrating the problem he was talking about in the very sentence where he was talking about it.

I thought it was for humour, actually: demonstrating the problem he was talking about in the very sentence where he was talking about it.

That was the intent. I probably could've done it better though.

The comprehensibility problem in the last sentence seems to be the grammar!

I agree with that, but the readability metric doesn't seem to deduct much for grammar. Instead it just looks for long sentences and docks points for that. I don't think it could actually recognize a long but readable sentence and deduct fewer points for it.
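To illustrate that blindness, assuming the Flesch-style formula quoted upthread: scrambling a sentence's word order changes none of the counts, so the score can't move either.

```python
import re

def flesch(text: str) -> float:
    # Same formula as upthread, redefined so this snippet runs on its own.
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * len(words) / len(sents) - 84.6 * syl / len(words)

readable  = "The metric counts words and syllables but never checks the grammar."
scrambled = "Grammar the checks never but syllables and words counts metric the."
print(flesch(readable) == flesch(scrambled))  # True: same counts, same score
```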

Okay, so now I want to see how many words I can fit into a sentence without it getting too confusing to be read by someone who is pretty young or perhaps new to English; what sorts of ideas might you, or anyone else, have to make a sentence keep working as long as possible?

As to the original comment: sorry, I guess I explained your joke.

Well, you did, but I was probably going to anyway at that point.

Really long descriptions seem to work well for making long sentences. As an aside, do you want to do this with or without semicolons?

Does the algorithm count semicolons as creating new sentences? The purpose here remains to defeat the algorithm, correct?

I don't know actually. I'd guess not, but it might vary by implementation.
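For what it's worth, it comes down entirely to how the sentence splitter is written; the regex splitter below is my own toy assumption, not any particular implementation. Treating semicolons as sentence breaks can change the words-per-sentence term a lot:

```python
import re

def avg_sentence_length(text: str, semicolon_splits: bool) -> float:
    # The only difference between the two conventions is one character
    # in the splitting pattern.
    pattern = r"[.!?;]+" if semicolon_splits else r"[.!?]+"
    sents = [s for s in re.split(pattern, text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sents)

text = "We tried it; it worked; we stopped."
print(avg_sentence_length(text, semicolon_splits=False))  # 7.0: one long sentence
print(avg_sentence_length(text, semicolon_splits=True))   # ~2.33: three short ones
```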

I think these metrics are best for discovering which text would most benefit from efforts to simplify it. (Something I should do myself when writing for an audience.)

Thanks for posting the formula. I think it makes it much clearer what its limitations are, as compared to the opaque description "it measures reading level".

It tends to indicate intricate grammar. Tangled sentences are a hazard of attempting precision in English.

It'd be interesting to run the numbers on the top-rated comments, to see how much of a problem this is in practice.

Of course, that's just the front page of top comments, so these numbers are purely entertainment. But someone could do it more robustly if they care to.
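A rough sketch of what a more robust version might look like; the comments list here is a placeholder, since actually collecting the top-rated comments is the hard part:

```python
import re

def flesch(text: str) -> float:
    # Flesch Reading Ease with a crude vowel-run syllable count, as upthread.
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * len(words) / len(sents) - 84.6 * syl / len(words)

# Placeholder data; substitute real top-rated comments here.
comments = [
    "Short and plain wins votes.",
    "Epistemological considerations notwithstanding, karma accrues regardless.",
]
for c in sorted(comments, key=flesch):  # hardest-scoring first
    print(f"{flesch(c):8.2f}  {c[:60]}")
```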

Jack · 13y · 40

'Advanced' reading level is almost always a pretty good proxy for obfuscation, rather than for intelligence.

It's a good proxy for both. I don't see much obfuscation here. I do see a lot of people trying to precisely nail down ideas that are difficult to express.

It's a good proxy for both.

Which makes it a bad proxy for either.

Only if they were the only options, mutually exclusive, and equally likely.

It's a good proxy for both.

Then the obvious next question would be: what's the differential for intelligence vs. obfuscation?

[anonymous] · 13y · 00

Looks like text that is not forgotten tends to be Intermediate or Advanced.