Detecting Web baloney with your nose?

by uzalud | 1 min read | 10th Nov 2012 | 21 comments


Is there a useful heuristic for detecting rationally-challenged texts (as in Web pages, forum posts, Facebook comments) which takes relatively superficial attributes such as formatting choices, spelling errors, etc. as input? Something a casual Internet reader may use to detect possibly unworthy content so they can suspend their belief and research the matter further. Let's call them "text smells" (analogous to code smells), like:

  1. too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
  2. walls of text;
  3. little concrete data/links/references;
  4. too much irrelevant data and references;
  5. poor spelling and grammar;
  6. obvious half-truths and misinformation.

Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney detection tools might still be of real value. Does anyone know of any studies or analyses of the correlation between these basic metrics and the actual quality of the content? Can you think of some other smells typical of Web baloney?
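As a rough illustration, the more mechanical smells in the list above (emphasis, walls of text, missing links) can be measured with a few lines of Python. This is only a sketch under stated assumptions: the function name `text_smells` and the choice of measurements are hypothetical, no thresholds are implied, and spelling quality and half-truths are deliberately left out because they cannot be judged this cheaply.

```python
import re

def text_smells(text: str) -> dict:
    """Return rough, superficial measurements for a piece of text.

    Hypothetical helper: it only counts surface features and says
    nothing about whether the content is actually true.
    """
    words = re.findall(r"[A-Za-z']+", text)
    # Words of 3+ characters written entirely in capitals (skips "I", "A", "OK").
    shouted = [w for w in words if len(w) >= 3 and w.isupper()]
    # Paragraphs are separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    longest = max((len(p.split()) for p in paragraphs), default=0)
    return {
        "caps_ratio": len(shouted) / len(words) if words else 0.0,
        # Runs of two or more exclamation or question marks.
        "bang_runs": len(re.findall(r"[!?]{2,}", text)),
        "longest_paragraph_words": longest,
        "links": len(re.findall(r"https?://\S+", text)),
    }
```

Whether any of these numbers actually correlates with content quality is exactly the open question of this post; the sketch only shows that the inputs are cheap to collect.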


Remember, there's unlimited reading material to choose from; your not-worth-reading detector should be sensitive, because false negatives cost much more than false positives. When reading an author for the first time, unless I have a strong recommendation or other quality signal, I will stop if the first instance of stupidity precedes the first insight, or if there are no good insights in the first 500 words or so.

For superficial signals like spelling and overuse of emphasis, I divide them into two categories: things a good writer would do if they were rushed, and things a good writer wouldn't ever do. Typos, missing words, few citations? You're looking at an unedited draft; whether that's okay or not depends on the context. Bold italic all-caps large font? Crackpot.

"Proper" spelling and grammar are some sort of indication of conscientiousness that the writer has put into ① their education, and ② the text itself. However, it's a pretty noisy signal; there are plenty of properly-spelled Bible study guides out there.

Also, there are a lot of insightful people who focused on learning other things (it's amazing how little non-code writing even good CS programs will let you get away with) and/or who write in English because it's common rather than because it's what they were educated in.

it's amazing how little non-code writing even good CS programs will let you get away with

And yet the good ones leave one with an appreciation for syntax that transfers itself naturally to the written word.

There are a few other "crackpot indices" around. John Baez has a famous one, and Scott Aaronson has one in that vein (mostly specific to mathematics papers though).

In defense of crackpots, many of the canonical writers here would ping the crackpot meter of most people, as would most of the LW contributors.

Korzybski is a prime example. If I hadn't had a very strong prior from personal discussions, there is no way I would have made it 10 pages into Science and Sanity.

For serious reading, my priors are more important than typesetting. For web blogs and filtering forums, it's a decent way to filter complete unknowns.

Number 7: comic sans

I count three that apply to Eliezer's sequences and another that can be applied to lukeprog's posts. In addition to all four of these, a fifth (poor spelling) applies to my own posts.

  • two types of emphasis at once, such as underlined italic bold text
  • a product to be sold, such as a book written by a mistaken genius

Would you care to clarify how much you mean "... so Eliezer and Luke are crackpotty" and how much you mean "... so these aren't a very good guide"? (For the avoidance of doubt, I don't think either argument is obviously crazy, though actually I think Eliezer and Luke aren't crackpots and those are useful crackpot indicators.)

Which one applies to Luke?

Too much irrelevant data and references.

Which three?

Well, Eliezer was fond of italicizing words in his text, didn't provide references for most of his statements, and wrote quite a few walls of text. I mean, the sequences are huge.

I wouldn't call Eliezer's emphasis excessive, nor would I call the sequences "walls of text". This is an example of both:

My question is: if you didn't know any English, could you still infer that this is more likely to be baloney, or not?

Without knowing English, I would suggest that only the excessive repeated bangs and interrogation marks are high-value. The excessive ALL CAPS is likely mid-value, and the lack of paragraph breaks is low-value.
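The ranking above could be sketched as a small weighted score in Python. Everything numeric here is an assumption: the comment only orders the signals as high, mid and low, so the weights 3, 2 and 1, the cutoffs, and the name `surface_score` are hypothetical. The capitals check also only makes sense for scripts that have upper and lower case.

```python
import re

# Hypothetical weights: the comment only ranks the signals (high, mid, low).
WEIGHTS = {"repeated_marks": 3, "all_caps": 2, "no_paragraph_breaks": 1}

def surface_score(text: str) -> int:
    """Sum the weights of the surface signals present in the text."""
    tokens = text.split()
    caps = [t for t in tokens if len(t) >= 3 and t.isupper()]
    signals = {
        # Three or more exclamation or question marks in a row.
        "repeated_marks": re.search(r"[!?]{3,}", text) is not None,
        # Assumed cutoff: more than a quarter of the tokens are in capitals.
        "all_caps": bool(tokens) and len(caps) / len(tokens) > 0.25,
        # Assumed cutoff: over 200 tokens with no blank line anywhere.
        "no_paragraph_breaks": len(tokens) > 200 and "\n\n" not in text,
    }
    return sum(WEIGHTS[name] for name, present in signals.items() if present)
```

A higher score would only mean "more surface signals present", not "more likely false"; how much each signal is really worth is what the thread is trying to find out.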

That extreme? Yes, it is evidence that the author has low competence, and that in turn is evidence that the content is false.