The horrifying importance of domain knowledge

by NancyLebovitz, 1 min read, 30th Jul 2015, 238 comments

There are some long lists of false beliefs that programmers hold. This isn't because programmers are more likely than anyone else to be wrong; it's just that programming offers a better opportunity than most people get to find out how incomplete their model of the world is.

I'm posting about this here not just because this information has a decent chance of being both entertaining and useful, but because LWers try to figure things out from relatively simple principles. Who knows what simplifying assumptions might be tripping us up?

The classic (and I think the first) was about names. There have been a few more lists created since then.

Time. And time zones. Crowd-sourced time errors.

Addresses. Possibly more about addresses. I haven't compared the lists.

Gender. This is so short I assume it's seriously incomplete.

Networks. Weirdly, there is no list of falsehoods programmers believe about HTML (or at least a fast search didn't turn anything up). Don't trust the words in the URL.

Distributed computing. Build systems.

Poem about character conversion.

I got started on the subject because of this piece about testing your code, which was posted by Andrew Ducker.
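To make "don't trust the words in the URL" concrete: in a URL such as `http://www.paypal.com@evil.example/login`, everything before the `@` is userinfo, and the host is what follows it. Below is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. The `real_host` helper and the example URLs are hypothetical. Browsers follow the WHATWG URL Standard, which can parse some edge cases differently from Python.

```python
from typing import Optional
from urllib.parse import urlsplit


def real_host(url: str) -> Optional[str]:
    """Return the host a URL actually points at (lowercased, without port)."""
    return urlsplit(url).hostname


# "www.paypal.com" here is only the userinfo part; the host is evil.example.
print(real_host("http://www.paypal.com@evil.example/login"))  # evil.example
# Host names are case-insensitive; .hostname is lowercased and the port dropped.
print(real_host("https://WWW.Example.COM:8443/path"))         # www.example.com
```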

LWers try to figure things out from relatively simple principles

If you only do this you are going to fail. This is why Aristotle got physics so badly wrong, right? Reasoning has to be a cyclical process of compressing complex realities into sets of simple principles and only then are you licensed to decompress those principles into making statements about reality.

NancyLebovitz: Thank you. That was there (plus some extras; see the article now) in my original post on LJ, but somehow it got lost while I was bullying the HTML to work here.

Addressing common misconceptions is worthwhile, and lists are a good way to do it. You can find similar lists in the academic literature for many different subjects, e.g., here's an article about common misconceptions in thermodynamics. I've also mentioned math books listing counterexamples before. Many counterexamples address common misconceptions.

I like to create Anki cards for common misconceptions, to make sure I don't make these mistakes myself. One issue with this is whether correcting specific cases causes my brain to recognize a more general case…

So while many of these false beliefs are worth noting, it's also worth thinking about why programmers make these mistakes in practice. While it may be true that names can include numbers, it's probably also the case that most numbers get into names via user error. Depending on the purpose of your database, it might be more valuable to avoid user error than to avoid excluding a minority of users.

The reason I mention this is that a lot of things in life are a study in trade-offs. Quantum mechanics isn't very good at describing how big things work, and classical mechanics isn't very good at describing how small things work.
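One way to act on this trade-off is to warn rather than reject. Input that is probably a typo gets flagged, and the user is asked to confirm it, but it is accepted if they do. Here is a minimal Python sketch under that assumption. `check_name`, `NameCheck`, and the specific checks are hypothetical illustrations, not rules taken from the lists.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameCheck:
    value: str
    warnings: List[str] = field(default_factory=list)


def check_name(raw: str) -> NameCheck:
    """Accept any string (even empty), but flag likely typos for confirmation."""
    # NFC normalization makes visually identical names compare equal;
    # it does not otherwise change what the user typed.
    name = unicodedata.normalize("NFC", raw)
    result = NameCheck(name)
    if any(ch.isdigit() for ch in name):
        result.warnings.append("contains digits; please confirm this is intended")
    if name != name.strip():
        result.warnings.append("has leading or trailing whitespace")
    return result
```

For example, `check_name("Jose\u0301")` returns the composed form `"José"` with no warnings, while `check_name("John Smith 3rd")` returns one warning about digits. The caller decides whether to ask for confirmation, so the rare legitimate name is never locked out.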

Weirdly, there is no list of falsehoods programmers believe about HTML (or at least a fast search didn't turn anything up).

A lot of programmers believe they can parse HTML with regular expressions.

DanArmak: A lot of programmers believe they can parse HTML at all. Go read the official W3C parser algorithm [http://www.w3.org/TR/html5/syntax.html], I'll wait. The first thing you'll notice is that there is no formal grammar: the spec describes the actual parser state machine. Then you notice that each past and present HTML version has its own parser algorithm spec, and there is no official documentation of the differences between them, never mind a rationale. Then you realize that HTML5 is now a "living spec", so the parser algorithm at that link occasionally changes, and past versions and changelogs are deliberately not published... HTML is a parseable format like PHP is a programming language. There is no spec; there is only whatever bugs and quirks a particular browser version happens to contain. (Oh, you thought browsers actually follow any of those published W3C specs? HAHAHAHAHA sob.)
eternal_neophyte: HTML is indeed a turd of a standard.
fubarobfusco: Hey, can we try not invoking the blasphemous things that dangle from the staked corpses o̭̙̥͚̘͍̠f dead universes?
DanArmak: If I were feeling snarky I'd add, "a lot of people believe they are programmers." (To adapt a quotation, "Anyone can code. That doesn't mean anyone should.")
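Here is a small illustration of why regular expressions are the wrong tool. The naive pattern below assumes `>` never appears inside a tag and ignores comments. As a result, it picks up a commented-out link and misses the real one. Python's standard `html.parser.HTMLParser` handles this input correctly. It is a tolerant tokenizer, though, not an implementation of the full HTML5 parsing algorithm DanArmak describes. The sample document and the `LinkCollector` class are hypothetical.

```python
import re
from html.parser import HTMLParser

DOC = '<!-- <a href="old.html">x</a> --><a title="a>b" href="new.html">y</a>'

# Naive approach: assumes tags never contain ">" and does not know about comments.
naive = re.findall(r'<a [^>]*href="([^"]*)"', DOC)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags; comments are never start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(DOC)
collector.close()

print(naive)            # ['old.html']  (the commented-out link)
print(collector.links)  # ['new.html']  (the link a browser would see)
```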

From Falsehoods Programmers Believe About Names:

anything someone tells you is their name is — by definition — an appropriate identifier for them.

There should be a list of false things people coming from common law jurisdictions believe about how choice of identity works on the rest of the globe.

NancyLebovitz: I'm not sure what's intended by "appropriate" there. It might not be so much a claim about the law as a claim that it's a name the person wants to use and you shouldn't argue with them about it. Even then, impersonation is an issue.

Sometimes a user puts something in a "name" field that they do not actually intend to be used to identify themselves.

They may be trying to get that string displayed to other users in a highlighted fashion. If someone puts "Wal-Mart Sucks" in the name field on a blog comment, it isn't because they seriously want to be identified by the surname of Sucks. They're just saying that Wal-Mart sucks, in a dramatic way.

They may be trying to break the system in one way or another. If someone puts their name as "Robert'; drop table students; --" then depending on the social and technical context they might be giving themselves a clever alias; or they might be trying to attack the database.

NancyLebovitz: All fair enough. There's also the possibility of accidentally entering wrong characters. I assume this is unlikely, since people should know how to type their names, but people type their names so often that even a low chance of fumble-fingers will come up now and then.
TrE: Or their mom might be a hacker. Incidentally, there are many cases where I don't care about my username at all and have to come up with something. I'd find it acceptable if they just gave me a number and a password, or let me register with just a password (perhaps provided by them?), maybe plus an e-mail address.
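Whatever the intent behind a name like "Robert'; drop table students; --", the usual defence is to pass user input to the database as a bound parameter rather than splicing it into SQL text. Below is a minimal sketch using Python's built-in `sqlite3` and an in-memory database. The `students` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT NOT NULL)")

name = "Robert'; drop table students; --"

# The "?" placeholder sends `name` as a value, so it is never parsed as SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
conn.commit()

stored = conn.execute("SELECT name FROM students").fetchall()
print(stored)  # the name comes back unchanged, and the table still exists
```

Parameter binding also means an odd but legitimate name is stored exactly as typed. There is no need to forbid apostrophes or semicolons in the name field.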

I think it's worth noting that, yes, if you want your database of names/addresses/times/etc. to be fully robust, you need to essentially represent these items as unconstrained strings of arbitrary length (including zero).

However, in practice, you're most likely not building a fully robust database. For example, you are not solving the problem of "how can I fully represent all of the marvelous variety of human names and addresses?", but rather "how can I maximize the chances that the packages my company is shipping to customers will actual…

Lumifer: As a nitpick -- yes, he can. In the third world it's not uncommon NOT to have a working system of usual addresses, and locations are typically specified as town, local landmark, and directions from that local landmark.
Bugmaster: Right, I was thinking in the context of our Western society. But in the third world, as you said, the opposite is true: an address like "123 Main St., Sometown Somecountry" simply does not work. So it is still not the case that you need to implement a fully general address database that covers all possible cases; you only need to cover the cases that you personally care about.
VoiceOfRa: Of course, you're much less likely to be shipping a package to those kinds of third world countries, and even if you did, you'd have trouble making sure it gets delivered to its destination for other reasons.
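One compromise between a fully general address model and one that only fits "123 Main St." is to store the address as free-form lines, exactly as entered. Structured fields get added only where a particular use case needs them. Here is a minimal Python sketch under that assumption. `PostalAddress` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PostalAddress:
    # Lines exactly as the user entered them, in order; no street/number split.
    lines: List[str] = field(default_factory=list)
    # Optional structured data, filled in only if some workflow needs it.
    country: Optional[str] = None

    def label(self) -> str:
        """Render the address for a shipping label, skipping blank lines."""
        return "\n".join(line for line in self.lines if line.strip())


# A landmark-based address fits as easily as a street address.
addr = PostalAddress(
    ["Blue house behind the market", "200 m past the water tower", "", "Sometown"]
)
print(addr.label())
```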

Also, there is a place where the sun rises in the west and sets in the east.

(Gur fha evfrf orpnhfr bs gur Rnegu'f ebgngvba, ohg rira va gur nofrapr bs gur Rnegu'f ebgngvba vg jbhyq evfr naq frg bapr n lrne orpnhfr bs ubj gur Rnegu'f nkvf cbvagf. Vs lbh ner pybfr rabhtu gb gur Abegu Cbyr, gur ebgngvba unf artyvtvoyr pbagevohgvba gb gur fha'f evfvat naq frggvat naq gur fha evfrf naq frgf cerggl zhpu bayl orpnhfr bs gur nkvf. Tb gb gur Abegu Cbyr, frr jurer gur fha evfrf, naq tb njnl sebz gur cbyr n fubeg qvfgnapr va n qverpgvba fhpu gung guvf cbvag vf gb gur jrfg.)

A false belief could also be a problem of classification or of language. So much is lost in translation, and a programmer deals with a very formal language, so programmers are not as good at informal languages and nuances.

I'm not a programmer, so my view may be a little skewed, but this seems like a list of things that may or may not be correct (I'd like to see them placed on a continuum, or given percentages, or some other measure more practically observable than a list), and it doesn't really get any further than that. How substantial are many of the claims?

I honestly doubt "programmers" get all of these wrong. I'm not going to link to a post (I've seen some people reply with a post as if it's a substitute for actually saying what's wrong) or even say that "prog…

NancyLebovitz: You may be pointing out a problem with English. "Programmers" could simply mean more than one programmer, but there's an implication that all programmers have those problems, or perhaps that the problems are pervasive. I get the impression that these are mistakes that someone has seen (or made) at least once. Some of them may mostly be made by programmers who are beginners. I agree that it would be nice to get percentages, even with large error bars. I think the main point of these lists is to warn programmers that they may need much more specific knowledge than they think they need. For what it's worth, the commenters on those lists are programmers, and I'm not seeing comments which say "That never happens!".
ChristianKl: If I ask most programmers, or people in other domains, whether all months have more than 27 days, I think most of them would say that of course all months have more than 27 days, and that if Wednesday is the 2nd of a month, the following Friday ought to be the 4th. You actually need to know about the strangeness of September 1752 to know that there is one month in the British calendar that had fewer days: it had no days between the 2nd and the 14th. Children are taught in school that minutes have 60 seconds, not that there is a body that decides months in advance (about six months, in practice) whether a given minute is supposed to have 59, 60 or 61 seconds.
LessWrong: The September 1752 example sounds like something you'd find on a trivia show. It's not really such a good example; it's the exception rather than the rule. When I read this I feel like I'm back in elementary school being the detail-obsessed nerd. I can't say anything about the minute example, but I see that the trend is to take some obscure occurrence, point fingers, say "how can you not know that?", and look like a special snowflake to every regular person. In practical terms, what are the merits of all those examples? Going back to the lists, some of the examples are probably bad design[1], like the one where a backup is named with a string of numbers such as 053901011991.html, so let's not focus on them. [1] What constitutes "bad design" may vary; some people could probably filter through many files like that easily using ls. Some people prefer minimalism, while others don't feel compelled to use their processing power so sparingly and would rather get the job done more quickly. (This seems to have some time implications: if you have cleaner code, you can work with it more easily in the future, versus if you just want a task done and then forget about it.) So if I were to describe "bad design" in a way that holds some water, I would say that it hurts productivity.
ChristianKl: The whole point of the list is that there are exceptions to rules that most people consider true in all cases. If you program systems, you get bugs from corner cases that you didn't anticipate, and you need domain knowledge to know all the corner cases. Leap seconds have managed to crash real-world computer systems because their designers didn't handle them properly. You don't want software that has a calendar to crash simply because a user asks it to show September 1752.
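Both of ChristianKl's examples are easy to reproduce with Python's standard library. In each case, the library itself adopts the "common sense" model. `calendar` uses the proleptic Gregorian calendar, so it reports an ordinary 30-day September 1752. `datetime` rejects 23:59:60 outright, even though 2016-12-31 23:59:60 UTC was a real leap second. This is a sketch, and `can_represent` is a hypothetical helper.

```python
import calendar
from datetime import datetime

# calendar uses the proleptic Gregorian calendar: it has no idea that
# Britain and its colonies skipped 3-13 September 1752.
days_in_sept_1752 = calendar.monthrange(1752, 9)[1]


def can_represent(year, month, day, hour, minute, second) -> bool:
    """True if datetime accepts this wall-clock time, False if it raises."""
    try:
        datetime(year, month, day, hour, minute, second)
        return True
    except ValueError:
        return False


# 2016-12-31 23:59:60 UTC was a real leap second, but datetime allows 0..59 only.
leap_ok = can_represent(2016, 12, 31, 23, 59, 60)

print(days_in_sept_1752)  # 30
print(leap_ok)            # False
```

Neither behaviour is a bug in the library; they are documented modelling choices. Software that must display historical British dates or record leap seconds needs to handle those cases itself.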
Lumifer: Actually, the proper solution is to practice defensive programming, not to trust user input, and to be generous with sanity checks. Failing gracefully is much easier when your software knows it's not in Kansas any more.
HungryHobo: Which sounds nice right up until a production system shuts itself down gracefully a few hours before a daylight saving time switch, purely because of tests that turn out to be pickier than the actual thing they're supposed to be protecting. Multi-byte characters can do surprising things to log-truncation scripts written by someone who didn't take into account the maximum size of characters, or Chinese production server names. A lot of a programmer's day can end up being spent fixing bugs caused by incorrect assumptions or by failing to take edge cases into account, and knowing lots of edge cases and handling a reasonable portion of them right away is far better than making the most restrictive possible assumptions off the bat. You don't want to end up running into the Y2Gay problem: http://qntm.org/gay
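HungryHobo's log-truncation bug is easy to reproduce. Cutting UTF-8 at a fixed byte count can split a multi-byte character, and the result no longer decodes. Below is a minimal Python sketch of a byte-limited truncation that backs off to a character boundary. `truncate_utf8` is a hypothetical helper, and the server name is made up.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8, cut to at most max_bytes, never mid-character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    # A byte cut can end inside a multi-byte sequence; decoding with
    # errors="ignore" drops that incomplete tail, then we re-encode.
    return encoded[:max_bytes].decode("utf-8", errors="ignore").encode("utf-8")


server = "服务器"  # three characters, three UTF-8 bytes each

naive = server.encode("utf-8")[:7]
try:
    naive.decode("utf-8")
except UnicodeDecodeError:
    print("naive cut split a character")

safe = truncate_utf8(server, 7)
print(len(safe), safe.decode("utf-8"))  # 6 服务
```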
Lumifer: Do you know that "the system" can handle that input fine? If you do, why did your sanity check reject it? Sanity checks are just code; you certainly can write bad ones. So? That post argues via mind-numbingly stupid strawmen (or should that be strawschemas?). Yes, you should try to be not stupid, most of the time, to the best of your ability. I agree :-/
Jiro: This is no different from asking someone for the fastest route to the store and being told "go a mile down that road and take a left" even though the person didn't check whether the road was temporarily blocked. If you ask someone for directions, they're probably not going to add "unless the road is temporarily blocked" and "unless a meteor hit the store last night and I didn't know about it yet" and "unless there's a quantum fluctuation that moves all your molecules right next to the store".
ChristianKl: In many cases Google Maps has disclaimers like that. In programming you usually care more about edge cases than you do in daily life.
Lumifer: Of course not. Many hackers are programmers; few programmers are hackers. People fitting esr's description are rare, and cubicle peons with a keyboard and a screen are legion.
[anonymous]: Could you turn the tracking links into direct ones?

[This comment is no longer endorsed by its author]