On Rigorous Error Handling

by Martin Sustrik

17th Nov 2018

250bpm

johnswentworth

I generally agree with the problem described, and I agree that "small amount of well-defined failure modes" is a necessary condition for the error codes to be useful. But that doesn't really tell us how to come up with a good set of errors. I'll suggest a more constructive error ontology.

When an error occurs, the programmer using the library mostly needs to know:

Is it my mistake, a bug in the library, or a hardware-level problem (e.g. connection issue)?
If it's my mistake, what did I do wrong?

Why these questions? Because these are the questions which determine what the programmer needs to do next. If you really want to keep the list of errors absolutely minimal, then three errors is not a bad starting point: bad input, internal bug, hardware issue. Many libraries won't even need all of these - e.g. non-network libraries probably don't need to worry about hardware issues at all.
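As a rough illustration of that three-way split, here is a minimal Python sketch. All names (`LibraryError`, `BadInputError`, `InternalBugError`, `HardwareError`, `next_step`) are hypothetical and not taken from any real library; the point is only that each category maps to a different next action for the programmer.

```python
class LibraryError(Exception):
    """Base class for every error this hypothetical library raises."""


class BadInputError(LibraryError):
    """The caller's mistake: fix the arguments and call again."""


class InternalBugError(LibraryError):
    """The library's mistake: nothing the caller can fix; report it."""


class HardwareError(LibraryError):
    """The environment's fault (e.g. a dropped connection): retry later."""


def next_step(err: LibraryError) -> str:
    """Map an error category to what the programmer should do next."""
    if isinstance(err, BadInputError):
        return "fix your input"
    if isinstance(err, InternalBugError):
        return "report a bug"
    if isinstance(err, HardwareError):
        return "check the connection and retry"
    return "unknown"
```

A library with no network or device access could simply drop `HardwareError` and keep the other two.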

Which of the three categories can benefit from more info, and what kind of additional info?

First, it is almost never a good idea to give more info on internal bugs, other than logging it somewhere for the library's maintainers to look at. Users of the library will very rarely care about why the library is broken; simply establish that it is indeed a bug and then move on.

For hardware problems, a bad connection is probably the most ubiquitous. The user mostly just needs to know whether it's really a bad connection (e.g. Comcast having a bad day) or actually the user's own mistake (e.g. entering the wrong credentials). Most libraries probably need at most one actual hardware error, but user mistakes masquerading as hardware problems are worth looking out for separately.
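To make the "masquerading" case concrete, here is a small sketch assuming an HTTP-style service. The function `classify_failure` and both exception classes are hypothetical; the only outside fact used is that HTTP status 401 and 403 indicate an authentication or authorization failure rather than a transport failure.

```python
from typing import Optional


class HardwareError(Exception):
    """The connection itself failed; retrying later may help."""


class BadInputError(Exception):
    """The caller supplied something wrong; retrying unchanged will not help."""


def classify_failure(reached_server: bool, status: Optional[int]) -> Optional[Exception]:
    """Decide which category a failed request belongs to.

    `reached_server` and `status` are assumed to come from whatever
    transport layer the library uses.
    """
    if not reached_server:
        # Nothing came back at all: a genuine connection problem.
        return HardwareError("could not reach the server")
    if status in (401, 403):
        # The server answered and refused us: this is the user's mistake,
        # even though it surfaces during a network call.
        return BadInputError("the server rejected the credentials")
    return None
```

Reporting the second case as a connection error would send the user off to debug their network when the fix is to correct their credentials.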

That just leaves user mistakes, a.k.a. bad inputs. This is the one category where it makes sense to give plenty of detail, because the user needs to know what to fix. Of course, communication is a central problem here: the whole point of this class of errors is to communicate to the programmer exactly how their input is flawed. So, undocumented numerical codes aren't really going to help.
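A sketch of what "plenty of detail" might look like in practice, again with hypothetical names (`BadInputError`, `set_timeout`): the error carries the offending parameter, the value received, and a description of what was expected, instead of an opaque code.

```python
class BadInputError(ValueError):
    """Describes which argument was wrong and what would have been accepted."""

    def __init__(self, parameter, value, expected):
        self.parameter = parameter
        self.value = value
        self.expected = expected
        super().__init__(
            f"bad value for {parameter!r}: got {value!r}, expected {expected}"
        )


def set_timeout(seconds):
    """Validate a timeout; a stand-in for any library entry point."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)):
        raise BadInputError("seconds", seconds, "a number")
    if seconds <= 0:
        raise BadInputError("seconds", seconds, "a positive number")
    return float(seconds)
```

Calling `set_timeout(-1)` raises an error whose message names the parameter, the value, and the fix, which is roughly the information the programmer needs to act on it.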

(Amusingly, when I hit "submit" for this comment, I got "Network error: Failed to fetch". This error did its job: I immediately knew what the problem was, and what I needed to do to fix it.)
