I've written about the cost of abstraction before. Once you've been in the IT industry for a couple of decades, and once you've read a couple of million lines of legacy code, you become healthily suspicious of any kind of abstraction. Not that we can do without abstraction. We need it to be able to write code at all. However, each time you encounter an abstraction in the code that could have been avoided, you get a little bit sadder. And some codebases are sadder than Romeo and Juliet and King Lear combined.

Remember the last time you read an unfamiliar codebase? Remember how you thought the authors were a bunch of incompetent idiots?

People may argue that this is because legacy stuff is necessarily convoluted, but hey, at that point you were just skimming through the codebase and you didn't understand it deeply enough to tell your typical enterprise legacy monstrosity from the work of an architectural genius. The reason you were annoyed was that you were overwhelmed by the sheer amount of unfamiliar abstraction. (To prove that, consider what your opinion of the codebase was a few months later, after you got familiar with it. It looked much better, no?)

Keep that feeling in mind. Think of it when writing new code. How will a person who doesn't know the first thing about this codebase feel when reading it?

The options are not palatable. Either you try to be clever and use a lot of abstraction, and they'll think you are a moron. Or you get rid of all unnecessary abstraction: you'll make their life much less frustrating, but they'll think you are some kind of simpleton. (And they'll probably refactor the code to make it look more clever.)

I want to give a very basic example of the phenomenon.

Imagine that the requirements are that your program does A, B, C, D and E, in that order.

You can do it in the dumbest possible way:

    void main() {
        // Do A.
        ...
        // Do B.
        ...
        // Do C.
        ...
        // Do D.
        ...
        // Do E.
        ...
    }

Or maybe you notice that B, C and D are kind of related and comprise a logical unit of work:

    void foo() {
        // Do B.
        ...
        // Do C.
        ...
        // Do D.
        ...
    }

    void main() {
        // Do A.
        ...
        foo();
        // Do E.
        ...
    }

But C would probably be better off as a stand-alone function. You can imagine a case where someone would like to call it from elsewhere:

    void bar() {
        // Do C.
        ...
    }

    void foo() {
        // Do B.
        ...
        bar();
        // Do D.
        ...
    }

    void main() {
        // Do A.
        ...
        foo();
        // Do E.
        ...
    }

Now think of it from the point of view of a casual reader, someone who's just skimming through the code.

When they look at the first version of the code, they may think the author was a simpleton, but they can read it with ease. It looks like a story. You can read it as if it were a novel. There's nothing confusing there. The parts come in the correct order:

    A
    B
    C
    D
    E

But when skimming through the refactored code that's no longer the case. What you see is:

    C
    B
    D
    A
    E

It's much harder to get a grip on what's going on there, but at least they'll appreciate the author's cleverness.

Or maybe they won't.

January 27th, 2019

by martin_sustrik

4 comments

Upvoted for

some codebases are sadder than Romeo and Juliet and King Lear combined

Also, readability vs boilerplating/code duplication is not an obvious trade-off

What you describe seems to me to be a reaction to bottom-up ordering and imprecise naming, not to the number of abstraction layers per se.

I find top-down ordering and good naming a tremendous time saver when reading code, so I ask about it in every code review. It allows you to stop scrolling at the level of detail that answers your questions. Just think how much faster you would go through a number of modules if the functionality you are debugging was at the top most of the time. Many small functions are actually a benefit, given the above, as they are easier to read and test.

Top-down sorting is one of the safest and easiest refactorings you can do, even without IDE tools. It's a bit harder in languages like C, where you need an additional header file or forward declarations to allow for top-down ordering, but it's still worth it.
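For example, here's a minimal sketch of what that could look like in C, reusing the foo/bar names from the post: the forward declarations (or a header file) let main come first, so a skimming reader meets A, foo() and E before drilling down into the details.

    /* Forward declarations let the highest-level function come first. */
    void foo(void);
    void bar(void);

    int main(void) {
        /* Do A. */
        foo();
        /* Do E. */
        return 0;
    }

    void foo(void) {
        /* Do B. */
        bar();
        /* Do D. */
    }

    void bar(void) {
        /* Do C. */
    }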

I don't know, this doesn't jibe with my experience of abstractions.

Yes, structuring code with abstractions rather than just directly doing the thing you're trying to do makes the code more structurally complex; yes, sometimes it is unnecessary; and yes, more structural complexity means it's harder to tell what any individual chunk of code does in isolation. But I think your example suggests you're engaging with abstractions very differently from how I do.

When I write code and employ abstraction, it's usually not that I just think "oh, how could I make this more clever", it's that I think, "geez, I'm doing the same thing over and over again here, duplicating effort; I should abstract this away so I only have to say something about what's different rather than repeatedly doing what's the same". Some people might call this removing boilerplate code, and that's sort of what's going on, but I think of boilerplate as more a legacy of programming languages where for toolchain reasons (basically every language prior to so-called 4th gen languages) or design reasons (4th gen languages like Python that deliberately prevent you from doing certain things) you needed to write code that lacked certain kinds of abstractions (what we frequently call metaprogramming). Instead I think of this as the natural evolution of the maxim "Don't Repeat Yourself" (DRY) towards code that is more maintainable.
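To make that concrete with a deliberately toy, made-up example: suppose the same log-write-log sequence were repeated for several kinds of records. Pulling the shared steps into one helper means each call site only states what actually differs:

    #include <stdio.h>

    /* Hypothetical helper: the same three steps (log, write, log) were
       previously repeated for every record type. Callers now only supply
       what differs between them. */
    void save_record(const char *kind, const char *path, const char *data) {
        printf("saving %s\n", kind);
        FILE *f = fopen(path, "w");
        if (f != NULL) {
            fputs(data, f);
            fclose(f);
        }
        printf("saved %s\n", kind);
    }

    int main(void) {
        save_record("user", "user.txt", "alice");
        save_record("order", "order.txt", "order #1");
        save_record("invoice", "invoice.txt", "invoice #1");
        return 0;
    }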

Because when I really think about why I code with abstractions, it's not to show off, or to be efficient with my lines of code, or even to just make things pretty; it's to write code that I can maintain and work with later. Well-designed abstractions provide clear boundaries and separation of concerns that make it easy to modify code to do new things as requirements change, and to refactor parts of the code. Combined with behavioral test-driven development, I can write tests against the expected behavior of these concerns, and know I can trust the tests to keep passing while I change the code, so long as the behavior doesn't change, and to let me know if I accidentally break the behavior I wanted in the code.
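As a toy sketch of what I mean (the function and values are made up): the test below pins down only the observable behavior, so the internals can be rewritten freely as long as the assertion keeps passing.

    #include <assert.h>

    /* Hypothetical function under test: sums the first n elements. */
    int sum(const int *values, int n) {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += values[i];
        return total;
    }

    int main(void) {
        /* Behavioral test: only the result is pinned down, not how
           sum() computes it, so the implementation is free to change. */
        int values[] = {1, 2, 3, 4};
        assert(sum(values, 4) == 10);
        return 0;
    }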

Yes, I often don't do it perfectly, but when it works it's beautiful. My experience is that the people who dislike it are mainly new grads who have spent all their time coding toys and haven't had to deal much with large, complex systems; everyone else seems to understand that learning about system-specific abstractions is just naturally what we must do to be kind to ourselves and to the future programmers who will work with this code. To do otherwise is to do the future a disservice.

Isn’t that what comments in the code are for?