Is it worth refactoring yyyymmdd to currentDate? I think that there are two ways to look at it.
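For concreteness, the refactor under discussion is nothing more than a rename. A minimal sketch in Python (the surrounding code is hypothetical, since the post doesn't show it):

```python
from datetime import date

# Before: the name describes the string's format, not what it means.
yyyymmdd = date.today().strftime("%Y%m%d")

# After: the name describes what the value is; the format is unchanged.
currentDate = date.today().strftime("%Y%m%d")
```

Same value either way; the whole debate is about which name makes the next reader's life easier.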

You can zoom in and ask yourself questions about whether such a refactor will actually have a business impact. Will it improve velocity? Reduce bugs? Sure, currentDate might be slightly more descriptive, but does it really move the needle? How long does it take to figure out that yyyymmdd refers to a date? A few seconds, maybe? Won't it be pretty obvious given the context? Shouldn't your highly paid, highly intelligent engineers be smart enough to put two and two together? Did we all just waste 30 seconds of our lives talking about this?

The other way of looking at it is to zoom out. How do you feel when you work in codebases where the variable names are slightly confusing? It slows you down, right? Oftentimes you legitimately can't put two and two together. And there are times when it leads to bugs. Right?

It's interesting how two different viewpoints (zoomed in vs. zoomed out) can produce wildly different answers to essentially the same question: do the costs of investing in code quality outweigh the benefits? When you zoom in, e.g. to a single variable name, unless the code is truly awful, it usually doesn't seem worth it. The answer is usually, "it's not that bad, developers will be able to figure it out". But when you zoom out and look at the entirety of a codebase, I think the answer is usually that working in messy codebases will have legitimate, significant impacts on things like velocity and bugs, and it's worth taking the time to do things the right way.

What's going on here? Is this a paradox? Which is the right answer? To answer those questions, let's talk about something called the planning fallacy.

The Denver International Airport opened sixteen months later than scheduled, with a total cost of $4.8 billion, over $2 billion more than expected.

https://en.wikipedia.org/wiki/Planning_fallacy#Real-world_examples

When estimating things, people usually zoom in. "Build an airport in Denver? Well, we just have to do A, B, C, D, E and F. Each should take about six months and $500M, so overall it should be three years and $3B." The problem with this is… well… the problem is that it just never works. You always forget something. And the individual components always end up being more complicated than they seem. Just like when you think dinner will be ready in 30 minutes.

So what can you do instead? Well, how long have similarly sized airports taken to build in the past? Ten years and $10B? Hm, if so, maybe your estimate is off. Sure, your situation is different from those other situations, but you can adjust upwards or downwards using the reference class of the other airports as a starting point. Maybe that brings you from 10 to 8 or 10 to 7, but probably not 10 to 3.

How does this relate to code quality? Well, I think that something similar is going on. When you zoom in and take the inside view, it looks like everything will be good. But when you zoom out and take the outside view, you realize that messy codebases usually cause significant problems. Is there a good reason to believe that your codebase is a special snowflake where messiness won't cause significant problems? Probably not.

I feel like I'm being a little bit dishonest here. I don't want to hype up the outside view too much. In practice, inside view thinking also has its virtues. And it makes sense to combine inside view thinking with outside view thinking. Doing so is more of an art than a science, and something that I am definitely still developing a feel for.

I think that certain things lend themselves more naturally to inside view thinking, and others lend themselves more naturally to outside view thinking. For example, coming up with startup ideas and coming up with scientific theories both seem like good fits for inside view thinking, IMHO. On the other hand, code quality feels to me like something that is a great fit for the outside view. And so, that's the viewpoint that I favor when I think about whether or not it is worthwhile to invest in.


I'm aware that the currentDate versus yyyymmdd thing is only an example, but I'm not sure it's a good example because it's not obvious to me that currentDate is necessarily better.

If this thing is a string describing the current date then there are at least two separate pieces of information you might want the name to communicate. One is that it's the current date rather than some other date. The other is that it's in yyyymmdd format rather than some other format.

Whether currentDate or yyyymmdd is more informative depends on (1) which of those two things is easier to infer from context (e.g., maybe this is a piece of software that does a lot of stuff with dates in string form and they're always yyyymmdd; or maybe the only date it ever has any reason to consider is the current date) and (2) which of them is more important in the bit of code in question (e.g., if what you're doing is working out which month it is, that operation is the same whether you're dealing with today's date or something else, but it depends a lot on the format of the input).

It might actually be better in some cases to call the variable something like yyyymmdd_now or currentDate_ymd8 (the latter only makes sense if in your code there are a few different string formats in use for some hopefully-good reason (maybe you need to interoperate with multiple other bits of date-handling software), so that giving them codenames makes sense).
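A toy illustration of both points (the helper function and the suffixed name are hypothetical, following the comment's yyyymmdd_now / currentDate_ymd8 idea; none of this is from the post):

```python
from datetime import date

# Extracting the month depends on the string's format,
# not on whether the date happens to be today's:
def month_of(yyyymmdd: str) -> int:
    """Assumes an 8-character, separator-free yyyymmdd string."""
    return int(yyyymmdd[4:6])

# When several string formats coexist, a format suffix disambiguates;
# when the whole codebase uses one format, plain currentDate would do:
currentDate_ymd8 = date.today().strftime("%Y%m%d")
print(month_of(currentDate_ymd8))
```

The point is just that which piece of information deserves a place in the name depends on which one the surrounding code actually relies on.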

Agreed! FWIW, I did realize that there are those issues with my example and that the post would be improved by using a better one (in addition to using multiple examples instead of just a single one). But I had trouble thinking of good examples and knew of the current one from here.

In that example I see that the actual format is yyyy/mm/dd rather than yyyymmdd. I definitely don't like the name yyyymmdd in that case; to me it suggests no separators. (I might advocate for switching to yyyy-mm-dd and using a name like currentDate_iso8601, though that's a bit unwieldy.)
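For what it's worth, the yyyy-mm-dd form advocated here is exactly what Python's standard library produces as its ISO format, so the two spellings differ only by the hyphens (a sketch; currentDate_iso8601 is the commenter's suggested name):

```python
from datetime import date

currentDate_iso8601 = date.today().isoformat()   # yyyy-mm-dd, with separators
yyyymmdd = date.today().strftime("%Y%m%d")       # no separators

# Stripping the hyphens from the ISO form recovers the separator-free form:
assert currentDate_iso8601.replace("-", "") == yyyymmdd
```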

Ah. I didn't even notice that but that's a great point. I also think that yyyymmdd suggests no separators.

I'm not sure inside/outside is what's mostly going on when you're on the fence about whether making a minor name improvement is worth it. It seems to me more like the following things:

  • Looking at a single decision rather than the policy it implies. (Cf. "How I lost 100 pounds using TDT".)
  • Changing things has costs as well as benefits; if you rename the variable there's a (hopefully small) chance that you screw it up somehow and break things. Note that this needs to be considered even when you zoom out, even when you consider policies as well as individual decisions, and even when you take the outside view. (Would you rather work on a stable codebase or one where things keep being renamed as other people decide that some name is better? Would you rather concentrate on fixing bugs and adding features, or would you rather keep having meetings where everyone discusses ten variables they think have slightly the wrong names? Would you rather have bugs turn up every now and then because someone renamed a variable but forgot about one place where it's used, or didn't update a bit of documentation?)

Looking at a single decision rather than the policy it implies.

Hm. So if you look at a single decision like "it isn't worth refactoring this", and then you extrapolate out into the policy it implies ("it isn't worth refactoring for the most part"), you're still left with the question of what to do with your macro-level conclusion of "it isn't worth refactoring for the most part". Is it a good conclusion or a bad one? You could just use a reductio ad absurdum argument of "of course that's a bad conclusion", but I feel like looking at other things in your reference class is (a big part of) the way to go.

Changing things has costs as well as benefits

Yeah, great point. I agree that those are important things to consider.

This is only tangentially related, but in cases like this, the strategy of improving variable names when you're working on a piece of code is significantly more valuable than searching for code to refactor and improve.

It's true that improving a random variable name in your code base is not a big win, but:

  • Since you're already looking at this piece of code and presumably making a change, the cost of changing the variable name is lower than if you were changing a random part of the code.

  • The fact that you're looking at this piece of code and not a different one is evidence that this is something people are more likely to look at than usual, so the benefit of improving it is higher than improving a randomly chosen variable name.

Because of these two things, the procedure "improve code you're working on" is significantly more valuable than you'd expect if you think the procedure you're following is "improve all the code".

Oh yeah, that's something I've actually been thinking about recently. Unfortunately, I think it isn't very compatible with the way management works at most companies. Normally there's pressure to get your tickets done quickly, which leaves less time for "refactor as you go". And then if you're lucky, they'll allocate some time for tech debt. But as you say, that's less efficient than "refactor as you go" because you have to load all that context back in to your working memory.

All of this is a big part of what I had in mind in writing this post though. If managers/decision makers took the outside view on code quality, maybe they would encourage developers to take their time and refactor as they go rather than having pressure to finish tickets quickly.

Unfortunately, I think it isn't very compatible with the way management works at most companies. Normally there's pressure to get your tickets done quickly, which leaves less time for "refactor as you go".

I've heard this a lot, but I've worked at 8 companies so far, and none of them have had this kind of time pressure. Is there a specific industry or location where this is more common?

Interesting. My impression is that it's pretty widespread across industries and locations. It's been the case for me at all four companies I've worked at: two were startups, two mid-sized, and each was in a different state.

Improving code you work on is also good because you are likely to understand the purpose of the code better when you are working on it than when you look at a random part of your application.

I think it's simpler than this: renaming it is a small upfront cost for gradual long-term benefit. Hyperbolic discounting kicks in. Carmack talks about this in his QuakeCon 2013 keynote, saying "humans are bad at integrating small costs over time": https://www.youtube.com/watch?v=1PhArSujR_A

 

But, bigger picture, code quality is not about things like local variable naming. This is Mistake #4 of the 7 Mistakes that Cause Fragile Code: https://jameskoppelcoaching.com/wp-content/uploads/2018/05/7mistakes-2ndedition.pdf

I think it's simpler than this: renaming it is a small upfront cost for gradual long-term benefit.

Yes, but at some point the cost starts to outweigh the benefit. E.g. going from yyyymmdd to currentDate is worthwhile, but going from currentDate to betterName, or from betterName to evenBetterName, might not be worthwhile. And so I think you do end up having to ask yourself the question instead of assuming that all code quality improvements are worthwhile. Although I also think there's wisdom in using heuristics rather than evaluating whether each and every case is worthwhile.

But, bigger picture, code quality is not about things like local variable naming. This is Mistake #4 of the 7 Mistakes that Cause Fragile Code: https://jameskoppelcoaching.com/wp-content/uploads/2018/05/7mistakes-2ndedition.pdf

I agree with the big picture point that things that are sort of siloed off aren't as important for code quality. I chose this example because I thought it would be easiest to discuss. However, although I don't think they are as important, or even frequently important, I do think that stuff like local variable names end up often being important. I'm not sure what the right adjective is here, but I guess I can say I find it to be important enough where it's worth paying attention to.

It's a small upfront cost for gradual long-term benefit. Nothing in that says one necessarily outweighs the other. I don't think there's anything more to be had from this example beyond "hyperbolic discounting."

My own relationship to naming is more about taste. I want to be a person who doesn't write crappy code but who writes good code, and thus I don't commit code with crappy names.

I feel there is something else going on here too.

Your claimed outside view asks us to compare a clean codebase with an unclean one and I absolutely agree that it's a good case for using currentDate when initially writing code.

But you motivated this by considering refactoring, and I think things go off the rails there. If the only issue in your codebase was that you called currentDate yyyymmdd consistently, or even had other consistent weird names, it wouldn't be a mess; it would just have slightly weird conventions. Any coder working on it for a non-trivial length of time would start just reading yyyymmdd as current date in their head.

The codebase is only messy when you inconsistently use a bunch of different names for a concept that aren't very descriptive. But now refactoring faces exactly the same problem working with the code does: the confusion coders experience seeing the variable and wondering what it does becomes ambiguity, which forces a time-intensive refactor.

Practically, the right move is probably better standards going forward, plus encouraging coders to fix variable names in any piece of code they touch. But I don't think it's really a good example of divergent intuitions once you are talking about the same things.

Perhaps. yyyymmdd to currentDate is just an example though. In practice I expect that codebases would have a variety of different issues.