I Don’t Know How To Count That Low

Elizabeth

Edit: greetings Hacker News. This is a cross-post from my own blog, AcesoUnderGlass.com. If you enjoy this post, consider checking that out as well.

Back when I was at Google we had a phrase, “I don’t know how to count that low”. It was used to dismiss normal-company-sized problems as beneath our dignity to engage with: if you didn’t need 100 database shards scattered around the globe, were you even doing real work?

It was used as a sign of superiority within Google, but it also pointed at a real problem: I once failed a job interview at a start-up when I wondered out loud whether the DB was small enough to be held in memory, when it was in fact several orders of magnitude below the size at which I should even have begun worrying about that. I didn’t know the limit because it had been many years since I’d had a problem that could be solved with a DB small enough to be held in its entirety in memory. And they were right to fail me for that: the fact that I was good at solving strictly more difficult problems didn’t matter because I didn’t know how to solve the easier ones they actually had. I could run but not walk, and some problems require walking.

It’s a problem, but it can be a pleasant kind of problem to have, compared to others. Another example: my dad is a Ph.D. statistician who spent most of his life working in SAS, a powerful statistical programming language, and using “spreadsheet statistics” as a slur. When I asked permission to share this anecdote he sent me a list of ways Excel was terrible.

Then he started consulting for me, who was cruelly unwilling to pay the $9000 license fee for SAS when Google Sheets was totally adequate for the problem (WHO HAS FOOD AT HOME NOW DAD?!?).*

My dad had to go through a horrible phase of being bad at the worse tool, and found a lot of encouragement when I reframed “I could have done this with one line in SAS and am instead losing to this error-riddled child’s toy” to “I didn’t know how to count that low, but now that it matters I am learning”. And then he tried hard and believed in himself and produced that analysis of that informal covid study that was wonderful statistically and super disappointing materially. And I retrained on smaller numbers and got that job at that start-up.

These are the starkest examples of how I’ve found “I don’t know how to count that low” useful. It reframes particularly undignified problems as signs of your capacity rather than incapacity, without letting you off the hook for solving them. Given how useful it’s been to me and how little I’ve seen of it in the wild, I’d like to offer this frame to others, to see if it’s useful for you as well.

*If any of you are going to bring up R: yes, it’s free, and yes, he has some experience with it, but not enough to be self-sufficient, I knew Sheets better, and I knew it was totally adequate for what we were doing or were likely to do in the future.

Appendix: I know you’re going to ask, so here is his abbreviated list of grievances with Excel. Note that this was Excel in particular; I have no idea if it applies to Google Sheets. I also would allow that this must have been years ago and Excel could have gotten better, except AFAIK they never fixed the problem with reading genes as dates so they get no benefit of the doubt from me.

I attended a talk by a statistician at Microsoft. He said that Microsoft had decided that there was no competitive advantage in making Excel statistics better because no statistician used it for serious problems except for data entry, so:

1. he was the only statistician at Microsoft
2. he knew of seven serious statistical problems in Excel, but they wouldn’t give him the money to fix them.
3. Excel’s problems fell into two categories:
3a. terrible numerical analysis: it was widely verified that if you took a number of single-digit numbers and calculated their standard deviation, and then took the same numbers and added a million to them, the standard deviation was often different, when it should be exactly the same.
3b. statistical errors – like not understanding what you’re copying out of a textbook and getting it wrong.

Thanks to Ray Arnold and Duncan Sabien for beta-reading, and my dad for agreeing to have his example shared.

it was widely verified that if you took a number of single-digit numbers and calculated their standard deviation, and then took the same numbers and added a million to them, the standard deviation was often different, when it should be exactly the same.

This bug does not exist in Excel for Mac 2011.

Virtue points for checking! For the sake of thoroughness: how many values did you use? A comment from my FB wall suggests this comes from a hack for handling very large data sets.

The numerically-unstable stdev is a classic. I've run into it in ML stuff. It happens when you try to be clever and compute the stdev in one pass through the data without computing the mean beforehand... Unless you use the really cool black-magic trick that lets you do this in a more stable way.
In an ML context this happens if you want the running stdev of a stream of data you can't afford to store in memory (so you can't find the mean and then go back and start from the beginning). Excel has no such excuse.
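For readers who want to see the effect, here is a minimal Python sketch of the two one-pass approaches. The comment above does not name the "black-magic trick"; this sketch assumes it refers to Welford's online algorithm, which is one well-known stable one-pass method. The function names and the nine-value test data are made up for illustration, and nothing here claims to reproduce what Excel actually does.

```python
import math


def naive_one_pass_stdev(values):
    """Population stdev via sqrt(E[x^2] - E[x]^2), accumulated in one pass."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    # Two large, nearly equal numbers are subtracted here, so most of the
    # significant digits cancel and the rounding error is what remains.
    variance = total_sq / n - mean * mean
    # Clamp at zero: rounding can push the computed variance slightly negative.
    return math.sqrt(max(variance, 0.0))


def welford_stdev(values):
    """Population stdev in one pass using Welford's running update."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n)


# Illustrative data: single-digit values, then the same values plus a million.
small = [float(d) for d in range(1, 10)]
shifted = [x + 1_000_000.0 for x in small]

print("naive   small:", naive_one_pass_stdev(small))
print("naive   shifted:", naive_one_pass_stdev(shifted))
print("welford small:", welford_stdev(small))
print("welford shifted:", welford_stdev(shifted))
```

With 64-bit floats, the naive version should agree with Welford on the small values but drift visibly on the shifted ones, while Welford gives the same answer for both. Both functions read each value once and keep only a few running totals, so either would work on a stream that cannot be stored.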

Update: I just tried it with 100,000 values. Still no bug.

Update 2: I’ve now tried it with 1,000,000 values. Everything is still correct.

I tried a set of 104,857,600 values in Excel 2010 for Windows and the standard deviations diverge at the 9th decimal:

Count 1: 104,857,600
Count 2: 104,857,600

Sum 1: 5,242,888,545
Sum 2: 104,862,842,888,545

Average 1: 50.00008149
Average 2: 1,000,050.00008149

Stdev.p 1: 29.15276094096330000
Stdev.p 2: 29.15276094160490000

As a software developer, I completely expect this; it's just an everyday consequence of limited-precision math. You'll get the same kind of anomalies in almost any programming language, and other kinds of calculations can produce far worse inaccuracies than this (seems kind of odd to single out standard deviations which are relatively well-behaved, but maybe Excel used to have a worse algorithm). John Gustafson proposed using clusters of unums and then posits to mitigate the problem (preferably in hardware), but it's not practical to eliminate it entirely.
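A small Python sketch of what "limited-precision math" means in practice, assuming standard 64-bit floats and Python 3.9 or later (for `math.ulp`). The variable names are illustrative only. It shows that some decimal fractions are not stored exactly, and that the gap between neighbouring representable values grows with magnitude, which is why shifting data by a million leaves fewer digits for the small differences that a standard deviation depends on.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the sum is
# not bit-for-bit equal to the literal 0.3.
sum_is_exact = (0.1 + 0.2 == 0.3)
print("0.1 + 0.2 == 0.3:", sum_is_exact)

# Spacing between adjacent float64 values near 1 and near one million.
gap_near_one = math.ulp(1.0)
gap_near_million = math.ulp(1_000_000.0)
print("gap near 1:        ", gap_near_one)
print("gap near 1,000,000:", gap_near_million)
```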

I used 100 values. How many is “very large”? 1,000? 1,000,000? More?

I appreciate the need for a phrase or concept to refer to instances when the “easier” thing is harder for you than the “hard” thing, so thank you for pointing me towards that idea. It reminds me of mathematicians who have gotten remarkably bad at arithmetic because it simply doesn’t come up in their studies.

That said, it’s difficult for me to disentangle the phrase “I don’t know how to count that low” from its somewhat elitist origins. It seems to bring the unworthiness of the task into focus instead of the person’s competence at the task. Indeed, the Google engineers in your story seem to treat their distance from the task as a point of pride, as if the numbers are so low that bothering to count them, or even knowing how to count them, is a mark of low status. Perhaps the ego-saving of looking down on the task is part of the appeal? Or perhaps I am reading too much into the Google story. Something like “I’ve forgotten how to walk” appeals much more to me since it emphasizes my present lack of skills.

Something like “I’ve forgotten how to walk” appeals much more to me since it emphasizes my present lack of skills.

I would suggest "Forgot how to sit down".

This post strikes me as a bit arrogant. It doesn't seem like an easier problem, just a different one. Not all problems have to do with scale; that seems like a pretty narrow way to think of difficulty in engineering. What about latency? Some really smart people are involved with shaving microseconds off of stock transaction latency, so is that problem beneath you as well?

At the end of the day, for me it's about keeping an open mind and solving the problems that you face, so you can accomplish your goals. It doesn't matter if it's a "worthy" problem or not. If you can't solve it or struggle to solve it, then it's worthy.