[ Question ]

What are some triggers that prompt you to do a Fermi estimate, or to pull up a spreadsheet and make a simple/rough quantitative model?

by Eli Tyre, 25th Jul 2021

Tags: Fermi Estimation, Trigger-Action Planning, World Modeling

I'm currently viscerally feeling the power of rough quantitative modeling, after trying it on a personal problem to get an order of magnitude estimate and finding that having a concrete estimate was surprisingly helpful. I'd like to make drawing up drop-dead simple quantitative models more of a habit, a tool that I reach for regularly. 

But despite feeling how useful this can be, I don't yet have a good handle on exactly which moments call for that tool. I'm hoping that asking others will give me ideas for what TAPs (trigger-action plans) to experiment with.

What triggers, either in your environment or your thought process, incline you to start jotting down numbers on paper or in a spreadsheet?

Or as an alternative prompt: When was the last time you made a new spreadsheet, and what was the proximal cause?

9 Answers

I usually don't use paper or spreadsheet for Fermi estimates; that would make them too expensive. Also, my Fermi estimates tend to overlap heavily with big-O estimates.

When programming, I tend to keep a big-O/Fermi estimate for the runtime and memory usage in the back of my head. The big-O part of it is usually just "linear-ish" (for most standard data structure operations and loops over nested data structures), "quadratic" (for looping over pairs), "cubic-ish" (matrix operations), or "exponential" (in which case I usually won't bother doing it at all). The Fermi part of it is then, roughly, how big a data structure can I run this on while still getting reasonable runtime? Assume ~1B ops per second, so for linear-ish I can use a data structure with ~1B entries, for quadratic ~30k entries, for cubic-ish ~1k entries, for exponential ~30 entries.
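To make the size budget concrete, here is a minimal sketch that finds the largest input size fitting in one second for each complexity class. The ~1B ops per second figure is the answer's assumption; the function names (`max_size`, `expected_seconds`) and the choice of 2^n as the "exponential" cost are illustrative assumptions of mine, and real constant factors will shift the numbers somewhat.

```python
OPS_PER_SECOND = 10**9  # rough throughput assumed in the answer above


def max_size(cost, budget_ops=OPS_PER_SECOND):
    """Largest integer n with cost(n) <= budget_ops (cost must be increasing)."""
    lo, hi = 1, 2
    # Grow hi until it no longer fits in the budget.
    while cost(hi) <= budget_ops:
        lo, hi = hi, hi * 2
    # Bisect: cost(lo) fits, cost(hi) does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cost(mid) <= budget_ops:
            lo = mid
        else:
            hi = mid
    return lo


def expected_seconds(cost, n):
    """Fermi estimate of runtime for input size n."""
    return cost(n) / OPS_PER_SECOND


print(max_size(lambda n: n))       # linear      -> 1000000000
print(max_size(lambda n: n ** 2))  # quadratic   -> 31622
print(max_size(lambda n: n ** 3))  # cubic       -> 1000
print(max_size(lambda n: 2 ** n))  # exponential -> 29

# A linear pass over ~1M elements should take about a millisecond,
# so a full second of runtime suggests something is wrong.
print(expected_seconds(lambda n: n, 10**6))  # 0.001
```

The same function doubles as a debugging check: if a measured runtime is orders of magnitude above `expected_seconds`, the code is probably not in the complexity class you think it is.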

This obviously steers algorithm/design choice, but more importantly it steers debugging. If I'm doing a loop which should be linear-ish over a data structure with ~1M elements, and it's taking more than a second, then something is wrong. Examples where this comes up: 

  • scikit implementations of ML algorithms - twice I found that they were using quadratic algorithms for things which should have been linear. Eventually I gave up on scikit, since it was so consistently terrible.
  • SQL queries in large codebases. Often, some column needs an index, or the query optimizer fails to use an existing index for a complicated query, and this makes queries which should be linear instead quadratic. In my experience, this is one of the most common causes of performance problems in day-to-day software engineering.
  • Aside from programming, it's also useful when using other people's software. If the software is taking visible amounts of time to do something which I know should be linear, then the software is buggy, and I should maybe look for a substitute or a setting which can fix the problem.

I also do a lot of Fermi estimates when researching a topic or making a model. Often these estimates calculate what a physicist would call "dimensionless quantities" - we take some number, and express it in terms of some related number with the same units. For instance:

  • If I'm reading about government expenditures or taxes, I usually want it as a fraction of GDP.
  • When looking at results from a linear regression, the coefficients aren't very informative, but the correlation is. It's essentially a dimensionless regression coefficient, and gives a good idea of effect size.
  • Biological examples (the bionumbers book is great for this sort of thing):
    • When thinking about reaction rates or turnover of proteins/cells, it's useful to calculate a half-life. This is the rough timescale on which the reaction/cell count will equilibrate. (And when there are many steps in a pathway, the slowest half-life typically controls the timescale for the whole pathway, so this helps us narrow in on the most important part.)
    • When thinking about sizes or distances in a cell, it's useful to compare them to the size of a typical cell.
    • When thinking about concentrations, it's useful to calculate number of molecules per cell. In general, there's noise of order sqrt(molecule count), which is a large fraction of the total count when the count is low.
  • On the moon, you can get to orbit by building a maglev and just accelerating up to orbital speed. How long does the track need to be, assuming we limit the acceleration (to avoid pancaking any passengers)? Turns out, if we limit the acceleration to n times the surface gravity, then the distance needs to be 1/(2n) times the radius of the moon (low-orbit speed satisfies v² = gR, and constant acceleration a over distance d gives v² = 2ad). That's the sort of clean intuitive result we hope for from dimensionless quantities.
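A quick check of the maglev estimate, as a sketch under idealized assumptions (circular orbit at surface level, constant acceleration, no rotation or terrain effects). The radius and surface-gravity values are approximate figures I'm supplying, not numbers from the answer. Combining v² = gR with v² = 2ad for a = n·g gives d = R/(2n), so the factor of two from the kinematics is worth carrying along.

```python
import math

MOON_RADIUS_M = 1.737e6  # approximate mean radius of the moon (assumed value)
MOON_GRAVITY = 1.62      # approximate surface gravity in m/s^2 (assumed value)


def track_length_dimensionless(radius_m, n):
    """Track length from the dimensionless result d = R / (2n)."""
    return radius_m / (2 * n)


def track_length_explicit(radius_m, gravity, n):
    """Same quantity computed step by step from the kinematics."""
    orbital_speed = math.sqrt(gravity * radius_m)  # v^2 = g R
    acceleration = n * gravity                     # a = n g
    return orbital_speed ** 2 / (2 * acceleration)  # v^2 = 2 a d


for n in (1, 3, 10):
    d = track_length_dimensionless(MOON_RADIUS_M, n)
    print(f"n = {n:>2}: track length is roughly {d / 1000:.0f} km")
```

Note that the surface gravity cancels out of the final answer, which is why the result only depends on n and the radius.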

In general, the trigger for these is something like "see a quantity for which you have no intuition/poor intuition", and the action is "express it relative to some characteristic parameter of the system".
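One instance of that trigger-action pair, using the molecule-count example: if noise is of order sqrt(N) for a count of N molecules, then expressed relative to the count itself it is 1/sqrt(N). This is only a sketch of that scaling argument; the function name `relative_noise` is mine, and the sqrt(N) rule is taken from the answer rather than derived here.

```python
import math


def relative_noise(molecule_count):
    """Noise of order sqrt(N), expressed as a fraction of the count N."""
    if molecule_count <= 0:
        raise ValueError("molecule_count must be positive")
    return math.sqrt(molecule_count) / molecule_count


for count in (10, 100, 10_000, 1_000_000):
    print(f"{count:>9} molecules per cell: noise is ~{relative_noise(count):.1%} of the count")
```

At tens of molecules per cell the noise is a large fraction of the signal; at a million it is a fraction of a percent.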

I notice that with regards to many things I always think of at least one of the following aspects:

  • Money
  • Time
  • Risk

As each of those is quantifiable, it prompts me to actually put some numbers on the given problem.

This is a particularly helpful answer for me somehow. Thanks.

I think I might add one more: probability. For instance, "what are the base rates for people meeting good cofounders (in general, or in specific contexts)?" Knowing the answer to this might tell you how much you should make tradeoffs to optimize for working with possible cofounders. 

Though, probably "risk" and "probability" should be one category.

Whenever I want to 'optimize' something I stop and do the following 'calculation':

  1. How long does it take to do the optimization? (including this calculation)
  2. What is the effect size?
  3. Subtract one from two

I find this helps curb over-analysis, procrastination, and masturbatory optimization. Technical explanation here. There are many XKCD comics also.
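The three-step calculation can be written down as a small sketch. I'm assuming here that both the cost and the effect size are measured in minutes, and that the effect size is "time saved per use times number of uses"; the answer doesn't specify units, so treat these parameter names as hypothetical.

```python
def net_benefit_minutes(minutes_to_optimize, minutes_saved_per_use, expected_uses):
    """Effect size minus the cost of optimizing (step 2 minus step 1)."""
    effect_size = minutes_saved_per_use * expected_uses
    return effect_size - minutes_to_optimize


# Two hours of work to save 3 minutes on a task done 100 times: worth it.
print(net_benefit_minutes(120, 3, 100))  # 180

# The same two hours for a task done only 10 times: not worth it.
print(net_benefit_minutes(120, 3, 10))   # -90
```

If the result is negative, or positive but tiny, that is the signal to stop optimizing.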

I last made a spreadsheet because I received a medical bill and wanted to calculate the correct amount and estimate what the insurance company should pay.

I probably do basic sanity checks moderately often, just to see if something makes sense in context. But that's already intuition-level, almost. 

Last time I actually pulled an excel was when Taleb was against IQ and said its only use is to measure low IQ. I wanted to see if this could explain (very) large country differences. So I made a trivial model where you have parts of the population affected by various health issues that can drop the IQ by 10 points. And the answer was yes, if you actually have multiple causes and they stack up, you can end up with the incredibly low averages we see (in the 60s for some areas). 

It's an interesting example because on one hand it sounds trivial: you have shitty living conditions, you end up with shitty results. But on the other hand my mind didn't want to accept the end result of an under 80 average until I had the numbers in front of me. 

Last time I actually pulled an excel was when Taleb was against IQ and said its only use is to measure low IQ. I wanted to see if this could explain (very) large country differences. So I made a trivial model where you have parts of the population affected by various health issues that can drop the IQ by 10 points. And the answer was yes, if you actually have multiple causes and they stack up, you can end up with the incredibly low averages we see (in the 60s for some areas). 

I'm glad that I asked the alternative phrasing of my question, because this anecdote is informative!

I probably do basic sanity checks moderately often, just to see if something makes sense in context. But that's already intuition-level, almost. 

If it isn't too much trouble, can you give four more real examples of when you've done this? (They don't need to be as detailed as your first one. A sentence describing the thing you were checking is fine.)

I do it pretty rarely, so maybe not the best answerer. But I often do it when I feel like I want to compare long-term plans and one of them has a clear price while the other one only maybe does. Trying to estimate prices I'd put on things is one of a couple different decision-making tools.

A pretty good trigger for me is whenever I ask myself: "Is that plausible?" 

Recent spreadsheet situations:

  • I had a free day and I didn't have much inspiration on how to spend it. So I decided to sit down and rethink my goals and habits. This made me realize that I have a time-tracking record collected over the last 8 months, so it was a good moment to put it in a spreadsheet and analyze it.
  • I was organizing a birthday party and wanted to invite various groups of friends. I didn't want the party to be too big, but I wanted to know when I still had space to invite someone else. So I created a spreadsheet, grouping people in columns with the probability that they will actually show up (got an invitation but no answer: 30%, confirmed no: 0%, confirmed yes: 90%).
  • I was organizing a weekend stag do for a friend, with ~10 participants. There were various activities and costs involved, and different people paid for different things. So I asked everyone to put the costs in a spreadsheet and then made a detailed calculation of who put in how much money and who should carry costs for what (not everyone used both nights of accommodation, for example), and calculated a balance for everyone involved.
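The birthday-party spreadsheet boils down to an expected headcount: sum each invitee's show-up probability. A minimal sketch follows, using the three probabilities given above; the guest list, the capacity of 8, and the names are made up for illustration.

```python
# Show-up probabilities from the answer above.
SHOW_UP_PROBABILITY = {
    "no_answer": 0.30,
    "confirmed_no": 0.0,
    "confirmed_yes": 0.90,
}


def expected_attendance(statuses):
    """Expected number of guests, given each invitee's RSVP status."""
    return sum(SHOW_UP_PROBABILITY[status] for status in statuses)


# Hypothetical invitation list: 4 confirmed yes, 5 no answer, 2 confirmed no.
invitees = ["confirmed_yes"] * 4 + ["no_answer"] * 5 + ["confirmed_no"] * 2
capacity = 8  # hypothetical limit on party size

expected = expected_attendance(invitees)
print(f"Expected guests: about {expected:.1f}")
print(f"Room for roughly {capacity - expected:.1f} more expected guests")
```

Because the spare room is measured in expected guests, it translates into more invitations than seats: at a 30% show-up rate, one expected guest corresponds to about three new invitations.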

The rule I try to apply for myself is: whenever it is at all possible to open a spreadsheet and/or calculator app. On the rare occasion it's not possible (or would be impolite) the extra experience and intuition will be valuable. There's much more risk that I will underuse it than overuse it.

Can you be more specific? Presumably it was possible to open a spreadsheet when you were typing this answer, but I'm guessing that you didn't?

Trevor Hill-Hand (2mo): Hrm, that is a good point. I suppose if I try to be more strict, it's when there is a question of what to do, there are two or more approaches, and there is some difference in quantifiable risk and/or reward between the options, and I haven't already pre-determined a best approach in advance that applies to the situation.