Fermi Estimation, World Modeling

[ Question ]

How to estimate confidence intervals for a Fermi estimate?

by Sønderjye
20th Feb 2022

3 Answers

Garrett Baker

Feb 18, 2022

50

You could use a Monte Carlo simulation. That is, if you have a distribution over the possible values of each parameter of your model, you can randomly sample a value for each parameter many, many times, apply the model to each set of sampled values, and aggregate the results.

This is done automatically for you in the program Guesstimate.
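For anyone who wants to see the moving parts, here is a minimal sketch of that procedure in plain Python. It rests on assumptions that are not in the question: that the two quantities are meant to be multiplied, that they are independent, and that each is log-normal with the stated interval as its 5th and 95th percentiles. The helper names `lognormal_from_ci` and `simulate_product` are made up for this illustration.

```python
import math
import random
from statistics import NormalDist, quantiles

# z-score of the 95th percentile (about 1.645); a 90% interval spans +/- this many sigmas.
Z90 = NormalDist().inv_cdf(0.95)

def lognormal_from_ci(low, high):
    """Return (mu, sigma) of the log-normal whose 5th/95th percentiles are low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma

def simulate_product(ci_a, ci_b, n=100_000, seed=0):
    """Sample both inputs independently, multiply, and return the 5th and 95th percentiles."""
    rng = random.Random(seed)
    mu_a, sigma_a = lognormal_from_ci(*ci_a)
    mu_b, sigma_b = lognormal_from_ci(*ci_b)
    products = [
        rng.lognormvariate(mu_a, sigma_a) * rng.lognormvariate(mu_b, sigma_b)
        for _ in range(n)
    ]
    cuts = quantiles(products, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]

# Assumption: the example intervals from the question are the two factors.
low, high = simulate_product((10_000, 40_000), (50, 8_000))
print(f"90% CI of the product: {low:,.0f} to {high:,.0f}")
```

Under these assumptions the simulated interval comes out narrower than the one you get by multiplying the extremes (500k to 320M). Swapping in a different distribution only means changing the sampling line.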


TLW

Feb 19, 2022

30

Can you estimate a 90% CI of candyfloss sold over a month based on that information?

Not without additional assumptions, some of which are "obviously" incorrect. In particular:

  1. If the estimates are correlated, combining estimates does not improve the CI as much as otherwise.
    1. (In the worst case, combining estimates may not improve the CI at all!)
    2. Consider, for instance, if your estimate of the number of candyfloss purchases over the month was based on a sample including the same shop as your other estimate.
  2. You need to make assumptions about (or have information about) the distribution, not just the extremes.
    1. A common (and bad) assumption is that everything is a normal distribution (or some simple transformation of a normal distribution, like log-normal).

Unfortunately, in most cases the product of two distributions is a mess. (If you want a pointer, look here.)

One notable set of exceptions is log-transformations of various distributions. (This is because in log-space multiplying the variables turns into adding them, so the densities combine by convolution, which is often a whole lot easier to calculate.)

For instance: the product of two log-normal distributions is "easy". (Of course, then you need to work with the distributions, not just the CIs.) Beware correlations, however.
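To make the log-normal case concrete, here is a sketch of the closed form. It assumes each 90% CI gives the 5th and 95th percentiles of a log-normal, and that the logs of the two quantities are jointly normal with correlation `rho`. In log-space the means add and the variance is s_a^2 + s_b^2 + 2*rho*s_a*s_b. The function name `product_ci` and the use of the question's example numbers are illustrative choices only.

```python
import math
from statistics import NormalDist

# z-score of the 95th percentile (about 1.645).
Z90 = NormalDist().inv_cdf(0.95)

def product_ci(ci_a, ci_b, rho=0.0):
    """90% CI of a product of two log-normal quantities whose logs have correlation rho."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    # Log-space means add; each mean is the midpoint of the log interval.
    mu = (math.log(lo_a * hi_a) + math.log(lo_b * hi_b)) / 2
    # Log-space standard deviations recovered from the interval widths.
    s_a = math.log(hi_a / lo_a) / (2 * Z90)
    s_b = math.log(hi_b / lo_b) / (2 * Z90)
    # Variance of a sum of two correlated normals.
    sigma = math.sqrt(s_a**2 + s_b**2 + 2 * rho * s_a * s_b)
    return math.exp(mu - Z90 * sigma), math.exp(mu + Z90 * sigma)

for rho in (0.0, 0.5, 1.0):
    low, high = product_ci((10_000, 40_000), (50, 8_000), rho)
    print(f"rho={rho}: {low:,.0f} to {high:,.0f}")
```

With `rho=1.0` the result collapses to simply multiplying the extremes, which is the "may not improve the CI at all" worst case; with `rho=0.0` the interval is noticeably tighter.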


SarahNibs

Feb 18, 2022

30

I'm confused about what concrete question about candyfloss the example is trying to answer, but my usual heuristic for combining estimates is that in the absence of more information (or, more realistically, more desire to investigate), I will assume a uniform distribution over some natural scale. For example, 10k-40k is on a magnitude scale, so pretend it's uniform over log(10k) to log(40k). 50-8000 is also on a magnitude scale.

TLW

Unfortunately, the product of two log-uniform distributions is not a log-uniform distribution...
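A quick way to see this is to simulate it. The sketch below assumes, following the heuristic above, that each interval is treated as the full support of a log-uniform variable, and that the two are independent and multiplied. If the product were log-uniform, its 5th and 95th percentiles would sit 5% and 95% of the way along the product's log range. The helper `sample_log_uniform` is a made-up name for this example.

```python
import math
import random
from statistics import quantiles

def sample_log_uniform(rng, low, high):
    """Draw one value that is uniform on a log scale between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
log_products = [
    math.log(sample_log_uniform(rng, 10_000, 40_000) * sample_log_uniform(rng, 50, 8_000))
    for _ in range(100_000)
]

# Full range of the product on a log scale.
log_min = math.log(10_000 * 50)
log_max = math.log(40_000 * 8_000)

# 5th and 95th percentiles of the simulated log-product.
cuts = quantiles(log_products, n=20)
p5, p95 = cuts[0], cuts[-1]

# For a log-uniform product these fractions would be 0.05 and 0.95.
frac_p5 = (p5 - log_min) / (log_max - log_min)
frac_p95 = (p95 - log_min) / (log_max - log_min)
print(f"5th percentile sits {frac_p5:.2f} of the way along the log range")
print(f"95th percentile sits {frac_p95:.2f} of the way along the log range")
print(f"90% CI of the product: {math.exp(p5):,.0f} to {math.exp(p95):,.0f}")
```

The sum of two uniforms in log-space has a trapezoidal density, so probability mass thins out toward both ends of the range and the percentiles land further in than a log-uniform would put them.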

SarahNibs
I assumed the question was "I have two endpoints of several intervals but that's not enough to combine intervals". My answer is "assume uniform over some natural scale". If the actual question is "I don't know how to combine distributions, help" then I think my answer is "if you don't already know the answer then probably you should simulate and if you can't do that then I guess use Guesstimate because anything else will take too much scaffolding to reasonably learn".
TLW
My point is precisely that "[...] for combining estimates [...] assume a uniform distribution over some natural scale" doesn't accomplish your stated goal of being able to combine estimates.
gbear605

I might just be confused here, but aren't "the number of candyfloss purchases for the month" and "the number of candyfloss sold over a month" two different ways to say the same thing?

Suppose you want to take a guess at the number of candyfloss sold over a month in some area and you would like a 90% confidence interval (CI) instead of a point estimate. Your Fermi estimates for two central subcomponents are:

(1) a 90% CI of the number of candyfloss a single candyfloss seller sells per month, say 10k-40k.
(2) a 90% CI of the number of candyfloss that a professional candyfloss seller sells during a month, say 50-8000.

Can you estimate a 90% CI of candyfloss sold over a month based on that information? If not, could you if you made some assumptions about the distribution of (1) or (2) (e.g. could you do it if they were uniformly distributed)? Could you use the percentiles given by the square or the square root of the extremes (e.g. combining either the 0.25th (5%^2) or the 22nd (square root of 5%) percentiles)?

My intuition is that you can't just multiply the extremes of (1) and (2), but I'm not confident about what you need in order to make even approximately correct claims.
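One way to probe that intuition is to ask which percentile of the product you actually get when you multiply the two lower extremes. The sketch below answers that only under assumptions that are not part of the question: both quantities are independent and log-normal, with each CI giving the 5th and 95th percentiles. The function name `percentile_of_multiplied_lows` is hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()
# z-score of the 95th percentile (about 1.645).
Z90 = STD.inv_cdf(0.95)

def percentile_of_multiplied_lows(ci_a, ci_b):
    """Percentile (0-100) of the product's distribution at which low_a * low_b falls."""
    # Log-space standard deviations recovered from the interval widths.
    s_a = math.log(ci_a[1] / ci_a[0]) / (2 * Z90)
    s_b = math.log(ci_b[1] / ci_b[0]) / (2 * Z90)
    # low_a * low_b lies Z90 * (s_a + s_b) below the log-mean of the product,
    # whose standard deviation is sqrt(s_a^2 + s_b^2) under independence.
    z = -Z90 * (s_a + s_b) / math.hypot(s_a, s_b)
    return 100 * STD.cdf(z)

# The example intervals from the question.
p_example = percentile_of_multiplied_lows((10_000, 40_000), (50, 8_000))
# Two equally wide intervals, chosen purely for comparison.
p_equal = percentile_of_multiplied_lows((10, 100), (10, 100))
print(f"question's intervals: {p_example:.2f}th percentile")
print(f"equal-width intervals: {p_equal:.2f}th percentile")
```

Under these assumptions the multiplied lower extremes land somewhere between the 0.25th and the 5th percentile, and exactly where depends on how the two spreads compare, so no single fixed percentile rule works across cases.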

Edited to fix an error in (1)