According to the Wikipedia page, it looks like if you are doing max(a, .9a + .5b), the worst-case error is more like 2%, as you can see from their second table.
I'd also note that a + 1/2 b has a worst-case error of 12%, which might be good enough if you are doing Fermi estimates. It's also easily explainable: sqrt(a^2 + b^2) = a sqrt(1 + (b/a)^2), whose first-order Taylor series approximation is a(1 + 1/2 b/a) = a + 1/2 b. You'll want a to be the max and b to be the min for a better approximation. This will always overshoot, as it's an alternating series.
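Both claims (about 12% worst-case error, and always overshooting) are easy to check numerically. The Python sketch below is just a check, not part of anyone's method; `half_min_estimate` is a hypothetical name. It assumes non-negative inputs and, since the estimate and the exact value both scale linearly, fixes the larger input at 1 and sweeps the ratio b/a over [0, 1].

```python
import math

def half_min_estimate(a: float, b: float) -> float:
    """The comment's approximation: the larger input plus half the smaller one."""
    big, small = max(abs(a), abs(b)), min(abs(a), abs(b))
    return big + 0.5 * small

# Fix the larger input at 1 and sweep t = smaller / larger over [0, 1];
# both sides scale linearly, so this covers every case.
n = 10_001
errs = [half_min_estimate(1.0, i / (n - 1)) / math.hypot(1.0, i / (n - 1)) - 1.0
        for i in range(n)]

print(f"worst-case error: {max(errs):.1%}")
print(f"always overshoots (or is exact): {min(errs) >= 0.0}")
```

The worst case sits at b = a/2, where the estimate is 1.25a against an exact value of about 1.118a.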
If you've already used the (1 + x)^r ≈ 1 + rx trick before, then you don't have to remember anything new. You can then see the alpha=.9 as correcting for the overshoot, and the max(a, .9a + .5b) as correcting for the obviously wrong results when b is small (as the Pythagorean sum can't be smaller than the maximum of the inputs!).
sqrt(a^2 + b^2) = a sqrt(1 + (b/a)^2) whose first order taylor series approximation is a(1 + 1/2 b/a) = a + 1/2 b
Isn't the first-order approximation a(1 + 1/2 b^2/a^2)?
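Spelling out the expansion the reply refers to: with x = (b/a)^2, the first-order expansion sqrt(1 + x) ≈ 1 + x/2 gives

$$\sqrt{a^2 + b^2} = a\sqrt{1 + \left(\tfrac{b}{a}\right)^2} \approx a\left(1 + \frac{1}{2}\frac{b^2}{a^2}\right) = a + \frac{b^2}{2a},$$

so the first-order correction is quadratic in b, as the reply says, rather than 1/2 b.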
What I do mentally is:
Examples
Not sure how they compare in accuracy, but it seems like your method is simpler, at least if you remember that they cross when b is 20% of a.
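Assuming "they" refers to the two branches a and 0.9a + 0.5b of max(a, .9a + .5b), the 20% crossover follows directly:

$$0.9a + 0.5b = a \iff 0.5b = 0.1a \iff b = 0.2a,$$

so for b below 0.2a the max simply returns a.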
TL;DR: Instead of laboriously computing $\sqrt{a^2 + b^2}$, we can calculate it mentally using the alpha-max plus beta-min algorithm, by estimating
$$\max(a,\ 0.9a + 0.5b)$$
(with $a$ the larger of the two numbers), and this will be very close to the actual $\sqrt{a^2 + b^2}$. This is useful for adding up sources of variance, figuring out radii, and other such things.
Background
The mathematical relationship $a^2 + b^2 = c^2$ is surprisingly common. It happens among other things in
When it shows up, it’s often because one of the variables is unknown, i.e. we have either $c = \sqrt{a^2 + b^2}$ or $a = \sqrt{c^2 - b^2}$.
The annoying part is that these are hard to calculate mentally, even for someone who is good at estimating squares and square roots (e.g. from previous logarithm practice), because numbers grow large when squared.
Insight
I just had a flash of insight. Maybe the problem is thinking of this as three separate operations (square, add, take the root). What if instead we think of it as one fundamental, composite operation? We could call it ⊞ (its Unicode name, SQUARED PLUS, is apt) and define it as
$$a \boxplus b = \sqrt{a^2 + b^2}$$
and then we could use spaced repetition to train ourselves in evaluating it mentally, much like we would do with multiplication tables and logarithms. Then we’d never have to deal with this annoyance again! Given two independent sources of variation measured in standard deviations, we would instantly know the total variation – again, in standard deviations. That would be much more intuitive.
The one problem is that we’d also have to learn the inverse operation,
$$a \boxminus b = \sqrt{a^2 - b^2}.$$
The ⊞ operation should be fairly easy to learn, because its contour lines are concentric circles around the origin. The ⊟ operation might be trickier, because its contour lines are hyperbolas, a more awkwardly shaped conic section.
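Taking $a \boxplus b = \sqrt{a^2 + b^2}$ and $a \boxminus b = \sqrt{a^2 - b^2}$ (for $a \ge b \ge 0$), the contour where either operation equals some constant $c > 0$ follows by squaring both sides:

$$a \boxplus b = c \iff a^2 + b^2 = c^2, \qquad a \boxminus b = c \iff a^2 - b^2 = c^2.$$

The first is a circle of radius $c$ centred on the origin; the second traces part of a hyperbola whose asymptotes are $b = \pm a$.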
Prior art
It turns out I’m not the first person to have thought about this. There’s a research paper out of IBM from the early 1980s in which the authors came up with a method for computers to evaluate ⊞ with a high (cubic) rate of convergence.[1] The method is very cool. Given a point (x, y), the authors found a way to nudge that point along a circle centred on the origin, down toward the abscissa, so that when the y-value is sufficiently small, the x-value is equal to the radius. However, iterative algorithms like this aren’t well suited for mental arithmetic.[2]
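For the curious, here is a Python sketch of that iteration as it is commonly stated. This is my reading of the Moler–Morrison scheme, not code taken from the paper, and `pythag` is a hypothetical name. The pair (p, q) starts as (larger, smaller); each step keeps p² + q² unchanged (up to rounding) while shrinking q roughly cubically, so p converges to the Pythagorean sum without a square root ever being taken.

```python
import math

def pythag(a: float, b: float, max_iter: int = 20) -> float:
    """Approximate sqrt(a^2 + b^2) with the Moler-Morrison iteration.

    Invariant: p^2 + q^2 stays equal to a^2 + b^2 (up to rounding) at every
    step, while q shrinks roughly cubically, so p converges to the answer.
    No square root is ever taken.
    """
    p, q = max(abs(a), abs(b)), min(abs(a), abs(b))
    if p == 0.0:
        return 0.0
    for _ in range(max_iter):
        r = (q / p) ** 2
        if 4.0 + r == 4.0:   # q is negligible next to p, so p is the answer
            break
        s = r / (4.0 + r)
        p = p + 2.0 * s * p  # the point (p, q) slides along its circle...
        q = s * q            # ...toward the p-axis (the abscissa)
    return p

for a, b in [(3.0, 4.0), (5.0, 12.0), (1.0, 1.0)]:
    print(a, b, pythag(a, b), math.hypot(a, b))
```

In double precision the loop typically exits after three or four steps, which is why it suits a computer but not a head.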
However, there’s also a great method to do it as a human. To evaluate $a \boxplus b$, assuming $a$ is the larger number (if it is not, swap them), compute
$$a \boxplus b \approx \max(a,\ 0.9a + 0.5b).$$
This is an estimate and it does come with an error, but the error is at worst 3 % and on average 1.5 %. That’s remarkable for such an easy procedure. To be clear, we are only shaving a tenth off of the larger number, and adding back in half of the smaller number, and this is very close to being the square root of the sum of their squares!
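The error figures can be checked with a short Python sketch. It assumes non-negative inputs, uses the hypothetical name `quick_pythag` for max(a, 0.9a + 0.5b), and, because both the estimate and the exact value scale linearly, fixes the larger input at 1 and sweeps the ratio t = b/a over [0, 1].

```python
import math

def quick_pythag(a: float, b: float) -> float:
    """Estimate sqrt(a^2 + b^2) as max(a, 0.9a + 0.5b), with a the larger input."""
    big, small = max(abs(a), abs(b)), min(abs(a), abs(b))
    return max(big, 0.9 * big + 0.5 * small)

# Both the estimate and the exact value scale linearly with the inputs, so it
# is enough to fix the larger input at 1 and sweep t = smaller / larger over [0, 1].
N = 100_001
ratios = [i / (N - 1) for i in range(N)]
errors = [quick_pythag(1.0, t) / math.hypot(1.0, t) - 1.0 for t in ratios]

worst = max(abs(e) for e in errors)         # largest over- or undershoot
mean = sum(abs(e) for e in errors) / N      # average over evenly spaced ratios
print(f"worst-case relative error: {worst:.2%}")
print(f"mean relative error:       {mean:.2%}")
```

The worst case is an overshoot of about 3 % near t = 5/9; the largest undershoot, about 2 %, sits at the crossover t = 0.2. Averaging over evenly spaced ratios is one choice; averaging over the angle of the point (a, b) would weight the cases differently.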
The method is called alpha-max plus beta-min because its general form is $\alpha \cdot \max(a, b) + \beta \cdot \min(a, b)$. We used $\alpha = 0.9$ and $\beta = 0.5$ because they are convenient for mental maths, but other parameters exist, and some are slightly more accurate.
Inverting
The great thing about this algorithm is that it’s easy to invert, too. If we only have the total $c$ and one of the terms, we can subtract and get either
$$a \approx \frac{c - 0.5b}{0.9}$$
or
$$b \approx 2(c - 0.9a),$$
depending on whether we have the small or the large term.
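A minimal Python sketch of both inversions, with hypothetical function names. It assumes the total c came from the 0.9a + 0.5b branch; when b is below 0.2a the estimate is just a, so the small term cannot be recovered (the formula then returns 0.2a regardless).

```python
import math

def total_from_terms(a: float, b: float) -> float:
    """Forward estimate c ~ max(a, 0.9a + 0.5b), with a the larger term."""
    return max(a, 0.9 * a + 0.5 * b)

def large_from_total(c: float, b: float) -> float:
    """Invert when the smaller term b is known: a ~ (c - 0.5b) / 0.9."""
    return (c - 0.5 * b) / 0.9

def small_from_total(c: float, a: float) -> float:
    """Invert when the larger term a is known: b ~ 2(c - 0.9a)."""
    return 2.0 * (c - 0.9 * a)

# Compare against the exact inverse sqrt(c^2 - x^2) for an exact total
# (5^2 + 12^2 = 13^2).
c = 13.0
print(large_from_total(c, 5.0), math.sqrt(c**2 - 5.0**2))
print(small_from_total(c, 12.0), math.sqrt(c**2 - 12.0**2))
```

Fed an exact total, the inverses give roughly 11.7 against an exact 12 and 4.4 against an exact 5: the factor of 2 in the second formula amplifies the forward error, so recovering the small term is the less reliable direction.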
Example
For a concrete example of how to use this, let’s say we know men are on average 12 cm taller than women. Knowing a person’s sex then shifts their expected height by either +6 or −6 cm from the overall average, like a coin toss, which gives this component a standard deviation of 6 cm. We also know that within the groups of men and women separately, the standard deviation of stature is 7 cm.
Then the total variation of stature, across both men and women, ought to be around
$$7 \boxplus 6 \approx 0.9 \cdot 7 + 0.5 \cdot 6 = 6.3 + 3 = 9.3 \text{ cm},$$
and that would indeed be what we found if we went out and randomly picked people across the globe and measured their height. How cool is that? I did not expect to be able to mentally add sources of variance.
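The arithmetic can be reproduced in Python, together with a small simulation. The simulation is illustrative only and rests on assumptions not in the text: a 50/50 sex ratio and normally distributed heights within each sex. The overall mean height is left out because it does not affect the standard deviation. `quick_pythag` is a hypothetical name.

```python
import math
import random

def quick_pythag(a: float, b: float) -> float:
    """max(a, 0.9a + 0.5b) with a the larger of the two inputs."""
    big, small = max(a, b), min(a, b)
    return max(big, 0.9 * big + 0.5 * small)

within_sd = 7.0   # SD of stature within each sex (cm), from the text
between_sd = 6.0  # SD of the +/-6 cm sex offset (cm), from the text

estimate = quick_pythag(within_sd, between_sd)
exact = math.hypot(within_sd, between_sd)

# Assumed model: fair coin for sex, normal spread within each sex.
random.seed(0)
samples = [random.choice((6.0, -6.0)) + random.gauss(0.0, within_sd)
           for _ in range(200_000)]
mean = sum(samples) / len(samples)
sim_sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (len(samples) - 1))

print(f"mental estimate:       {estimate:.2f} cm")
print(f"exact sqrt(7^2 + 6^2): {exact:.2f} cm")
print(f"simulated SD:          {sim_sd:.2f} cm")
```

The exact value is the square root of 85, about 9.22 cm, so the mental estimate of 9.3 cm is off by under 1 %.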
[1] Replacing Square Roots by Pythagorean Sums; Moler & Morrison; IBM Journal of Research and Development; 1983.
[2] I do hear about people who can refine approximations by running a couple of iterations of Newton–Raphson in their heads. I want to be like those people, but I am not.