An incomplete list of caveats to the Sharpe ratio, off the top of my head:
This is very, very cool. Having come from the functional programming world, I frequently miss these features when doing machine learning in Python, and haven't been able to replicate them easily. I think there's a lot of easy optimization that could happen in day-to-day exploratory machine learning code that bog-standard pandas/scikit-learn doesn't do.
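If N95 masks work, R95-R100 and P95-P100 masks should also work, and potentially be more effective: the stuff they filter is a superset of what N95s filter (the letter is the oil rating, with N = not oil-resistant, R = oil-resistant, P = oil-proof, and the number is the rated filtration efficiency in percent). They're normally more expensive, but in the current market I've actually found P100s cheaper than N95s.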
I don't really understand what you mean by "from first principles" here. Do you mean in a way that's intuitive to you? Or in a way that includes all the proofs?
Any field of math is typically more general than any single intuition allows, so it's a little dangerous to think in terms of what it's "really" doing. I find most people learn best by starting with a small number of concrete intuitions (e.g., groups of symmetries for group theory, or posets for category theory) and gradually expanding.
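To make the "groups of symmetries" intuition concrete, here is a small Python sketch. The corner encoding and the names `ROTATE_90`, `REFLECT`, `generate`, and `is_group` are illustrative choices of mine, not anything standard. It generates every symmetry of a square from one rotation and one reflection, then checks the group axioms directly.

```python
from itertools import product

# Corners of a square labelled 0..3 counterclockwise. A symmetry is stored as a
# permutation tuple p, where p[i] is the corner that corner i is moved to.
IDENTITY = (0, 1, 2, 3)
ROTATE_90 = (1, 2, 3, 0)  # quarter turn counterclockwise
REFLECT = (0, 3, 2, 1)    # reflection across the diagonal through corners 0 and 2


def compose(p, q):
    """The symmetry 'do q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate(generators):
    """All symmetries reachable by repeatedly applying the generators."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        next_frontier = []
        for g in frontier:
            for h in generators:
                c = compose(h, g)
                if c not in group:
                    group.add(c)
                    next_frontier.append(c)
        frontier = next_frontier
    return group


def is_group(elements):
    """Check closure, identity and inverses (composing functions is always associative)."""
    closed = all(compose(a, b) in elements for a, b in product(elements, repeat=2))
    has_inverses = all(any(compose(a, b) == IDENTITY for b in elements) for a in elements)
    return closed and IDENTITY in elements and has_inverses


D4 = generate([ROTATE_90, REFLECT])
print(len(D4), is_group(D4))  # 8 True
```

Comparing `compose(ROTATE_90, REFLECT)` with `compose(REFLECT, ROTATE_90)` gives two different symmetries, which is a hands-on way to see that this group is not commutative.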
In the case of Complex Analysis, I find the intuition of the Riemann Sphere to be particularly useful, though I don't have a good book recommendation.
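One way to see why the Riemann sphere helps is inverse stereographic projection, sketched below in Python with the usual convention that the north pole stands for ∞ (`to_sphere` is a hypothetical helper, not a library function). A map like z ↦ 1/z, which looks awkward on the plane, turns out to be a rigid half-turn of the sphere.

```python
import math


def to_sphere(z):
    """Inverse stereographic projection: send the complex number z to a point
    (X, Y, Z) on the unit sphere. The north pole (0, 0, 1) plays the role of infinity."""
    x, y = z.real, z.imag
    r2 = x * x + y * y
    d = 1 + r2
    return (2 * x / d, 2 * y / d, (r2 - 1) / d)


z = 3 + 4j
p = to_sphere(z)
q = to_sphere(1 / z)

# Every image lies on the unit sphere.
print(math.isclose(math.hypot(*p), 1.0))  # True
# z -> 1/z is a half-turn about the X-axis: (X, Y, Z) -> (X, -Y, -Z).
print(all(math.isclose(a, b) for a, b in zip(q, (p[0], -p[1], -p[2]))))  # True
# Very large |z| lands right next to the north pole.
print(to_sphere(1e6)[2])  # slightly less than 1.0
```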
One major confounder is that caffeine is also a painkiller. Many people have mild chronic pain, and I think there's a very plausible mechanism by which painkillers improve productivity: simply allowing someone to focus better.
Anecdotally, I've noticed that "resetting" caffeine tolerance is very quick compared to most drugs, taking something like 2-3 days without caffeine for several people I know, including myself.
The studies I could find on caffeine are highly contradictory; e.g., Wikipedia notes that "Caffeine has been shown to have positive, negative, and no effects on long-term memory."
I'm under the impression that there's no general evidence for stimulants increasing productivity, although there are several specific cases, such as treating ADHD.
One key dimension is decomposition – I would say any gears model provides decomposition, but models can have it without gears.
For example, the expected squared error of a machine learning model can be broken down into bias² + variance + irreducible noise, which provides a useful model for debugging. But these don't feel like gears in any meaningful sense, whereas, say, bootstrap sampling plus an ensemble of decision trees feel like gears in understanding random forests.
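The decomposition can be checked numerically. The Python sketch below uses a toy setup of my own choosing, not anything from the original discussion: the true function is sin(x), noise is Gaussian with σ = 0.3, and the model is an ordinary least-squares line refit on many fresh training sets. For squared error, expected test error at a point splits into bias² + variance + irreducible noise, so the simulated numbers should agree up to sampling error.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical "ground truth" the model is trying to learn.
    return math.sin(x)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns the fitted predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def decompose(x0, n_train=10, sigma=0.3, trials=20_000, seed=0):
    """Estimate test MSE at x0 and its bias^2 / variance / noise parts by
    retraining the model on many independent noisy training sets."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        xs = [rng.uniform(0, math.pi) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
        p = fit_line(xs, ys)(x0)
        y_test = true_f(x0) + rng.gauss(0, sigma)  # fresh noisy target at x0
        preds.append(p)
        sq_errors.append((y_test - p) ** 2)
    mean_pred = statistics.fmean(preds)
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = statistics.pvariance(preds, mu=mean_pred)
    return statistics.fmean(sq_errors), bias_sq, variance, sigma ** 2


mse, bias_sq, variance, noise = decompose(math.pi / 2)
print(f"MSE={mse:.4f}  bias^2+variance+noise={bias_sq + variance + noise:.4f}")
print(f"bias^2={bias_sq:.4f}  variance={variance:.4f}  noise={noise:.4f}")
```

Here a straight line is too rigid for a sine curve, so bias² dominates; a more flexible model would typically trade some of that bias for extra variance.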
I think it is true that gears-level models are systematically undervalued, and that part of the reason is their longer payoff curve.
A simple example is debugging code: a gears-level approach tries to understand what the code is doing and why it doesn't do what you want, while a black-box approach tries changing things somewhat randomly. Most programmers I know will agree that the gears-level approach is almost always better, but also that they at least sometimes end up falling back on the black-box approach when tired, frustrated, or stuck.
And companies that focus too much on short-term results (most of them, IMO) will push programmers to spend far more time on black-box debugging than is optimal.
Perhaps part of the reason the choice typically appears obvious is that gears-level methods are underestimated.
Black-box approaches often fail to generalize within a domain, but generalize well across domains. Neural nets may teach you less about medicine than a PGM (probabilistic graphical model), but they'll also get you good results in image recognition, transcription, etc.
This can lead to interesting principal-agent problems: an employee benefits more from learning something generalizable across businesses and industries, while employers will generally prefer the best domain-specific solution.
Nit: giving IQ tests is not super cheap, because it puts companies at a nebulous risk of being sued for disparate impact (see e.g. https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co.).
I agree with all the major conclusions though.
For the orthogonal decomposition, don't you need two scalars, i.e. $x = ay + bz$? For example, in $\mathbb{R}^2$, let $x = (2,2)$, $y = (0,1)$, $z = (1,0)$. Then $x = 2y + 2z$, and there's no way to write $x$ as $y + az$, since $y + az = (a, 1)$ always has second coordinate 1.
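A quick numerical check of the example, assuming the standard dot product on ℝ² (the helper names `dot` and `orthogonal_coefficients` are illustrative). When the basis vectors are orthogonal, each coefficient is a separate projection, c = ⟨x, e⟩ / ⟨e, e⟩, so both scalars come out independently.

```python
def dot(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_coefficients(x, basis):
    """Coefficients of x in an ORTHOGONAL basis: c_i = <x, e_i> / <e_i, e_i>.
    (This shortcut is only valid when the basis vectors are mutually orthogonal.)"""
    return [dot(x, e) / dot(e, e) for e in basis]


x, y, z = (2, 2), (0, 1), (1, 0)
a, b = orthogonal_coefficients(x, [y, z])
print(a, b)  # 2.0 2.0

# Reconstruct x from both scalars.
print(tuple(a * yi + b * zi for yi, zi in zip(y, z)))  # (2.0, 2.0)

# With only one free scalar, y + c*z = (c, 1) for every c,
# so its second coordinate can never match x's second coordinate, 2.
```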