SatvikBeri

# Posts


An incomplete list of caveats to Sharpe off the top of my head:

• We can never measure the true Sharpe of a strategy (how it would theoretically perform on average over all time), only the observed Sharpe ratio, which can be radically different, especially for strategies with significant tail risk. A wide variety of strategies might show a very high observed Sharpe over a few years but have a much lower true Sharpe.
• Sharpe typically doesn't account for costs like infrastructure or salaries, just direct losses to the fund. So, e.g., you could view working at a company and earning a salary as a financial strategy with a nearly infinite Sharpe, but that's not necessarily appealing. There are actually a fair number of hedge funds whose function is more similar to providing services in exchange for relatively guaranteed pay.
• High-Sharpe strategies are often constrained by capacity. For example, my friend once offered to pay me $51 on Venmo if I gave her $50 in cash, which is a very high return on investment given that the transaction took just a few minutes, but I doubt she would have been willing to do the same thing at a million times the scale. Similarly, there are occasionally investment strategies with very high Sharpes that can only handle a relatively small amount of money.
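
To make the first caveat concrete, here is a rough sketch of a strategy that earns small, steady returns on ordinary days and takes a rare large loss. Every number in it (the 0.1% daily drift, 0.5% daily volatility, 1-in-500 chance of a 10% loss, and a risk-free rate of zero) is a made-up assumption for illustration, not a description of any real strategy. It compares the Sharpe implied by the full return distribution with the Sharpe you would observe in a two-year window that happens to contain no crash.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # conventional annualization factor


def annualized_sharpe(daily_returns):
    """Observed Sharpe: mean / sample std of daily returns, annualized.

    Assumes a risk-free rate of zero to keep the sketch short.
    """
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)
    return mean / std * math.sqrt(TRADING_DAYS)


def true_sharpe(mu, sigma, crash_prob, crash_return):
    """Sharpe implied by the whole distribution, crashes included.

    Model (hypothetical): each day is a crash with probability crash_prob,
    otherwise the return is Normal(mu, sigma).
    """
    mean = (1 - crash_prob) * mu + crash_prob * crash_return
    second_moment = ((1 - crash_prob) * (sigma**2 + mu**2)
                     + crash_prob * crash_return**2)
    std = math.sqrt(second_moment - mean**2)
    return mean / std * math.sqrt(TRADING_DAYS)


def simulate_daily_returns(days, mu, sigma, crash_prob, crash_return, rng):
    """Draw one path of daily returns from the same mixture model."""
    return [
        crash_return if rng.random() < crash_prob else rng.gauss(mu, sigma)
        for _ in range(days)
    ]


if __name__ == "__main__":
    # All parameters below are illustrative assumptions.
    mu, sigma = 0.001, 0.005
    crash_prob, crash_return = 1 / 500, -0.10
    rng = random.Random(0)

    # Roughly 1.9 with these parameters.
    print("true Sharpe:", round(true_sharpe(mu, sigma, crash_prob, crash_return), 2))

    # Keep drawing two-year windows until one contains no crash. An ordinary
    # day landing exactly on crash_return is effectively impossible here,
    # so a membership check is enough to detect crashes.
    while True:
        window = simulate_daily_returns(2 * TRADING_DAYS, mu, sigma,
                                        crash_prob, crash_return, rng)
        if crash_return not in window:
            break
    print("observed Sharpe, crash-free window:", round(annualized_sharpe(window), 2))
```

Under these assumptions a two-year window avoids a crash a bit more than a third of the time (0.998^504 ≈ 0.36), and in those windows the observed Sharpe clusters around 3.2 rather than the true value near 1.9.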

This is very, very cool. Having come from the functional programming world, I frequently miss these features when doing machine learning in Python, and haven't been able to easily replicate them. I think there's a lot of easy optimization that could happen in day-to-day exploratory machine learning code that bog-standard pandas/scikit-learn doesn't do.

If N95 masks work, R95-100 and P95-100 masks should also work, and potentially be more effective - the stuff they filter is a superset of what N95 filters. They're normally more expensive, but in the current state I've actually found P100s cheaper than N95s.

Learning Abstract Math from First Principles?

I don't really understand what you mean by "from first principles" here. Do you mean in a way that's intuitive to you? Or in a way that includes all the proofs?

Any field of Math is typically more general than any one intuition allows, so it's a little dangerous to think in terms of what it's "really" doing. I find the way most people learn best is by starting with a small number of concrete intuitions – e.g., groups of symmetries for group theory, or posets for category theory – and gradually expanding.

In the case of Complex Analysis, I find the intuition of the Riemann Sphere to be particularly useful, though I don't have a good book recommendation.

Is daily caffeine consumption beneficial to productivity?

One major confounder is that caffeine is also a painkiller, many people have mild chronic pain, and I think there's a very plausible mechanism by which painkillers improve productivity, i.e. just allowing someone to focus better.

Anecdotally, I've noticed that "resetting" caffeine tolerance is very quick compared to most drugs, taking something like 2-3 days without caffeine for several people I know, including myself.

The studies I could find on caffeine are highly contradictory, e.g. from Wikipedia, "Caffeine has been shown to have positive, negative, and no effects on long-term memory."

I'm under the impression that there's no general evidence for stimulants increasing productivity, although there are several specific cases, such as treating ADHD.

Gears-Level Models are Capital Investments

One key dimension is decomposition – I would say any gears model provides decomposition, but models can have it without gears.

For example, the error in any machine learning model can be broken down into bias + variance, which provides a useful model for debugging. But these don't feel like gears in any meaningful sense, whereas, say, bootstrapping + weak learners feel like gears in understanding Random Forests.
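
For reference, the usual textbook form of that decomposition is sketched below. It assumes squared-error loss and data generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$; the symbols $f$, $\hat{f}$, and $\sigma^2$ are notation introduced here, not taken from the post. Under those assumptions the bias enters squared and there is a third, irreducible noise term, so "bias + variance" is shorthand.

```latex
\mathbb{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Here the expectations are taken over training sets (and the noise in $y$) at a fixed input $x$.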

Gears-Level Models are Capital Investments

I think it is true that gears-level models are systematically undervalued, and that part of the reason is the longer payoff curve.

A simple example is debugging code: a gears-level approach is to try to understand what the code is doing and why it doesn't do what you want; a black-box approach is to try changing things somewhat randomly. Most programmers I know will agree that the gears-level approach is almost always better, but that they at least sometimes end up doing the black-box approach when tired/frustrated/stuck.

And companies that focus too much on short-term results (most of them, IMO) will push programmers to spend far more time on black-box debugging than is optimal.

Perhaps part of the reason why the choice appears to typically be obvious is that gears methods are underestimated.

Gears-Level Models are Capital Investments

Black-box approaches often fail to generalize within the domain, but generalize well across domains. Neural Nets may teach you less about medicine than a PGM, but they'll also get you good results in image recognition, transcription, etc.

This can lead to interesting principal-agent problems: an employee benefits more from learning something generalizable across businesses and industries, while employers will generally prefer the best domain-specific solution.

Inefficient Doesn’t Mean Indifferent

Nit: giving IQ tests is not super cheap, because it puts companies at a nebulous risk of being sued for disparate impact (see e.g. https://en.wikipedia.org/wiki/Griggs_v._Duke_Power_Co.).

I agree with all the major conclusions though.

Insights from Linear Algebra Done Right

For the orthogonal decomposition, don't you need two scalars? E.g. . For example, in , let Then , and there's no way to write as