When I was deciding whether to work for Wave, I got very hung up on the fact that my “total compensation” would be “lower.”

The scare quotes are there because Wave and my previous employer, Theorem, were both early-stage startups that were paying me mostly in fake startup bucks (equity). To figure out total compensation, I tried to guess how much the equity in each company was worth, with a thought process something like:

  • Both of these companies have been invested in by reputable, top-tier venture capitalists.
  • The market for for-profit investments is pretty efficient, and most people who think they can do better are being overconfident.
  • Who am I, a lowly 22-year-old programmer, to disagree with reputable top-tier venture capitalists? I should defer to them about the valuations.

So I valued the equity by taking the valuation each company’s VCs had invested at and multiplying it by the fraction of the company my shares represented. That number was higher for Theorem than for Wave.

Seven years on, the Wave equity turned out to be… a lot more valuable. That raises the question: how dumb was my take? Was the actual outcome predictable if I’d thought about it in the right way?

I don’t think it was perfectly predictable, but I do think I shouldn’t have been so anchored to the market-efficiency reasoning. Those reputable, top-tier VCs had YOLOed those valuations after a couple of one-hour meetings, because that’s how early-stage VC works. Meanwhile, I had worked at Theorem for a year and my then-partner had worked at Wave for nine months. Heck, I had gotten more founder time than those VCs had just during my interview process. I had way more information than “the market.”

If I’d had the confidence to use that information, I might have thought something like:

  • After its funding round, Wave continued to add users at one of the fastest paces their investors had ever seen, whereas Theorem is struggling to grow.
  • Theorem is constrained by its ability to do sales, and the founders don’t seem to be acting with enough focus or urgency to unblock that constraint. Instead, they’re distracting themselves with things like hiring machine learning interns (i.e. me).
  • The founders of Wave seem much smarter, more relentlessly resourceful, and more trustworthy.
  • Given the above, I should value the Wave equity way more even though its naive expected value is less than the Theorem equity.

Fortunately, I chose Wave for other reasons. But this thought pattern—throwing away most information in fear of using it to make overconfident judgments—shows up all the time. I’m here to tell you why I hate it.


In January 2020, my entire Twitter timeline was freaking out about a novel-seeming respiratory disease spreading in Wuhan.

Part of me thought:

  • All the reputable, top-tier technocrats are ridiculing the freaked-out people.
  • Usually, when a ragtag band of Internet weirdos thinks they know better than a large group of reputable, top-tier technocrats, the Internet weirdos are being overconfident.
  • So the technocrats are probably right on this one.

Another part of me thought:

  • Huh, the simple model of “this thing has a fast exponential growth rate and spreads when people are asymptomatic so it’s very hard to stop” seems like a compelling reason to think things will be quite bad.
  • When reputable, top-tier technocrats say not to freak out, they don’t usually address the best arguments in favor of freaking out, and they often seem like they don’t understand how exponential growth works.
  • Maybe I’ll buy a lot of beans in case everything goes to shit.

(I also contemplated the fact that the stock market didn’t seem to be freaking out, but I decided that since most people can’t beat the stock market, I probably wouldn’t either. Some braver souls than I bought puts on the S&P 500 and made a killing.)
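The “simple model” in the second list above is just compounding. A toy sketch makes it concrete (the 5-day doubling time and 100 starting cases are illustrative assumptions, not COVID estimates):

```python
# Toy model of unchecked epidemic growth. The doubling time and starting
# case count here are made-up illustrative numbers, not real estimates.
def project_cases(initial_cases: float, doubling_days: float, days: int) -> float:
    """Project case counts under pure exponential growth."""
    return initial_cases * 2 ** (days / doubling_days)

# Starting from 100 cases and doubling every 5 days, one month of
# unchecked growth is a 64x increase.
for day in (0, 10, 20, 30):
    print(day, round(project_cases(100, 5, day)))
```

Exponentials like this are exactly what the “don’t freak out” takes tended to gloss over: the numbers look small right up until they don’t.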


In both of these situations, I had some mental model of what was going on (“this epidemic is growing exponentially,” “this startup seems good”) based on the particulars of the situation, but instead of using my internal model to make a prediction, I threw away all my knowledge of the particulars and instead used a simple, easy-to-apply heuristic (“experts are usually right,” “markets are efficient”).

I frequently see people leaning heavily on this type of low-information heuristic to make important decisions for themselves, or to smack down overconfident-sounding ideas from other people.

  • This startup is growing incredibly fast and the founders are some of the most effective people I’ve ever met, but at their current VC valuation, the total comp is lower than my Big Tech job so I can’t justify the move.

  • I think I could have a big impact as an academic researcher, but most grad students end up depressed and don’t land a tenure-track position, so it’s not worth trying.

  • You’re going to start a company? Are you aware that 90% of startups fail? What makes you think you and your ragtag band of weirdos are the chosen ones?

  • Who are you to be sounding the alarm about a pandemic when every past alarm has been false and all the reputable, top-tier experts say not to worry?

These all place way too much weight on the low-info heuristic.

A heuristic like that can be a good starting point when you’re not an expert in an area and don’t have much time to think about it or dig in. That’s useful in theory, but in practice, people don’t limit these heuristics to that regime—they fall back on them even in high-context, high-investment situations, where it’s silly to throw away so much detailed knowledge of the particulars.

What’s worse, these low-info heuristics almost always push in the direction of being less ambitious, because the low-info view of any ambitious project is that it will fail (most projects run behind schedule, most startups fail, most investors underperform the market, etc.).

The problem is that the bad consequences of underconfidence and under-ambition are severe but subtle, whereas the bad consequences of overconfidence and wishful thinking are milder but more obvious. If you’re overconfident, you’ll try things that fail, and people will laugh at you. If you’re underconfident, you’ll avoid making risky bets, and miss out on the potential upside, but nobody will know for sure what you missed.

That means it’s always tempting to do what the low-info heuristic tells you and be less ambitious—but ultimately, that ends up being worse for the world.


Why do people find low-info heuristics so compelling? A few potential reasons:

  • Many (most?) attempts to reason via specific details are wrong. Most people who think “I’m going to beat the market” don’t; most people who think “I know better than all the experts” are less Balaji Srinivasan and more Time Cube guy.

  • The reasoning and evidence backing up low-info heuristics is (relatively) legible and easily verifiable. If I claim “90% of startups fail,” I can often cite a study for support, whereas if I claim “the markets aren’t freaking out enough about COVID,” I’d need to make a much more complicated argument to explain my reasoning.

  • It’s relatively straightforward to reason with low-info heuristics even when you’re not an expert in the domain. For something like a forecasting challenge, where forecasters need to make predictions across a wide range of topics and can’t possibly be an expert in all of them, this is very important.

  • Because it’s much more objective, reasoning via low-info heuristics gives you far fewer opportunities to fall prey to biases like optimism bias, motivated reasoning, the planning fallacy, etc.

Those are all real advantages! Low-info heuristics are a great way to be more-or-less right most of the time as a non-expert, and to limit your vulnerability to overconfidence and wishful thinking.


The problem is that there are lots of ways that low-info heuristics fail or can be improved on.

For example, the efficient market hypothesis (“asset prices incorporate all available information, so it’s hard to beat the market” used in the above example to infer that “venture capitalists value companies correctly”) is justified by economic theory that relies on a few assumptions:

  • Low transaction costs: The cost of doing a trade in the market (in this case, an investment) must be near-zero so that people can use any mispricings to get rich.

  • Enough smart money: The well-informed and rational players in the market need to have enough capital to take advantage of any pricing inefficiencies that they notice.

  • No secrets: The “available information” must be available to enough of the smart money that it can be used to correct mispricings.

  • Ability to profit: There must be a way for a smart market participant to make money from a mispriced asset.

In the case of venture capital, many of these assumptions are super false. Fundraising takes a lot of time and money: transaction costs are high. Venture capitalists YOLO their valuations after a few meetings: they frequently miss important information. And it’s impossible to short-sell startups, so there’s no market mechanism to correct an overpriced company. You can see the outcome of this in the fact that some venture capitalists consistently beat “the market’s” returns.

But it’s not just venture capital: almost no markets fully satisfy the conditions of the EMH, and many important markets—like housing or prediction markets—strongly violate them.

Or consider the heuristic that “if internet weirdos disagree with experts, the experts are right.” What community of Internet weirdos and what community of experts? Some communities of experts are clearly bonkers, like the victims of the Sokal hoax. In other cases, a community with expertise in one narrow area might not have the context in adjacent areas or the ability to do the first-principles thinking necessary to apply their expertise correctly in the real world. For example, doctors are experts in medicine, and thus are often expected to make medical diagnoses, but only 21% of doctors are capable of doing the elementary statistical calculations necessary to turn a medical test result into the probability of having a disease.
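The calculation in question is a one-line application of Bayes’ theorem. Here is a sketch using the classic illustrative numbers (a 1% base rate, 90% sensitivity, and a 9% false-positive rate are hypothetical values for illustration, not figures from the study):

```python
def posterior_prob(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(disease | positive test), via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# With a 1% base rate, a positive result from a seemingly accurate test
# still means only about a 9% chance of actually having the disease.
print(round(posterior_prob(0.01, 0.90, 0.09), 3))
```

Most people, doctors included, intuitively report a number close to the test’s accuracy instead, which is why this counts as a failure of expertise right at the boundary of the domain.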

Or consider the heuristic of the outside view: “the outcome of this situation will probably be similar to the outcome of similar past situations.” Suppose you’re using this to judge how likely a startup is to succeed. Sure, you could predict it based on the distribution of outcomes across all startups at a similar stage and valuation. But that would throw away almost all the information you have about the particular startup at hand. It ignores tons of important questions, like:

  • How fast is the startup growing?
  • How determined are the founders?

You could imagine trying to incorporate info like this into your outside-view analysis, by, e.g., looking at outcomes specifically of all startups that have grown by 10x in a single year. But that kind of information is so private and closely guarded that you probably can’t do that analysis. For some of the other traits, e.g. “how determined are the founders,” we don’t even have a good enough way of measuring that trait that you could do the analysis even in principle.

Sometimes I see people use the low-info heuristic as a “baseline” and then apply some sort of “fudge factor” for the illegible information that isn’t incorporated into the baseline—something like “the baseline probability of this startup succeeding is 10%, but the founders seem really determined so I’ll guesstimate that gives them a 50% higher probability of success.” In principle I could imagine this working reasonably well, but in practice most people who do this aren’t willing to apply as large a fudge factor as is warranted. Strong evidence is common:

One time, someone asked me what my name was. I said, “Mark Xu.” Afterward, they probably believed my name was “Mark Xu.” I’m guessing they would have happily accepted a bet at 20:1 odds that my driver’s license would say “Mark Xu” on it.

The prior odds that someone’s name is “Mark Xu” are generously 1:1,000,000. Posterior odds of 20:1 implies that the odds ratio of me saying “Mark Xu” is 20,000,000:1, or roughly 24 bits of evidence. That’s a lot of evidence.

… One implication of the Efficient Market Hypothesis (EMH) is that it is difficult to make money on the stock market. Generously, maybe only the top 1% of traders will be profitable. How difficult is it to get into the top 1% of traders? To be 50% sure you’re in the top 1%, you only need 200:1 evidence. This seemingly large odds ratio might be easy to get.
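The arithmetic in the quoted example is easy to reproduce: a Bayesian update multiplies prior odds by a likelihood ratio, and the base-2 logarithm of that ratio is the evidence measured in bits. (The 1:1,000,000 prior and 20:1 posterior below are the quote’s own numbers.)

```python
import math

def evidence_bits(prior_odds: float, posterior_odds: float) -> tuple[float, float]:
    """Likelihood ratio needed to move prior odds to posterior odds, and its size in bits."""
    likelihood_ratio = posterior_odds / prior_odds
    return likelihood_ratio, math.log2(likelihood_ratio)

# Prior odds of 1:1,000,000 that a stranger is named "Mark Xu";
# posterior odds of 20:1 after he says so.
ratio, bits = evidence_bits(1 / 1_000_000, 20 / 1)
print(f"{ratio:,.0f}:1 likelihood ratio, about {bits:.0f} bits of evidence")
```

That matches the quote’s “roughly 24 bits”: a few seconds of ordinary conversation routinely carries enough evidence to swamp an extreme prior.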


In fact, outperforming low-info heuristics isn’t just possible; it’s practically mandatory if you want to have an outsized impact on the world. That’s because leaning too heavily on low-info heuristics pushes people away from being ambitious or trying to search for outliers.

Most important things in life—jobs, hires, companies, ideas, partners, etc.—have a distribution of outcomes where the best possible choices are outliers that are dramatically better than the typical ones. In my case, for example, choosing to work at Wave was probably 10x better than staying at my previous employer: I learned more, gained responsibility faster, had a bigger impact on the world, etc.

Unfortunately, low-info heuristics tell you that outliers can’t exist. By definition, most members of any group are not outliers, so any generalized heuristic will predict that whatever you’re looking at isn’t an outlier either. If you index too heavily on what the average outcome is, you’re deliberately blinding yourself to the possibility of finding an outlier.
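One way to see why averages hide outliers: in a heavy-tailed distribution, the typical draw tells you almost nothing about the best draw. A quick simulation makes the gap concrete (the lognormal parameters are arbitrary, chosen only to make the tail heavy):

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical heavy-tailed "outcome quality" distribution.
outcomes = sorted(random.lognormvariate(0, 2) for _ in range(1000))
median = outcomes[len(outcomes) // 2]
best = outcomes[-1]

# The best outcome is typically hundreds of times the median outcome,
# so a heuristic tuned to the median badly misjudges the tail.
print(f"median: {median:.1f}, best: {best:.1f}, ratio: {best / median:.0f}x")
```

If jobs or startups really are distributed like this, the expected value of searching is dominated by whether you find the tail, which is precisely what an “average outcome” heuristic can’t see.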

This is especially bad when someone uses this kind of reasoning to smack down other people’s ambition, because the payoffs are asymmetric. If you incorrectly tell someone that their ambitious idea is likely to succeed, then they’ll waste their time on a failed idea, which is not great, but ultimately fine. But if you smack them down with low-info heuristics and convince them their idea is likely to fail, you rob the world of an awesome idea that would have existed otherwise. Shame on you! (Too bad you’ll never know about it.)


OK, so what should you do instead of relying on low-info heuristics? Here are my suggestions:

  • Build gears-level models of the decision you’re trying to make. If you’re deciding, e.g., where to work, try to understand what makes different jobs awesome or terrible for you.

  • Think really hard about the problem. Most inside views are wrong—to stand a fighting chance of beating the outside view, you’ll need to put a lot of effort in.

  • Don’t fool yourself with motivated reasoning. Stress-test your ideas; ask yourself what the best arguments against your inside view are and see if you can rebut them.

    • To the extent that you do use low-info heuristics, use them as a stress test rather than a default belief. “90% of startups fail” is useful to know as a warning to try to mitigate failure modes. It’s dangerous when you hear it and stop thinking there.
  • Don’t be afraid to try ambitious things where the downside of failing is low, and the upside of succeeding is high!

Thanks to draft readers Irene Chen, Milan Cvitkovic, and Sam Zimmerman.


I have a few ideas for subtitles:

  • "Quitting your job is a legal way to cash in on insider information. It's the only way to short a startup."
  • "Many markets fail to meet the assumptions of the EMH, and that's your opportunity."
  • "Each expert has one piece of the puzzle, but nobody knows exactly how they all fit together."
  • "Why don't I just trust the experts? Because they're not answering the questions I'm asking."

This is a nice post that echoes many points in Eliezer's book Inadequate Equilibria. In short, it is entirely possible to outperform 'experts' or 'the market' if there are reasons to believe that these systems converge to a sub-optimal equilibrium, and even more so when you have more information than the 'experts', as in your Wave vs Theorem example.

More related LW concepts: Hero Licensing, and a few essays in the Inside/Outside View tag.

I'd like to push back a bit against the downsides of being overconfident, which I think you undersell. Investing in a bad stock could lose you your entire investment (shorting, even more so). Pursuing an ultimately bad startup idea might not hurt too much, unless you've gotten far enough that you have offices and VC dollars and people who need their paychecks. For something like COVID, merely overstocking supplies probably won't hurt, but you'll lose a lot of social clout if you flee to a bunker over something that ends up harmless.

Risk is risk, and the more invested you are in something, the more you have to lose - stocks, startups, respiratory diseases. I fear being overconfident would lead to a lot of failure and pain. Almost everything in idea space is wrong, and humanity has clustered around the stuff that's mostly right already.

In light of the FTX thing, maybe a particularly important heuristic is to notice cases where the worst-case is not lower-bounded at zero. Examples:

  • Shorting stock vs buying put options
  • Running an ambitious startup that fails is usually just zero, but what if it's committed funding & tied its reputation to lots of important things that will now struggle? 
  • More twistily -- what if you're committing to a course of action s.t. you'll likely feel immense pressure to take negative-EV actions later on, like committing fraud in order to save your company or pushing for more AI progress so you can stay in the lead?

Not that you should definitely not do things that potentially have large-negative downsides, but you can be a lot more willing to experiment when the downside is capped at zero.

Being overconfident on places like Lesswrong invites others to correct you. This is good for your rate of learning. I'll often write things here that I'm not entirely sure about without using weasel words, hoping to learn something new.

Am with you very much here. Recently decided that I need to start doing this more often. Negative karma isn't really negative karma if you've learned something from the experience.

I think this is sort of a naive approach to this problem. 

For one, startup valuations are very high variance. It's impossible to know if you were right or lucky in the case you cite, although you do make a plausible case that you had more information than the VCs who invested.

But the real reason for modesty is that the status quo in a lot of systems is at or near optimal, especially in areas where competitive pressures are strong. Building gears-level models can help, but doing that with sufficient fidelity is hard, because even insiders often don't understand the system with enough granularity to model it well.

See also the contrarianism sequence https://www.lesswrong.com/tag/contrarianism .  There are PLENTY of topics where mainstream consensus is serving different purposes (social cohesion, status for elites, arbitrage opportunities for the well-connected, compliance encouragement for the proles, etc.) than you have for the questions they appear to be answering.  

For pure financial speculation, the EMH does hold, but only in aggregate over fairly long time periods.  The short-seller's adage applies to almost everything else: the market can stay irrational longer than you can stay liquid.

I fully support your advice, but would like to add that you probably can't spend that much time/energy on every topic - you have to decide what things are worth understanding deeply enough to know whether to disagree with the common wisdom.

Even for financial speculation, this is possible. For example, after Black Thursday, the bankers bought lots of stock to prop up prices: they wanted to avoid a larger crash that would affect their companies, and they predicted that other people would follow suit if they made the "prediction" that the market would go back up.

 

Reference: https://en.wikipedia.org/wiki/Wall_Street_Crash_of_1929#:~:text=Whitney%20placed%20a%20bid%20to%20purchase%2025%2C000%20shares%20of%20U.S.%20Steel%20at%20%24205%20per%20share%2C%20a%20price%20well%20above%20the%20current%20market.

Is there anything relevant to say about the interplay between the benefits to searching for outliers vs. rising central bank interest rates? I'm not sure how startups fare in different economic circumstances, but at least speculative investments are a better bet when interest rates are low. See e.g. this Matt Yglesias article:

When interest rates are low and “money now” has very little value compared to “money in the future,” it makes sense to take a lot of speculative long shots in hopes of getting a big score...

At the end of the day, venture capital is just a slightly odd line of endeavor where flopping a lot is fine as long as you score some hits... Good investors are able to internalize the much more abstract nature of finance and embrace prudent levels of embarrassing failure.

But what I think the VC mindset tended to miss was the extent to which the entire “take big swings and hope for the best” mindset was itself significantly downstream of macroeconomic conditions rather than being some kind of objectively correct life philosophy.

 

With interest rates higher, you have a structural shift in business thinking toward “I’d like some money now.” Something really boring like mortgage lending now has a decent return, so you don’t need Bitcoin. And if your company is profitable, shareholders would like to see some dividends. If it’s not profitable, they would like to see some profits...

Higher interest rates mean rational actors’ discount rates are rising, so everyone is acting more impatiently.

Curated. The question of inside view vs. outside view, and of expert deference vs. one's own models, has been debated before on LessWrong (and the EA Forum), but this post does a superb job of making the case for using your own models more: trusting your own information and being willing to go against the crowd and the experts. It articulates the case clearly and crisply, in a way that I think is possibly more compelling than other sources.

A few points I particularly like:

The identification of selection bias on evidence in different directions:

The problem is that the bad consequences of underconfidence and under-ambition are severe but subtle, whereas the bad consequences of overconfidence and wishful thinking are milder but more obvious. If you’re overconfident, you’ll try things that fail, and people will laugh at you. If you’re underconfident, you’ll avoid making risky bets, and miss out on the potential upside, but nobody will know for sure what you missed.

That relying on mainstream/expert views won't allow for finding outliers, and finding outliers is crucial to outsized impact:

In fact, outperforming low-info heuristics isn’t just possible; it’s practically mandatory if you want to have an outsized impact on the world. That’s because leaning too heavily on low-info heuristics pushes people away from being ambitious or trying to search for outliers.

Most important things in life—jobs, hires, companies, ideas, partners, etc.—have a distribution of outcomes where the best possible choices are outliers that are dramatically better than the typical ones. In my case, for example, choosing to work at Wave was probably 10x better than staying at my previous employer: I learned more, gained responsibility faster, had a bigger impact on the world, etc.

Sometimes I see people use the low-info heuristic as a “baseline” and then apply some sort of “fudge factor” for the illegible information that isn’t incorporated into the baseline—something like “the baseline probability of this startup succeeding is 10%, but the founders seem really determined so I’ll guesstimate that gives them a 50% higher probability of success.” In principle I could imagine this working reasonably well, but in practice most people who do this aren’t willing to apply as large of a fudge factor as appropriate.

 

The last company I worked for was a tech scouting, market research, and consulting firm, and a big part of what they do is profile start-ups, using a standard format and scorecard, based on a 1 hour interview + background knowledge of an industry. One time they bought a data science company and turned them loose on a decade of profiles, and found several results like "hey, if this score is a 4/5 or 5/5 then the company is 2x or 4x more likely to have a successful exit, respectively." They put this in a white paper, sent it out to clients, and then... nothing. Never used it for marketing, sales, internal research process improvement. It always seemed bizarre to me, that "Hey, we know our process can quadruple your odds of finding startups that will succeed," when that was our whole job, just... didn't seem to motivate the people in charge. 

In any case, my point is, it is very easy to find subsets of companies that outperform the 90% failure figure if that is what you optimize for, and if what you hear isn't only filtered through the way the startups frame their pitches to investors.

Post summary (feel free to suggest edits!):
The author gives examples where their internal mental model suggested one conclusion, but a low-information heuristic like expert or market consensus differed, so they deferred. This included:

  • Valuing Theorem equity over Wave equity, despite Wave’s founders being very resourceful and adding users at a huge pace.
  • In the early days of Covid, dismissing it despite exponential growth and asymptomatic spread seeming intrinsically scary.

Another common case of this principle is assuming something won’t work in a particular case, because the stats for the general case are bad. (eg. ‘90% of startups fail - why would this one succeed?’), or assuming something will happen similarly to past situations.

Because the largest impact comes from outlier situations, outperforming these heuristics is important. The author suggests that for important decisions people should build a gears-level model of the decision, put substantial time into building an inside view, and use heuristics to stress test those views. They also suggest being ambitious, particularly when it’s high upside and low downside.

(If you'd like to see more summaries of top EA and LW forum posts, check out the Weekly Summaries series.)

I think the gears-level models are really the key here.  Without a gears-level model, you are flying blind, and the outside view is very helpful when you're flying blind.  But with a solid understanding of the causal mechanisms in a system, you don't need to rely on others' opinions to make good predictions and decisions. 

Another potential assumption/limitation of the EMH:

  • Socially acceptable to trade: It must be socially acceptable for people who have enough financial resources to noticeably affect market prices to trade based on the new information.

I initially proposed this idea to try to explain the market's slow response to the early warning signs of Covid in this comment. Similar dynamics may come into play with respect to the social acceptability of ESG vs anti-ESG investing based on political affiliation, although in this case I don't think there is enough anti-ESG money to affect the prevailing ESG trends much at this point.

It’s not merely a sufficient amount of money. For market intelligence, you’re looking for a sufficient number of participants—or enough people looking for inefficiencies that they get resolved quickly.

You’ll notice the indices are quite efficient (outside of the passive investment bubble, but that’s another topic) while many individual, low-volume stocks are not. Google equity markets will be more efficient than some penny stock. Another good example is the competition in prediction markets. PredictIt is quite prescient often, while certain lightly-used crypto prediction markets have few users (likely due to high barriers to use) and therefore feature far more inefficiencies.

You want a lot of highly qualified eyeballs.

I don't know why I get these Less Wrong articles in my email, but I read this one because of a startling premise: choosing a job based on its monetary value as an investment. I don't suppose there's anything wrong with that, it's just a bit mind-blowing for me. Maybe culture shock? But if so, what culture is this?

Making judgments with limited information is a thing, and what you say about asymmetric loss functions makes total sense. (In other words, I'm on board with the point you were trying to make with this article.) It's just the idea of applying it to choosing a job, with total dollars earned as the optimization function, that surprises me. Maybe that's what you meant by

Fortunately, I chose Wave for other reasons.

The "other reasons" are probably the reasons I would generally think of, such as whether the things you get to work on are interesting, how much self-direction you get or want, what you think should change in the world and whether the job makes an impact in that area, how it fits into the rest of your life, like the length of the commute, etc.

FYI you get emails because you once subscribed to our curated email list (presumably). The emails should have an unsubscribe button if you no longer want to receive them.

Do you not choose jobs based (in part) on salary?

No, but I can see how it may be necessary. I guess I've been lucky that so far my interests have aligned with jobs that pay well enough for it to not be an issue—I'm sure some fields are more constrained than others. I didn't think that this would apply to programming, though. (That's my field, too.)

You are probably subscribed to curated emails (checkbox at signup), you can turn those off in your account settings if you wish.
