Improvement Without Superstition

Zachary Jacobi

When you make continuous, incremental improvements to something, one of two things can happen. You can improve it a lot, or you can fall into superstition. I'm not talking about black cats or broken mirrors, but rather humans becoming addicted to whichever steps were last seen to work, instead of whichever steps produce their goal.

I've seen superstition develop first hand. It happened in one of the places you might least expect it – in a biochemistry lab. In the summer of 2015, I found myself trying to understand which mutants of a certain protein were more stable than the wildtype. Because science is perpetually underfunded, the computer that drove the equipment we were using was ancient and frequently crashed. Each crash wiped out an hour or two of painstaking, hurried labour and meant we had less time to use the instrument to collect actual data. We really wanted to avoid crashes! Therefore, over the course of that summer, we came up with about 12 different things to do before each experiment (in sequence) to prevent them from happening.

We were sure that 10 out of the 12 things were probably useless, we just didn't know which ten. There may have been no good reason that opening the instrument, closing it, then opening it again to load our sample would prevent computer crashes, but as far as we could tell when we did that, the machine crashed far less. It was the same for the other eleven. More self-aware than I, the graduate student I worked with joked to me: "this is how superstitions get started" and I laughed along. Until I read two articles in The New Yorker.

In The Score (How Childbirth Went Industrial), Dr. Atul Gawande talks about the influence of the Apgar score on childbirth. Through a process of continuous competition and optimization, doctors have found ways to increase the Apgar scores of infants in their first five minutes of life – and ways to deal with difficult births that maximize those scores. The result of this has been a shocking (six-fold) decrease in infant mortality. And all of this is despite the fact that according to Gawande, "[in] a ranking of medical specialties according to their use of hard evidence from randomized clinical trials, obstetrics came in last. Obstetricians did few randomized trials, and when they did they ignored the results."

Similarly, in The Bell Curve (What happens when patients find out how good their doctors really are), Gawande found that the differences between the best CF (cystic fibrosis) treatment centres and the rest turned out to hinge on how rigorously each centre followed the guidelines established by big clinical trials. That is to say, those that followed the accepted standard of care to the letter had much lower survival rates than those that hared off after any potentially lifesaving idea.

It seems that obstetricians and CF specialists were able to get incredible results without too much in the way of superstitions. Even things that look at first glance to be minor superstitions often turned out not to be. For example, when Gawande looked deeper into a series of studies that showed forceps were as good as or better than Caesarian sections, he was told by an experienced obstetrician (who was himself quite skilled with forceps) that these trials probably benefitted from serious selection effects (in general, only doctors particularly confident in their forceps skills volunteer for studies of them). If forceps were used on the same industrial scale as Caesarian sections, that doctor suspected that they'd end up worse.

But I don't want to give the impression that there's something about medicine as a field that allows doctors to make these sorts of improvements without superstition. In The Emperor of All Maladies, Dr. Siddhartha Mukherjee spends some time talking about the now discontinued practices of "super-radical" mastectomy and "radical" chemotherapy. In both treatments, doctors believed that if some amount of a treatment was good, more must be better. And for a while, it seemed better. Cancer survival rates improved after these procedures were introduced.

But randomized controlled trials showed that there was no benefit to those invasive, destructive procedures beyond that offered by their less-radical equivalents. Despite this evidence, surgeons and oncologists clung to these treatments with an almost religious zeal, long after they should have given up and abandoned them. Perhaps they couldn't bear to believe that they had needlessly poisoned or maimed their patients. Or perhaps the superstition was so strong that they felt they were courting doom by doing anything else.

The simplest way to avoid superstition is to wait for large scale trials. But from both Gawande articles, I get a sense that matches with anecdotal evidence from my own life and that of my friends. It's the sense that if you want to do something, anything, important – if you want to increase your productivity or manage your depression/anxiety, or keep CF patients alive – you're likely to do much better if you take the large scale empirical results and use them as a springboard (or ignore them entirely if they don't seem to work for you).

For people interested in nootropics, melatonin, or vitamins, there are self-blinding trials, which provide many of the benefits of larger trials without the wait. But for other interventions, it's very hard to effectively blind yourself. If you want to see if meditation improves your focus, for example, then you can't really hide the fact that you meditated on certain days from yourself [1].

When I think about how far from the established evidence I've gone to increase my productivity, I worry about the chance I could become superstitious.

For example, trigger-action plans (TAPs) have a lot of evidence behind them. They're also entirely useless to me (I think because I lack a visual imagination with which to prepare a trigger) and I haven't tried to make one in years. The Pomodoro method is widely used to increase productivity, but I find I work much better when I cut out the breaks entirely – or work through them and later take an equivalent amount of time off whenever I please. I use pomos only as a convenient, easy to Beemind measure of how long I worked on something.

I know modest epistemologies are supposed to be out of favour now, but I think it can be useful to pause, reflect, and wonder: when is one like the doctors saving CF patients and when is one like the doctors doing super-radical mastectomies? I've written at length about the productivity regime I've developed. How much of it is chaff?

It is undeniable that I am better at things. I've rigorously tracked the outputs on Beeminder and the graphs don't lie. Last year I averaged 20,000 words per month. This year, it's 30,000. When I started my blog more than a year ago, I thought I'd be happy if I could publish something once per month. This year, I've published 1.1 times per week.

But people get better over time. The uselessness of super-radical mastectomies was masked by other cancer treatments getting better. Survival rates went up, but when the accounting was finished, none of that was to the credit of those surgeries.

And it's not just uselessness that I'm worried about, but also harm; it's possible that my habits have constrained my natural development, rather than promoting it. This has happened in the past, when poorly chosen metrics made me fall victim to Campbell's Law.

From the perspective of avoiding superstition: even if you believe that medicine cannot wait for placebo controlled trials to try new, potentially life-saving treatments, surely you must admit that placebo controlled trials are good for determining which things aren't worth it (take as an example the very common knee surgery, arthroscopic partial meniscectomy, which has repeatedly performed no better than sham surgery when subjected to controlled trials).

Scott Alexander recently wrote about an exciting new antidepressant failing in Stage I trials. When the drug was first announced, a few brave souls managed to synthesize some. When they tried it, they reported amazing results, results that we now know to have been placebo. Look. You aren't getting an experimental drug synthesized and trying it unless you're pretty familiar with nootropics. Is the state of self-experimentation really that poor among the nootropics community? Or is it really hard to figure out if something works on you or not [2]?

Still, reflection isn't the same thing as abandoning the inside view entirely. I've been thinking up heuristics since I read Dr. Gawande's articles; armed with these, I expect to have a reasonable shot at knowing when I'm at risk of becoming superstitious. They are:

- If you genuinely care only about the outcome, not the techniques you use to attain it, you're less likely to mislead yourself (beware the person with a favourite technique or a vested interest!).

- If the thing you're trying to improve doesn't tend to get better on its own and you're only trying one potentially successful intervention at a time, fewer of your interventions will turn out to be superstitions and you'll need to prune less often (much can be masked by a steady rate of change!).

- If you regularly abandon sunk costs ("You abandon a sunk cost. You didn’t want to. It’s crying."), superstitions do less damage, so you can afford to spend less mental effort on avoiding them.

Finally, it might be that you don't care that some effects are placebo, so long as you get them and get them repeatedly. That's what happened with the experiment I worked on that summer. We knew we were superstitious, but we didn't care. We just needed enough data to publish. And eventually, we got it.

Footnotes:

[1] Even so, there are things you can do here to get useful information. For example, you could get in the habit of collecting information on yourself for a month or so (like happiness, focus, etc.), then try several combinations of interventions you think might work (e.g. A, B, C, AB, BC, CA, ABC, then back to baseline) for a few weeks each. Assuming that at least one of the interventions doesn't work, you'll have a placebo to compare against. Although be sure to correct any results for multiple comparisons.
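A minimal sketch of that last step, assuming you have already run the experiment and computed one p-value per arm against your baseline. The p-values below are made up for illustration, and Holm-Bonferroni is just one common choice of correction, not the only one:

```python
from itertools import combinations


def intervention_arms(interventions):
    """List every non-empty combination of interventions, smallest first."""
    arms = []
    for size in range(1, len(interventions) + 1):
        arms.extend("".join(combo) for combo in combinations(interventions, size))
    return arms


def holm_bonferroni(p_values, alpha=0.05):
    """Return the arms that stay significant under Holm-Bonferroni correction."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    significant = []
    for rank, (arm, p) in enumerate(ordered):
        # The k-th smallest p-value (rank = k - 1) must clear alpha / (m - rank).
        if p > alpha / (m - rank):
            break  # step-down procedure: stop at the first failure
        significant.append(arm)
    return significant


arms = intervention_arms(["A", "B", "C"])

# Hypothetical p-values, one per arm, each compared against baseline.
p_values = dict(zip(arms, [0.004, 0.30, 0.04, 0.006, 0.03, 0.20, 0.02]))

naive = [arm for arm, p in p_values.items() if p <= 0.05]
corrected = holm_bonferroni(p_values)

print("Arms:", arms)
print("Significant without correction:", naive)
print("Significant after correction:", corrected)
```

With these made-up numbers, five arms clear the naive 0.05 threshold but only A and AB survive the correction, which is the kind of pruning the correction is meant to do.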

[2] That people still buy anything from HVMN (after they rebranded themselves in what might have been an attempt to avoid a study showing their product did no better than coffee) actually makes me suspect the latter explanation is true, but still.

Elizabeth · 6y

Alternate explanation for hospital results: bad hospitals try more things to improve, and when picking things to try they naturally pick from the latest research-approved techniques.

Ben Pace · 6y

I promoted this to featured, for these three reasons:

- I appreciated the real-world examples/stories a lot.
- It helped me understand inadequacy analysis much better, by giving me a concrete intuition for looking at a system doing insane things and wondering whether it’s secretly sane or whether it’s just as insane as it looks.
- I liked the concrete recommendations at the end, even if these ones felt a bit vague.

It also affected my decision a small amount that you're a new author on this site, and I want to make sure new authors get seen. However the post was great on its own and I probably would've promoted it regardless.

JamesFaville · 6y

Upvoted mostly for surprising examples about obstetrics and CF treatment and for a cool choice of topic. I think your question, "when is one like the doctors saving CF patients and when is one like the doctors doing super-radical mastectomies?" is an important one to ask, and distinct from questions about modest epistemology.

Say there is a set $A$ of available actions of which a subset $A' \subset A$ have been studied intensively enough that their utility is known with a high degree of certainty, but that the utility of the other available actions in $A$ is uncertain. Then your ability to surpass the performance of an agent who chooses actions only from $A'$ essentially comes down to a combination of whether choosing uncertain-utility actions from $A \setminus A'$ precludes also picking high-utility actions from $A'$, and what the expected payoff is from choosing uncertain-utility actions in $A \setminus A'$ according to your best information.

I think you could theoretically model many domains like this, and work things out just by maximizing your expected utility. But it would be nice to have some better heuristics to use in daily life. I think the most important questions to ask yourself are really (i) how likely are you to horribly screw things up by picking an uncertain-utility action, and (ii) do you care enough about the problem you're looking at to take lots of actions that have a low chance of being harmful, but a small chance of being positive.
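A toy sketch of that model, under loud assumptions: the action names, probabilities, and utilities are all invented, and the agent picks exactly one action, so choosing from outside the studied set does preclude the studied options. It compares an agent restricted to the studied actions with one that maximizes expected utility over everything:

```python
# Hypothetical actions, each mapped to a list of (probability, utility) outcomes.
# "studied" plays the role of the well-understood subset; "unstudied" is the rest.
studied = {
    "standard_care": [(1.0, 5.0)],  # utility known with near certainty
}
unstudied = {
    "promising_idea": [(0.3, 20.0), (0.7, 2.0)],  # usually mild, sometimes great
    "risky_idea": [(0.1, 30.0), (0.9, -10.0)],    # usually screws things up
}


def expected_utility(outcomes):
    """Probability-weighted average utility of one action."""
    return sum(p * u for p, u in outcomes)


def best_action(actions):
    """Name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


cautious = best_action(studied)                   # chooses only from studied actions
explorer = best_action({**studied, **unstudied})  # chooses from all actions

print("Cautious agent picks:", cautious)
print("Exploring agent picks:", explorer)
```

Here the exploring agent does better in expectation only because one unstudied action happens to have a favourable payoff; swap in a pessimistic estimate and it falls back to the studied option. Question (i) above corresponds to how bad the worst outcomes in the unstudied set are.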

Zachary Jacobi · 6y

This formulation reminds me somewhat of the Bayesian approach to the likelihood of research being true from Ioannidis 2005 (Why Most Published Research Findings Are False).
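For reference, the core of that approach, stated here in its simplest form (no bias term, a single research team), is the positive predictive value of a claimed finding:

$$\mathrm{PPV} = \frac{(1 - \beta)\,R}{R - \beta R + \alpha}$$

where $R$ is the pre-study odds that the tested relationship is true, $\alpha$ is the Type I error rate, and $\beta$ is the Type II error rate. As an illustrative calculation with numbers I've picked myself: with $R = 0.1$, $\alpha = 0.05$, and power $1 - \beta = 0.8$, this gives $\mathrm{PPV} = 0.08 / 0.13 \approx 0.62$.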

ChristianKl · 6y

One of the implicit points of the story about the obstetricians might be that evidence-based medicine is itself a superstition. Many of the arguments for it are mainly theoretical and not based on empirics.

Куля Ботаніки · 6y

I don't think all of your heuristics apply to all situations, though. Sometimes an action need not be necessary, or even the best of the unnecessary options, to be reasonable, so long as it generates a desire to go further.

For example, my child is learning to write, read, speak clearly, type, find places on maps and do arithmetic. I know they'll teach him that in school, and that he will have to practice during various lessons, so from my point of view his skills are going to just improve on their own, and there's no real hurry to make it happen. I still think it's useful to support his own efforts in making "weather forecasts" (using official websites for information), because he gets to do some visible work and it's actually fun.

In some labs, they say that some people just cause machines to stop functioning without doing anything to the machines, and one should avoid their company. (I don't mean you - it's what a relative of mine was believed to do some thirty years back.) Superstition? Perhaps. Or perhaps these "agents of evil" distract those who are working and they make mistakes? Anyway, it is hard to separate office beliefs into superstitions proper and multidimensional duct tape.

That reminds me of the story of a biochemical lab where one woman affected an experiment through the perfume she was wearing.
