Bucket Errors

CFAR!Duncan

Author's note: There is a preexisting standalone essay on bucket errors by CFAR cofounder Anna Salamon available here. The version in the handbook is similar, but differs enough that it seemed worth including rather than just adding the standalone post to the sequence.

Epistemic status: Mixed

The concept of question substitution, which underlies and informs this chapter, is one that is well-researched and well-documented, particularly in the work of Daniel Kahneman. The idea of “bucket errors” is one generated by CFAR staff and has no formal research behind it, but it has resonated with a majority of our alumni and seems like a reasonable model for a common class of human behaviors.

Humans don’t simply experience reality. We interpret it.

There’s some evidence that this is true “all the way down,” for literally everything we perceive. The predictive processing model of cognition posits that even very basic sensations like sight and touch are heavily moderated by a set of top-down control systems, predictions, and assumptions—that even as the photons are hitting our receptors, we’re on some level anticipating them, already attempting to define them and categorize them and organize them into sensible clusters. It’s not just a swirl of green and brown, it’s a tree—and we almost can’t stop ourselves from seeing the tree, and go back to something like unmediated perception.

CFAR’s concept of “buckets” is a similar idea on a broader scale. The claim is that reality is delivering to you a constant stream of experiences, and that—most of the time—you are categorizing those experiences into pre-existing mental buckets. Those buckets have titles like “do they like me?” and “is this a good idea?” and “what’s my boss like?” and “Chinese food?”

If you think of your mental architecture as being made up of a large number of beliefs, then the buckets contain the piles of evidence that lie behind and support those beliefs. Or, to put it another way, you know whether or not you like Chinese food because you can look into the bucket containing all of your memories and experiences of Chinese food and sum them up.

As another example, let’s say Sally is a young elementary school student with a belief that she is a good writer. That belief didn’t come out of nowhere—it started with observations that (say) whenever she turned in a paper, her teacher would smile and put a star-shaped sticker on it.

At first, observations like that probably fell into all sorts of different buckets, because Sally didn’t have a bucket for “am I a good writer?” But at some point, some pattern-detecting part of her brain made the link between several different experiences, and Sally (probably unconsciously) started to track the hypothesis “I am good at writing.” She formed a “good at writing” bucket, and started putting more and more of her experiences into it.

The problem (from CFAR’s perspective) is that that isn’t the only label on that bucket.

Bucket errors

One day, Sally turned in a paper and it came back without a gold star.

“Sally, this is wonderful!” says Sally’s teacher. “But I notice that you misspelled the word ‘ocean,’ here.”
“No, I didn’t!” says Sally, somewhat forcefully.
Her teacher is a bit apologetic, but persists. “Ocean is spelled with a ‘c’ rather than a ‘sh’... remember when we learned the rule that if there's an ‘e’ after the ‘c’, that changes its sound—”
“No, it’s spelled oshun, I saw it in a book—”
“Look,” says the teacher, gently but firmly. “I know it hurts to notice when we make mistakes. But it’s important to see them, so that you can do better next time. Here, let’s get the dictionary and check—”
“No!” shouts Sally, as she bursts into tears and runs away to hide. As she vanishes into the closet, the teacher can just barely hear the words “I didn’t misspell anything! I can too be a writer!”

One way that you can understand what’s happening for Sally is that her head contains a single “bucket” that is capturing data on three different variables:

All three questions are entangled; Sally’s worldview is such that they all have to share the same answer. Previously, that answer has been “Yes!” But now, her teacher is threatening to drop incontrovertible evidence of “no” into the bucket, and as a result, Sally is somewhat flipping out.

It’s important to note that what Sally is doing is actually good, if we take the current state of her belief structure as a given. Ideally, she would be able to update her belief structure to fix the entanglement (more on that below), but in the world where all those questions share a single answer, it’s clearly better for her to plug her ears than to erroneously switch to the belief that she will never succeed as a writer.

There is data coming in that, if it were allowed to land according to normal operating procedures, would force a drastic and possibly destructive update (“I’m no good at writing”), and so in response, some subconscious mechanism in Sally’s brain is hitting the brakes. Without really being aware of what her brain is doing, Sally is sacrificing some ability to recognize her mistakes in order to prevent herself from making a very very wrong sort of update that could have a lot of negative consequences. The new information is at risk of being double-counted in a way that is simultaneously unjustified and unhelpful, and the rejection of that data—the way that Sally runs off in distress—is a viable patch. It’s a reflexive, self-protective measure that’s probably not the best way to deal with the problem, but is better than just forcing herself to absorb information she’s not ready to process, and reaching a disastrous conclusion as a result.

Below are some other situations in which people are similarly loath to integrate data due to some underlying problem with their bucketing. Note that the point of these examples is to help you get the overall pattern—you don’t need to read every single one. Once you “get it,” you can skip ahead to the next section.

Kieran shows up at work dressed in new clothes. Lex smiles as Kieran walks in, and says that the outfit is awesome, and that Kieran looks great. Kieran smiles back and is clearly experiencing some significant warm-fuzzies as a result of the compliment. Later, though, Jesse walks into the office, looks over at Kieran, and makes a squidge-face. “What’s that about?” Kieran asks. “What? Oh—nothing,” Jesse says, and changes the subject. Kieran doesn’t press the issue, but anyone looking at them from the outside could see that they’re feeling something like panic-anxiety-doubt, and they seem to be more derailed than one would expect by what was really just a flickering expression on Jesse’s face.

Bryce is a college student with interest in Effective Altruism—moderately liberal, supportive of evidence-based policies, concerned with reducing suffering, taking a mix of technical and nontechnical classes, and trying to figure out how best to balance personal satisfaction and overall impact after graduation. Bryce’s friend Courtney has recently been reading a lot about existential risk, and keeps trying to engage Bryce in conversation about new ideas and open questions in that sphere. However, Bryce keeps shutting Courtney down, loudly insisting that the whole topic is just a Pascal’s mugging and that it’s not worth the time that would be wasted going around in circles with unfalsifiable hypotheticals.

Quinn has recently made progress in disentangling and understanding the dynamics behind a large, sticky bug that has previously been immune to change. Quinn now has a plan that it seems reasonable to be confident and optimistic about. However, Quinn’s friends keep coming up with advice and suggestions and thinly-veiled probes, recommending that Quinn read this-or-that and talk to such-and-such and look into trying X, Y, or Z. It’s been going on for a while, now, and Quinn is starting to get a hair-trigger around the topic—it’s as if Quinn’s friends aren't taking into account the fact that Quinn has a plan, it just hasn’t gotten off the ground yet. There’s just some scheduling stuff in the way—a few prior commitments that need to be wrapped up and some prep work that needs to be done, that’s all.

Dana has been living at a magnet high school for almost a year now, and the experience has been almost uniformly terrible—Dana’s homesick, sleep deprived, overburdened with homework, unhappy with the food, uncomfortable with the dorm, uncertain about the focus of the curriculum, dissatisfied with the quality of the instruction, not really clicking with any of the other students socially, and on and on and on. It’s gotten to the point that Dana’s even feeling anger and frustration at the buildings themselves. Yet when Dana’s parents try to offer up the option of dropping out and returning back to regular high school, Dana snaps and cuts them off. They don’t seem to understand that this would be a capitulation, a defeat—there’s no way Dana’s going to let this stupid place win.

Parker has been feeling the lack of ... something ... for years, and may have finally found it in a local worship group run by a fellow member of the local biking club. Parker has been blown away by the sense of community, the clear moral framework, the sensible pragmatism, the number and quality of activities, the intellectually challenging discussions—all of it. It completely subverted Parker’s stereotypes of religious groups being ignorant and anti-progressive and authoritarian, and it’s even been epistemically interesting—because Parker and the pastor are friends, they’ve been able to have several long, late-night conversations where they’ve talked openly about faith and the complex historical record of Christianity and the priors on various explanations of reported miracles and the cases for different moral frameworks. All in all, Parker’s experienced a significant uptick in happiness and satisfaction over the past six months, and has even made a marginal (10%) update toward conversion. Parker’s sibling Whitney, though, is horrified—Whitney’s model of Parker was that of a staunch and unwavering atheist, and, confused and dismayed, Whitney keeps aggressively pressing for Parker to explain the cruxes and reasons behind the recent shift. Parker is strangely reluctant, sometimes skirting around the issue and other times avoiding Whitney outright.

In each of these cases, there is a real, unsolved problem with the person’s evidence-sorting system. They’ve bundled multiple different questions into the same bucket, and as a result, evidence that should inform belief A is threatening to force updates on beliefs B and C and D as well. That causes them to flinch away from the incoming evidence—but the flinch is not the error. The flinch is an emergency stopgap procedure; the error is in the bucketing that makes the flinch necessary in the first place.

To return to Sally’s example, ideally she would be able to split apart those three questions, with a separate bucket for collecting evidence on each:

Of course, the buckets aren’t totally disjoint. The question of whether or not Sally is good at spelling does bear on her larger writing ambitions a little. Perhaps a more accurate diagram would have buckets of different sizes, or nested buckets, or little pipes between the buckets to allow for relevant information to flow back and forth.

But a belief structure with three disjoint buckets is nevertheless a better structure than Sally’s original one-bucket system. It presents a significantly lower risk of drastic and unjustified updates.

Bucket creation, bucket destruction

It’s worth noting that one can also have too many buckets. Imagine if Sally continually stashed each new criticism of her writing in its own little bucket, never letting herself see any larger patterns and never letting negative evidence influence her ambitions at all.

One CFAR workshop graduate reported noticing a problem with exactly that structure, while investigating some feelings of social anxiety and low self-esteem. They realized that they didn’t even have a mental bucket corresponding to the question “am I well-liked?”—when they put that term into their mental search function, no data came back. They hadn’t stored any of their memories with that tag.

What they did have were dozens of separate little buckets corresponding to specific people, specific interactions, and specific compliments.

Where Sally had too few buckets, and needed to make more, this alumnus had too many, and needed to consolidate. They made a deliberate mental effort to start catching all of these experiences in a single bucket, and reported a meaningful shift in mood and self-esteem as a result.

The takeaway, then, is not a straightforward recommendation like “always make more buckets,” but rather an imperative to think about your buckets explicitly. There’s a Goldilocks zone, with juuust the right number of buckets to capture the detail that you need in any given situation.

Some suggestions for finding the sweet spot:

When you notice yourself flinching away from new information, ask yourself—what would be bad about taking it in? What would be the consequences of just believing X?
When you notice your mind making connections like “if A is true, then B will be true too,” pause for a moment and reflect on just how strongly A and B are correlated. Is A actually a strong indicator of B?
When you have the feeling that piece-of-information-M would force you to take action-N, take a moment to give yourself space. Notice that in many cases, you can consider M and retain freedom of choice about N—that you can simply not do N if it still seems like a bad idea after thinking about M.
Notice when your distress feels like it originates in something like a need for consistency. For instance, if you don’t want to take the action of apologizing, because you don’t internally feel regret, be willing to question whether apology actually requires contrition, or whether you can say yes to one without necessarily saying yes to the other.

Question substitution

If the bucket error concept doesn’t quite fit for you, another way to think about the problem that Sally and Kieran and Parker (etc.) are experiencing is through the lens of question substitution.

The central claim of question substitution is that humans often swap out a hard question for an easier-to-answer one, without actually noticing that this is what they’re doing. A handful of heuristics, biases, and fallacies tie into question substitution. One is representativeness, in which someone swaps a question like “how likely is it that Linda is a feminist?” with a question like “how much does Linda resemble a feminist?” Another is scope insensitivity, where people fail to distinguish between questions like “how much would I pay to rescue 2,000 birds?” and “how much would I pay to rescue 200,000 birds?” and instead seem to answer some other question like “how much would I pay to save an imagined beachful of birds?” or “how much is the warm feeling of helping some birds worth to me?”

Some other examples of question substitutions:

What’s my best next move in this situation? → What can be accomplished with the tools I have readily available?
Which of these two candidates would make a better President? → Which candidate has a longer list of positive attributes that I can easily think of?
Did my partner do something wrong? → Am I mad at my partner about anything?
Is my plan likely to succeed? → Can I imagine my plan succeeding? or How aversive is it to imagine my plan failing?
Should I buy this item at this price? → How unpleasant is imagining buying it next week at a higher price, once the sale is over?
Do I love this person? → Does this person make me happy? or Do I want to keep this person around in my life?
What would I do if X occurred? → Is X something I think is possible?

Just as the question of whether or not Sally knows how to spell “ocean” is related to the question of whether or not she should pursue writing, the question that gets substituted in is usually relevant to the question it’s replacing—it just isn’t the same question. There will be places where the answer to the substitute question is not a good answer to the original question, and is instead leading you astray.

In our examples of bucket errors above, each individual is reacting to some sort of in-progress question substitution. Sally is implicitly asking the question “am I a good writer?” and some part of her brain is trying to swap in the question “did I spell ‘ocean’ correctly?” and then use the answer to that question as a response to the original question. Parker is wondering “can I be a part of this social network?” and some part of their brain is trying to instead ask (and answer) “is Christianity true, though?”

Again, the solution is less of a full technique, and more a set of things-to-notice and questions-to-ask-oneself. It starts with building the habit of catching question substitution when it happens—of recognizing, after the fact, that you answered a different question than the one you set out to consider. Once you’re aware of the discrepancy, you can then check to what extent the substituted question is a valid proxy, or whether there’s some other process you want to engage in to move forward (such as Focusing or Goal Factoring or looking for cruxes).

Bucket Errors, In Brief

One has an (often unacknowledged/subconscious) implication stored in one's mind.
Evidence of $X$ arises, threatening to force the conclusion $Y$.
Some part of one's brain notices this happening, and does not want to conclude $Y$.
Instead of rejecting the implication $X ⟹ Y$, one adamantly denies $X$.

The actual bucket error is the implication $X ⟹ Y$; in reality, $X$ either doesn't actually imply $Y$ or only does so weakly/in combination with other factors. Flinching away from $X$ is a protective reflex, because denying $X$ is still better than erroneously accepting $Y$. It would be best to reject the implication $X ⟹ Y$, but given the (local, and hopefully temporary) fact that one simply can't, flinching away from $X$ is (locally) better.

Bucket Errors—Further Resources

The "Immunity to Change" technique, developed by Lisa Lahey and Robert Kegan, includes steps whereby patients or participants take the "blocking behaviors" preventing them from making a behavior change, and investigate those behaviors for what underlying assumptions or implicit world models they might be evidence of.

Immunity to Change

A practical worksheet on Immunity to Change

Scott Alexander's review of the book Surfing Uncertainty goes in-depth on the predictive processing model of cognition, and how our anticipations shape our perceptions.

Surfing Uncertainty

Scott's review

Logan Strohl's Intro to Naturalism sequence provides a half-formalized framework useful for (among other things) noticing and disrupting bucket errors.

Duncan Sabien's essay on the metaphor of color blindness is another perspective on people experiencing an inability to tease apart two things that are not necessarily the same.

riceissa (2y)

I am curious what you think of my old comment here that I made on Anna's post (some related discussion here).

[DEACTIVATED] Duncan Sabien (2y)

According to me, that is a succinct and exactly apt summary.

In fact, with your permission, I will edit that in.

riceissa (2y)

You have my permission!

riceissa (2y)

I see, thank you for the response!

deepthoughtlife (2y)

Since I have a computer science background, this reminds me of something called 'bucket sort'. Instead of doing an awful lot of calculations to get things in exact places, you split the input based on some relevant but extremely easy to calculate criterion. For instance, when doing an alphabetical sorting of a string (a series of letters and other symbols), you might put multiple letters together, so that A, B, and C might be one bucket. You then use some other procedure to sort within each bucket, and then just read out the contents of the buckets in order to get the end result. This makes sense not because A, B, and C belong together, but because the cost of sorting scales worse than linearly in the number of items.
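The procedure described above can be sketched roughly as follows. This is a minimal illustration, not a canonical implementation: the function names, the choice of three letters per bucket, the restriction to lowercase a-z, and the use of insertion sort as the "other procedure" are all assumptions made for the example.

```python
from typing import List


def insertion_sort(items: List[str]) -> List[str]:
    """Quadratic-time sort, standing in for 'some other procedure'."""
    out = list(items)
    for i in range(1, len(out)):
        current = out[i]
        j = i - 1
        # Shift larger elements right until current fits.
        while j >= 0 and out[j] > current:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = current
    return out


def bucket_sort_letters(text: str, letters_per_bucket: int = 3) -> str:
    """Sort the letters of `text` (assumed lowercase a-z only)."""
    num_buckets = (26 + letters_per_bucket - 1) // letters_per_bucket
    buckets: List[List[str]] = [[] for _ in range(num_buckets)]

    # Cheap first pass: a, b, c share bucket 0; d, e, f share bucket 1; etc.
    for ch in text:
        buckets[(ord(ch) - ord("a")) // letters_per_bucket].append(ch)

    # Sort inside each bucket, then read the buckets out in order.
    return "".join(ch for bucket in buckets for ch in insertion_sort(bucket))


if __name__ == "__main__":
    print(bucket_sort_letters("bucket"))
```

The letters sharing a bucket have nothing special in common; the grouping only exists so that each quadratic sort runs on a smaller pile.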

As is usually the case in CS, let n be the size of the input, and set it to eight times whatever we'll claim causes one unit of work in all of the following algorithms (a common simplifying assumption that approximates the way CS approaches things without requiring much explanation). Here is a best (vaguely reasonable) case for how much bucketing helps: if the other sorting algorithm you would use is n^2, and the items are evenly distributed between buckets, there might be an eighth as many per bucket, which means you are now solving problems that are a sixty-fourth the amount of calculation each, for a total of 9/64ths of the remaining work. Add whatever it cost you to do the initial round (which could very well be 1/64th), and you might have 10/64ths total work, or about 16%. This is not the best possible result, but we usually call those specific things. Most people in CS incorrectly claim the best possible result is n log n (note this is base 2 log in computer science), which in this case is 24/64ths of an n^2 algorithm.

Can we do better than 10/64ths? Absolutely, we can do an algorithm that scales linearly in n, and thus 8/64ths. The way we do this is a special variant of 'bucket sort', where each bucket holds exactly one thing (simplification), but since we don't know how things are sorted, we need m buckets where m is the number of possibilities, or (assuming it is just the alphabet) m=26^maximumStringLength, which is ridiculous even for computers (a ten character maximum would be 141,167,095,653,376 possibilities, or over five hundred terabytes of extra memory), so we don't often use such schemes. Even if you were sorting a trillion things, it just doesn't make sense. Even just for integers, it might be 16 gigabytes. Hence, bucket sort is a thing. If, on the other hand, you know everything will be between one and 100, 400 extra bytes is nothing.
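The one-slot-per-possible-value variant is usually called counting sort. Here is a small sketch for the "between one and 100" case; the function name and the `low`/`high` parameters are hypothetical, chosen only for illustration.

```python
from typing import List


def counting_sort(values: List[int], low: int = 1, high: int = 100) -> List[int]:
    """Sort integers known to lie in [low, high] in time linear in len(values)."""
    # One counter per possible value: this is the 'extra memory' being paid for.
    counts = [0] * (high - low + 1)
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{v} is outside the assumed range {low}..{high}")
        counts[v - low] += 1

    # Read the counters out in order to rebuild the sorted list.
    result: List[int] = []
    for offset, count in enumerate(counts):
        result.extend([low + offset] * count)
    return result


if __name__ == "__main__":
    print(counting_sort([5, 1, 100, 5, 42]))
```

The table size depends only on the range of possible values, not on how many items are being sorted, which is why the approach is cheap for 1..100 and absurd for every possible ten-letter string.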

If too many things are in a bucket, it doesn't give you much saving on calculation. For instance, two buckets of size n/2 would cost 2(4^2)/64ths + 1/64th = 33/64ths, which is much worse than n log n. (The n log n ones could be argued to be iterated bucket sort. The most famous is quicksort, which just asks higher or lower n times, and is intended to be a series of half-n buckets, and actually only fully sorts a single thing per iteration.)
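The arithmetic in this comparison can be reproduced with a toy cost model. The sketch below assumes n = 8, counts one unit of work per comparison-like step, charges a quadratic sort k^2 units for k items, and charges a flat overhead for the initial bucketing pass. The names `bucketed_units` and `first_pass_units` are made up for the example. With one unit of overhead, eight even buckets come to 9/64; the 10/64 figure quoted above corresponds to counting two units of overhead.

```python
from math import log2

N = 8             # input size assumed in the comment
BASELINE = N * N  # cost of a plain n^2 sort: 64 units


def bucketed_units(num_buckets: int, first_pass_units: int = 1) -> int:
    """Work for an even split into buckets, each sorted quadratically."""
    per_bucket = N // num_buckets
    return num_buckets * per_bucket ** 2 + first_pass_units


def n_log_n_units() -> int:
    """Work for an n log n sort, using base-2 log."""
    return int(N * log2(N))


if __name__ == "__main__":
    for label, units in [
        ("eight buckets of one item", bucketed_units(8)),
        ("two buckets of four items", bucketed_units(2)),
        ("n log n sort", n_log_n_units()),
        ("linear sort", N),
    ]:
        print(f"{label}: {units}/{BASELINE}")
```

The numbers make the Goldilocks point concrete: too few buckets leaves most of the quadratic cost in place, while one bucket per possible value is cheap in time but can be ruinous in memory.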

Unlike a well-written program (well, most of them), humans just simplify the calculation and get it wrong if there are too many calculations or they don't have the memory space (and humans have incredibly tiny working memory). Bundling into buckets is both completely necessary and possible to do completely wrong even if you ignore the connections between things.

[DEACTIVATED] Duncan Sabien (2y)

Just noting that I am unusually proud of/pleased with this one. It feels like it's in the top five of all-entries-in-this-sequence, which is big for one that isn't even a "main technique."
