Intentional Bucket Errors

Scott Garrabrant

I want to illustrate a research technique that I use sometimes. (My actual motivation for writing this is to make it so that I don't feel as much like I need to defend myself when I use this technique.) I am calling it intentional bucket errors, after a CFAR concept called bucket errors. Bucket errors are about noticing when multiple different concepts/questions are stored in your head as a single concept/question. Once you notice this, you can think about the different concepts/questions separately.

What are Intentional Bucket Errors

Bucket errors are normally thought of as a bad thing. It has "errors" right in the name. However, I want to argue that bucket errors can sometimes be useful, and you might want to consider having some bucket errors on purpose. You can do this by taking multiple different concepts and just pretending that they are all the same. This usually only works if the concepts started out sufficiently close together.

Like many techniques that work by acting as though you believe something false, you should use this technique responsibly. The goal is to pretend that the concepts are the same to help you gain traction on thinking about them, but then to also be able to go back to inhabiting the world where they are actually different.

Why Use Intentional Bucket Errors

Why might you want to use intentional bucket errors? For one, maybe the concepts actually are the same, but they look different enough that you won't let yourself consider the possibility. I think this is especially likely to happen if the concepts are coming from very different fields or areas of your life. Sometimes it feels silly to draw strong connections between e.g. human rationality, AI alignment, evolution, economics, etc. but such connections can be useful.

I also find this useful for gaining traction. Constraints can make it easier to start thinking about a problem. Sometimes it is harder to say something true and useful about X than it is to say something true and useful that simultaneously applies to X, Y, and Z. This is especially true when the concepts you are conflating are imagined solutions to problems.

For example, maybe I have an imagined solution to counterfactuals that has a hole in it that looks like understanding multi-level world models (MLWMs). Then, maybe I also have an imagined solution to tiling that also has a hole in it that looks like understanding multi-level world models. I could view this as two separate problems. The desired properties of my MLWM theory for counterfactuals might be different from the desired properties for tiling. I have these two different holes I want to fill, and one strategy I have, which superficially looks like it makes the problem harder, is to try to find something that can fill both holes simultaneously. However, this can sometimes be easier, because different use cases can help you triangulate the simple theory from which the specific solutions can be derived.

A lighter (maybe epistemically safer) version of intentional bucket errors is just to pay a bunch of attention to the connections between the concepts. This has its own advantages in that the relationships between the concepts might be interesting. However, I personally prefer to just throw them all in together, since this way I only have to work with one object, and it takes up fewer working memory slots while I'm thinking about it.

Examples

Here are some recent examples where I feel like I have used something like this, to varying degrees.

How the MtG Color Wheel Explains AI Safety is obviously the product of conflating many things together without worrying too much about how all the clusters are wrong.

In How does Gradient Descent Interact with Goodhart, the question at the top about rocket designs and human approval is really very different from the experiments that I suggested, but I feel like learning about one might help my intuitions about the other. This was actually generated at the same time as I was thinking about Epistemic Tenure, which for me was partially about the expectation that there is good research and a correlated proxy, justifiable research. Even though our group's idea-selection mechanism is going to optimize for justifiable research, it is better if the inner optimization loops in the humans do not directly follow those incentives. The connection is a bit of a stretch in hindsight, but believing in the connection was instrumental in giving me traction in thinking about all the problems.

Embedded Agency has a bunch of this, just because I was trying to factor a big problem into a small number of subfields, but the Robust Delegation section can sort of be described as "Tiling and Corrigibility kind of look similar if you squint. What happens when I just pretend they are two instantiations of the same problem?"

I refer to a version of this as conspiracy theory thinking. At any given time you have several problems you are working on, across various domains of your personal and professional life. The fake framework is that they're all secretly generated by the same problem, but there's a conspiracy to make them seem like different problems. Your job is to unravel the conspiracy.

I both approve of this problem solving method and realize I don't know what's going on in the minds of people you have needed to defend this idea to.

I'd paraphrase your idea as running with the hypothetical "what if these ideas were connected?" A huge number of my creative leaps come from exploring "what if"s. It feels very simple to keep my "what if" explorations separate from my most rigorous known truths, at least for intellectual topics.

So an actual question that would help me understand more is "what have other people said in conversations where you were defending this idea?"

I think there is a possible culture where people say a bunch of inside-view things and run with speculations all the time, and another possible culture where people mostly only say literally true things that can be put into the listener's head directly. (I associate these cultures with the books R:A-Z and Superintelligence, respectively.) In the first culture, I don't feel the need to defend myself. However, I feel like I am often also interacting with people from the second culture, and that makes me feel like I need a disclaimer before I think in public with speculation that conflates a bunch of concepts.

"Bucket Errors" seems to me to be pretty much the same idea as the one explained in the less-referenced post Fallacies of Compression (except that the post introducing the former uses "compression"/conflation of different variables to explain psychological resistance to changing one's mind).

In other words, the concept at hand here is compression of the map. On my reading, your post is making the point that compression of your map is sometimes a feature, not a bug - and not just for space reasons.

This evokes thoughts about data compression for me, and about how PCA and related "compression" techniques often make it easier to see relationships and understand things.

This line of reasoning is why I put all of my Anki cards into one big “Misc” deck.

If I were to diligently organize each card by relevant content area, I’m worried that I would only be able to recall the relevant bit of information when I’ve been cued by the category. By putting all the information I want to remember in one place, it discourages compartmentalization. There are no such things as “Biology Facts” or “Geography Facts”. There is one super category: “Facts About The World”.

I wonder if casting the approach as a prudent application of Occam's Razor might make it a bit less needful of defense.

If one can simplify things by treating two arguably different things as the same, and thereby shed light on and gain a better understanding of either or both, that seems useful.