Context: Quite recently, a lot of ideas have sort of snapped together into a coherent mindset for me. Ideas I was familiar with, but whose importance I didn't intuitively understand. I'm going to try and document that mindset real quick, in a way I hope will be useful to others.

Five Bullet Points

  1. By default, shit doesn't work. The number of ways that shit can fail to work absolutely stomps the number of ways that shit can work.
  2. This means that we should expect shit to not work, unless something forces it into the narrow set of states that actually work and do something.
  3. The shit that does work generally still doesn't work for humans. Our goals are pretty specific and complicated, so the non-human goals massively outnumber the human goals.
  4. This means that even when shit works, we should expect it to not be in our best interests unless something forces it into the narrow range of goal-space that we like.
  5. Processes that force the world into narrow, unlikely outcome ranges are called optimization processes - they are rare, and important, and not magic.
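
A toy sketch of points 1, 2 and 5, under made-up assumptions: pretend a "machine" is just n distinct parts in a row, and that exactly one ordering counts as working. The names `fraction_working` and `greedy_fix` are hypothetical, invented for this illustration. The first function counts how rare the working state is among all arrangements; the second is a deliberately dumb optimization process (adjacent swaps) that forces any arrangement into that one state anyway.

```python
import itertools
import math


def fraction_working(n_parts):
    """Fraction of all orderings of n_parts parts that equal the one working order."""
    target = tuple(range(n_parts))  # assumption: exactly one ordering "works"
    arrangements = itertools.permutations(range(n_parts))
    working = sum(1 for a in arrangements if a == target)
    return working / math.factorial(n_parts)


def greedy_fix(arrangement):
    """Toy optimization process: swap adjacent out-of-order parts until none remain."""
    parts = list(arrangement)
    changed = True
    while changed:
        changed = False
        for i in range(len(parts) - 1):
            if parts[i] > parts[i + 1]:
                parts[i], parts[i + 1] = parts[i + 1], parts[i]
                changed = True
    return tuple(parts)


if __name__ == "__main__":
    # The working fraction shrinks factorially as parts are added.
    for n in range(2, 9):
        print(n, fraction_working(n))
    # The optimization process lands in the working state regardless of the start.
    print(greedy_fix((3, 1, 0, 2)))
```

The point is only the shape of the thing: the working fraction collapses as the system gets bigger, and the only reason you ever see the working state is that something pushed toward it.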

Main Implications

The biggest takeaway is: look for optimization processes. If you want to use a piece of the world (as a tool, as an ally, as evidence, as an authority to defer to, etc.), it is important to understand which functions it has. In general, the functions a thing is "supposed to have" can come wildly apart from the things that it's actually optimized to do. If you can't find a mechanism that forces a particular thing to have a particular useful property, it probably doesn't. Examples:

  • "Society" designates certain people/organizations/processes as authorities - sources of reliable evidence. It turns out that a lot of these are not in fact reliable, because nothing forces them to be. Nobody fires the news anchor when they get a prediction wrong.
  • A sort of internal version is checking when to use explicit modeling and when to trust intuition. Intuitions seem to come with a built-in feeling that they are truth-tracking and should be trusted, even in cases where you know they don't have the evidence budget to make this possible. To a certain extent, you can check manually if an intuition is actually optimized to be truth-tracking.
  • The air conditioner thing.

The obvious first step when looking for optimization processes: learn to recognize optimization processes. This is the key to what Yudkowsky calls an adequacy argument, which is what I've been broadly calling "hey does this thing work the way I want it to?"

  • Evolution and markets are the canonical examples. There is plenty of math out there about when these happen, how powerful they are, and how they work in general.
  • "Skin in the game" is often referenced as a thing-that-makes-stuff-work: the classic example is a role where you get fired if you make a sufficiently bad mistake. This is basically a watered-down version of evolution, with only the "differential survival" bit and no reproduction or heritable traits or any of that. Fitness doesn't climb over time, but hey, more-fit participants will still be over-represented in the population.
  • Testing, recruiting, and other sorts of intentional selection of people definitely fit the definition, but in practice it seems they generally optimize for something different from what they are Supposed To optimize for.
  • Thankfully, people can also be optimizers! Sometimes. We definitely optimize for something. Consider: the vast majority of action-sequences lead to death, and the brain's job is to identify the narrow slice that manages all of our many survival needs at the same time. But from there I think it still requires a bit of hand-waving to justify arguments about exactly which tasks those optimization abilities generalize to and which they don't.
  • Definitely way more than this but you get the idea. Maybe check out some Framing Practicum posts and find things that qualify as optimizers? Also, just read Inadequate Equilibria.
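
The "skin in the game" bullet can be sketched as a small simulation. Everything here is an assumption made up for illustration: the function name `simulate_firing`, the population size, and the choice that each person has a fixed per-round mistake rate drawn uniformly from 0.05 to 0.5 (so a fresh hire averages 0.275). Anyone who makes a mistake is fired and replaced by a fresh draw from the same pool. There is no reproduction and nothing is inherited, only differential survival.

```python
import random


def simulate_firing(n_people=2000, n_rounds=200, seed=0):
    """Return the average mistake rate of the staff after each round."""
    rng = random.Random(seed)

    def new_hire():
        # Assumption: hires come from a fixed pool that never improves.
        return rng.uniform(0.05, 0.5)

    staff = [new_hire() for _ in range(n_people)]
    history = []
    for _ in range(n_rounds):
        # Each person errs with probability equal to their mistake rate;
        # those who err are fired and replaced by a fresh hire.
        staff = [m if rng.random() >= m else new_hire() for m in staff]
        history.append(sum(staff) / n_people)
    return history


if __name__ == "__main__":
    history = simulate_firing()
    for round_index in (0, 9, 49, 99, 199):
        print(round_index + 1, round(history[round_index], 3))
```

In this toy model the average mistake rate drops below the fresh-hire average and then levels off rather than improving forever, which is the "more-fit participants are over-represented, but fitness doesn't climb" behavior described above.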


  • Note that this is one framing out of many - I think it's a subset of a broader sort of thing about mechanistic thinking and gears-level models. There are times when it doesn't particularly make sense to frame things in terms of optimizers: consider your shoelaces. There are a bunch of ways you can maybe frame this in terms of adequacy arguments, but it's kinda clunky and not necessary unless you really want to get into the details of why you trust your eyes.
  • Optimization is extremely related to the Bayesian definition of evidence. Left as an exercise for the reader.
  • You may notice some parallels between 2/4 and capabilities/alignment in the context of AI safety. What a coincidence.
  • As evidence for #1: consider entropy. Of all the ways a set of gears can be arranged in space, how many of them form a machine with any recognizable input-output behavior at all? How many instead form a pile of gears?
    • Interesting aside on this: I think entropy does, in a sense, come from the way humans view the world. It's not like piles of gears are somehow inherently pointless - there's a huge space of variety and limitless possibilities that I, in my human closed-mindedness, shove under the label "eh, just a pile of gears". When we say that some macro-states contain more micro-states than others, we're basically saying that there are huge swaths of micro-states that we basically just don't care about enough to classify precisely into lots of macro-states, rather than just sweeping them under one label. To me, a pile of gears is just a pile of gears - but that's a fact about me, not about the gears.
    • There's also maybe a rebuttal that has to do with carving reality at its joints - in the real world, the distribution of physical systems is not totally uniform, meaning it has some cluster structure that suggests how to segment it into concepts. The point of the example above is that even without cluster structure, we can still segment reality based on our preferences, and it produces some familiar entropic behaviors.
  • As evidence for #3: consider how ridiculously massive goal-space is. Slightly more whimsically: of all the machines you formed in the previous bullet when throwing gears together randomly, how many of them would you actually want to own/use? 
  • Optimization processes are themselves "things that work" - the number of non-optimizing possible systems dwarfs the number of optimizing ones. This rarity means they generally have to be created by other optimization processes. In fact, you can trace this storied lineage all the way back to its root, some Primordial Accident where all this tomfoolery first bootstrapped itself out of the mud by sheer dumb luck.
  • This view is sufficient to give us what we might fancifully call a Fundamental Thesis of Transhumanism: the current state of the world is partly optimized for things other than human flourishing, and mostly just not optimized for anything at all. This means we should expect a world optimized for human flourishing to look very different from today's world, in basically all respects imaginable.
  • We should expect X-risk to be hard. In a sense, problems with the one-shot structure that X-risk has can break the one tool we've got. I'm being fully literal when I say, nothing could possibly have prepared us for this. The challenge is not calibrated to our skill level - time to do the impossible.
  • I'm pretty curious about takes on what the false positive/negative rates of this heuristic might be. Are there likely to be lots of phenomena which are highly optimized, but in subtle/complicated ways I can't notice? Phenomena which I think are optimized, but actually aren't?


2 comments

Nice post! My main takeaway is "incentives are optimization pressures". I may have had that thought before but this tied it nicely in a bow.

Some editing suggestions/nitpicks:

The bullet point that starts with "As evidence for #3" ends with a hanging "How".

Quite recently, a lot of ideas have sort of snapped together into a coherent mindset.

I would put "for me" at the end of this. It does kind of read to me like you're about to describe for us how a scientific field has recently had a breakthrough.

I don't think I'm following what "Skin in the game" refers to. I know the idiom, as in "they don't have any skin in the game" but the rest of that bullet point didn't click into place for me.

We definitely optimize for something, otherwise evolution wouldn't let us be here

I think this might be confusing "an optimizer" with "is optimized". We're definitely optimized, otherwise evolution wouldn't let us be here, but it's entirely possible for an evolutionary process to produce non-optimizers! (This feels related to the content of Risks from Learned Optimization.)


Might be worth explicitly saying "AI capabilities/AI alignment" for readers who aren't super following the jargon of the field of AI alignment.

Optimization processes are themselves "things that work", which means they have to be created by other optimization processes.

If you're thinking about all the optimization processes on earth, then this is basically true, but I don't think it's a fundamental fact about optimization processes. As you point out, natural selection got started from that one lucky replicator. But any place with a source of negentropy can turn into an optimization process.

Thanks! Edits made accordingly. Two notes on the stuff you mentioned that isn't just my embarrassing lack of proofreading:

  • The definition of optimization used in Risks From Learned Optimization is actually quite different from the definition I'm using here. They say: 

    "a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system."

    I personally don't really like this definition, since it leans quite hard on reifying certain kinds of algorithms - when is there "really" explicit search going on? Where is the search space? When does a configuration of atoms constitute an objective function? Using this definition strictly, humans aren't *really* optimizers, since we don't have an explicit objective function written down anywhere. Balls rolling down hills aren't optimizers either.

    But by the definition of optimization I've been using here, I think pretty much all evolved organisms have to be at least weak optimizers, because survival is hard. You have to manage constraints from food and water and temperature and predation, etc. The window of action-sequences that lead to successful reproduction is really quite narrow compared to the whole space. Maintaining homeostasis requires ongoing optimization pressure.
  • Agree that not all optimization processes fundamentally have to be produced by other optimization processes, and that they can crop up anywhere you have the necessary negentropy reservoir. I think my claim is that optimization processes are by default rare (maybe this is exactly because they require negentropy?). But since optimizers beget other optimizers at a rate much higher than background, we should expect the majority of optimization to arise from other optimization. Existing hereditary trees of optimizers grow deeper much faster than new roots spawn, so we should expect roots to occupy a negligible fraction of the nodes as time goes on.
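
A minimal sketch of the roots-versus-descendants claim, with made-up numbers: assume new roots appear at a constant background rate per step, while every existing optimizer spawns children at a fixed per-step rate. The function name `root_fraction` and both rate parameters are hypothetical, and the model tracks expected counts rather than simulating individual optimizers.

```python
def root_fraction(steps, root_rate=1.0, spawn_rate=0.1):
    """Return, for each step, the fraction of all optimizers that are roots."""
    roots = 0.0
    total = 0.0
    fractions = []
    for _ in range(steps):
        children = spawn_rate * total  # optimizers begetting optimizers
        roots += root_rate             # new roots at the background rate
        total += root_rate + children
        fractions.append(roots / total)
    return fractions


if __name__ == "__main__":
    fractions = root_fraction(100)
    for step in (1, 10, 50, 100):
        print(step, fractions[step - 1])
```

Under these assumptions roots grow linearly while the total grows roughly exponentially, so the root fraction starts at 1 and keeps shrinking toward zero.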
