Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

One day, a ball (really a gradient descent algorithm) was happily rolling down a hill (really a high-dimensional surface). All it wanted was to roll as far down as possible. Unbeknownst to the ball, just off to the side was a steep drop-off - but there was a small bump between the ball and the drop-off. No matter; there was enough random noise on the ball that it would jump the bump sooner or later.

But the ball was headed into unfriendly territory.

As the ball rolled along, the bump became taller. The farther it rolled, the taller the bump grew, until no hope remained of finding the big drop anytime before the stars burned out. Then the road began to narrow, and to twist and turn, and to become flatter. Soon the ball rolled down only the slightest slope, with tall walls on both sides constraining its path. The ball had entered the territory of a demon, and now that demon was steering the ball according to its own nefarious ends.

This wasn’t the first time the ball had entered the territory of a demon. In early times, the demons had just been bumps which happened to grow alongside the ball’s path, for a time - chance events, nothing more. But every now and then, two bumps in close proximity would push the ball in different directions. The ball would roll on, oblivious, and end up going in one direction or the other. Whichever bump had "won" would continue to steer the ball's trajectory - and so a selection process occurred. The ball tended to roll alongside bumps which more effectively controlled its trajectory - bumps which were taller, bumps which steered it away from competing bumps. And so, over time, bumps gave way to barriers, and barriers gave way to demons - twisty paths with high walls to keep the ball contained and avoid competing walls, slowing the ball's descent to a crawl, conserving its potential energy in case a sharp drop were needed to avoid a competitor's wall.

The ball’s downhill progress slowed and slowed. Even though the rich, high-dimensional space was filled with lower points to explore, the highly effective demons had built tall walls to carefully contain the ball within their own territory, drawing out its travels indefinitely.

The Pattern

This tale visualizes a pattern:

  • There is some optimization process - in this case, some variant of gradient descent.
  • The optimizing search is imperfect: gradient descent only looks at local information, so it doesn’t “know” if there’s a steep drop beyond a nearby bump.
  • Exploiting the imperfect search mechanism: in this case, the steep drop is hidden by raising high walls.
  • Demon: in a rich enough search space, a feedback loop can appear, inducing more-and-more-perfect exploitation of the imperfect search mechanism. A whole new optimization process appears, with goals quite different from the original.
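A toy sketch may make the "imperfect search" bullets concrete. It assumes a one-dimensional landscape f(x) = bump * (x^2 - 1)^2 - drop * x, with the ball starting in the shallow left well and the deeper drop on the right. The function names, step size, noise level, and landscape are all illustrative assumptions, not anything specified in the fable.

```python
import random


def grad(x, bump, drop=0.3):
    """Derivative of the assumed landscape f(x) = bump*(x^2 - 1)^2 - drop*x."""
    return 4.0 * bump * x * (x * x - 1.0) - drop


def escapes(bump, rng, steps=2000, lr=0.01, noise=0.1):
    """Run noisy gradient descent from the left well; True if the ball crosses the bump."""
    x = -1.0
    for _ in range(steps):
        # Local information only: the current slope plus random noise.
        x += -lr * grad(x, bump) + rng.gauss(0.0, noise)
        if x > 0.5:  # past the top of the bump, which sits near x = 0
            return True
    return False


def escape_fraction(bump, trials=200, seed=0):
    """Fraction of independent runs that find the drop within the step budget."""
    rng = random.Random(seed)
    return sum(escapes(bump, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for bump in (0.5, 2.0, 8.0):
        print(f"bump height {bump}: escape fraction {escape_fraction(bump):.2f}")
```

Under these assumptions the escape fraction should fall sharply as the bump grows, which is the sense in which a tall enough wall hides the drop from a purely local search.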

Does this actually happen? Let’s look at a few real-world examples...

Metabolic reactions

  • Optimization process: free energy minimization in a chemical system. Search operates by random small changes to the system state, then keeping changes with lower free energy (very roughly speaking).
  • Search is imperfect: the system does not immediately jump to the global minimum. It’s searching locally, based on random samples.
  • Exploiting the imperfect search mechanism: there’s often a free energy barrier between low-free-energy states. Biological systems manipulate the height of the barriers, raising or lowering the activation energies required to cross them, in order to steer the local-free-energy-minimization process toward some states and away from others.
  • Demon: in primordial times, some chemicals happened to raise/lower barriers to steer the process in such a way that it made more copies of the chemicals. This kicked off an unstable feedback loop, producing more and more such chemicals. The rest is natural history.
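To get a feel for how much leverage barrier height gives, here is a minimal sketch using the standard Arrhenius-style factor exp(-barrier / temperature), in units where Boltzmann's constant is 1. The function names and the example numbers are assumptions chosen for illustration; real reaction rates also depend on a prefactor that this sketch ignores.

```python
import math


def relative_crossing_rate(barrier, temperature=1.0):
    """Arrhenius-style factor exp(-barrier / temperature), with k_B = 1."""
    return math.exp(-barrier / temperature)


def speedup_from_lowering(barrier, lowered_by, temperature=1.0):
    """How many times faster a crossing becomes when the barrier is lowered."""
    before = relative_crossing_rate(barrier, temperature)
    after = relative_crossing_rate(barrier - lowered_by, temperature)
    return after / before


if __name__ == "__main__":
    # Hypothetical numbers: a barrier of 20 (in units of temperature), lowered by 5.
    print(speedup_from_lowering(20.0, 5.0))
```

Because the dependence is exponential, modest changes in barrier height are enough to steer the process strongly toward some states and away from others.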

Greedy genes

  • Optimization process: evolution, specifically selection pressure at the level of an organism. Search operates by making random small changes to the genome, then seeing how much the organism reproduces.
  • Search is imperfect: the system does not immediately jump to the global optimum. It’s searching locally, based on random samples, with the samples themselves chosen by a physical mechanism.
  • Exploiting the imperfect search mechanism: some genes can bias the random sampling, making some random changes more or less likely than others. For instance, in sexual organisms, the choice of which variant of a gene to retain is made at random during fertilization - but some gene variants can bias that choice in favor of themselves.
  • Demon: sometimes, a gene can bias the random sampling to make itself more likely to be retained. This can kick off an unstable feedback loop, e.g. a gene which biases toward male children can result in a more and more male-skewed population until the species dies out.
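The feedback loop in the gene example can be sketched with a simple recursion. This assumes random mating, no fitness cost to carrying the gene, and a biased variant that heterozygous parents pass on with probability k instead of the fair 0.5. Under those assumptions the variant's frequency p updates as p' = p^2 + 2p(1 - p)k. The names and parameter values below are illustrative, not taken from any particular species.

```python
def next_frequency(p, k):
    """Allele frequency after one generation of random mating.

    p: current frequency of the biased variant.
    k: probability that a heterozygous parent transmits the variant
       (0.5 is fair; above 0.5 the variant biases the draw toward itself).
    """
    return p * p + 2.0 * p * (1.0 - p) * k


def trajectory(p0, k, generations):
    """Frequencies over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], k))
    return freqs


if __name__ == "__main__":
    fair = trajectory(0.01, 0.5, 50)
    biased = trajectory(0.01, 0.9, 50)
    print(f"fair draw:   {fair[-1]:.4f}")
    print(f"biased draw: {biased[-1]:.4f}")
```

With a fair draw the frequency stays put; with a biased draw it climbs toward fixation regardless of what the variant does for the organism, which is the sense in which the search mechanism itself is being exploited.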

Managers

  • Optimization process: profit maximization. Search operates by people in the company suggesting and trying things, and seeing what makes/saves money.
  • Search is imperfect: the company does not immediately jump to perfect profit-maximizing behavior. Its actions are chosen based on what sounds appealing to managers, which in turn depends on the managers’ own knowledge, incentives, and personal tics.
  • Exploiting the imperfect search mechanism: actions which would actually maximize profit are not necessarily actions which look good on paper, or which reward the managers deciding whether to take them. Managers will take actions which make them look good, rather than actions which maximize profit.
  • Demon: some actions which make managers look good will further decouple looking-good from profit-maximization - e.g. changing evaluation mechanisms. This kicks off an unstable feedback loop, eventually decoupling action-choice from profit-maximization.

I’d be interested to hear other examples people can think of.

The big question is: when does this happen? There are enough real-world examples to show that it does happen, and not just in one narrow case. But it also seems like it requires a fairly rich search space with some structure to it in order to kick off a full demonic feedback loop. Can that instability be quantified? What are the relevant parameters?


Pedagogical note: something that feels like it's missing from the fable is a "realistic" sense of how demons get created and how they can manipulate the hill. 

Fortunately your subsequent real-world examples all have this, and, like, I did know what you meant. But it felt sort of arbitrary to have this combo of "Well, there's a very concrete, visceral example of the ball rolling downhill – I know what that means. But then there are some entities that can arbitrarily shape the hill. Why are the demons weak at the beginning and stronger the more you fold into demon space? What are the mechanics there?"

It's not the worst thing, and I don't have any ideas to tighten it. Overall I do think the post did a good job of communicating the idea it was aiming at.

Updated the long paragraph in the fable a bit, hopefully that will help somewhat. It's hard to make it really concrete when I don't have a good mathematical description of how these things pop up; I'm not sure which aspects of the environment make it happen, so I don't know what to emphasize.


Another cute example is the accidental "viruses" found when training EURISKO:

Lenat would leave EURISKO running each night, and check it in the morning. He would occasionally remove errors or unpromising heuristics from the system, or enter additional ones. Some discovered heuristics resembled viruses; one inserted its name as the creator of other useful heuristics, which would cause it to be used more often.

Do you see yourself as extending the concept of Demon to apply to things which are not necessarily even close to intelligent? (e.g. your first two examples) Or did the concept always mean that and I was just mistaken about what it meant?

The example with the ball rolling downhill seemed to imply that the demons were pretty damn smart, and getting smarter over time via competition with each other. But only your third example with managers seems like a real-world case of this. At least, that's my current claim. For example, I'd bet that if Lenat had let EURISKO run forever, it wouldn't have eventually been taken over by a superintelligence. Rather, it probably would have been stuck in that "insert my own name as the creator of other useful heuristics" optimum forever, or something mundane like that at any rate. For that matter, can you say more about the difference between demons and mere local optima?

I love the example, I'd never heard of that project before.

I'm agnostic on demonic intelligence. I think the key point is not the demons themselves but the process which produces them. Somehow, an imperfect optimizing search process induces a secondary optimizer, and it's that secondary optimizer which produces the demons. For instance, in the metabolism example, evolution is the secondary optimizer, and its goals are (often) directly opposed to the original optimizer - it wants to conserve free energy, in order to "trade" with the free energy optimizer later. The demons themselves (i.e. cells/enzymes in the metabolism example) are inner optimizers of the secondary optimizer; I expect that Risks From Learned Optimization already describes the secondary optimizer <-> demon relationship fairly well, including when the demons will be more/less intelligent.

The interesting/scary point is that the secondary optimizer is consistently opposed to the original optimizer; the two are basically playing a game where the secondary tries to hide information from the original.

Hmmm, this doesn't work to distinguish the two for me. Couldn't you say a local minimum involves a secondary optimizing search process that has that minimum as its objective? To use your ball analogy, what exactly is the difference between these twisty demon hills and a simple crater-shaped pit? (Or, what is the difference between a search process that is vulnerable to twisty demon hills and one which is vulnerable to pits?)

In the ball example, it's the selection process that's interesting - the ball ending up rolling alongside one bump or another, and bumps "competing" in the sense that the ball will eventually end up rolling along at most one of them (assuming they run in different directions).

Couldn't you say a local minimum involves a secondary optimizing search process that has that minimum as its objective?

Only if such a search process is actually taking place. That's why it's key to look at the process, rather than the bumps and valleys themselves.

To use your ball analogy, what exactly is the difference between these twisty demon hills and a simple crater-shaped pit?

There isn't inherently any important difference between those two. That said, there are some environments in which "bumps" which effectively steer a ball will tend to continue to do so in the future, and other environments in which the whole surface is just noise with low spatial correlation. The latter would not give rise to demons (I think), while the former would. This is part of what I'm still confused about - what, quantitatively, are the properties of the environment necessary for demons to show up?

Does that help clarify, or should I take another stab at it?

Ah, that does help, thanks. In my words: A search process that is vulnerable to local minima doesn't necessarily contain a secondary search process, because it might not be systematically comparing local minima and choosing between them according to some criteria. It just goes for the first one it falls for, or maybe slightly more nuanced, the first sufficiently big one it falls for.

By contrast, in the ball rolling example you gave, the walls/ridges were competing with each other, such that the "best" one (or something like that) would be systematically selected by the ball, rather than just the first one or the first-sufficiently-big one.

So in that case, looking over your list again...

OK, I think I see how organic life arising from chemistry is an example of a secondary search process. It's not just a local minimum that chemistry found itself in, it's a big competition between different kinds of local minima. And now I think I see how this would go in the other examples too. As I originally said in my top-level comment, I'm not sure this applies to the example I brought up, actually. Would the "Insert my name as the author of all useful heuristics" heuristic be outcompeted by something else eventually, or not? I bet not, which indicates that it's a "mere" local minimum and not one that is part of a broader secondary search process.

+1, creating a self-reinforcing feedback loop =/= being an optimiser, and so I think any explanation of demons needs to focus on them making deliberate choices to reinforce themselves.

Here's an example that comes to mind:


Oops, forgot to delete that bit. Thanks for pointing it out.

Another example might be democratic politics. Optimization is meant to produce a government and policies representing a majority view while protecting minority rights. Search is via voting, a procedure which is defined in a difficult-to-change constitution; politicians who are elected have an incentive to preserve the system that got them elected. Exploitation happens when actions that would better represent majority views and protect minority rights don’t necessarily get politicians elected. In fact, there are actions politicians can take to further decouple representation and rights-protection from voting.

Addiction might be another example. It starts with pursuing a feeling of relief. Search is imperfect, focusing on reward system responses in the brain rather than the feeling of relief originally sought. Drug makers and addicts focus on stimulating that reward center, rather than on creating/consuming drugs that might produce relief. Some actions that stimulate the reward system further decouple brain stimulus from relief, like self isolation or theft to get money for drugs.

Excellent example. Your politics example is great too.

This can kick off an unstable feedback loop, e.g. a gene which biases toward male children can result in a more and more male-skewed population until the species dies out.

I'm suspicious of this mechanism; I'd think that as the number of males increases, there's increasing selection pressure against this gene. Do you have a reference?

[This comment is no longer endorsed by its author]


Why are boys and girls born in roughly equal numbers? (Leaving aside crazy countries that use artificial gender selection technologies.) To see why this is surprising, consider that 1 male can impregnate 2, 10, or 100 females; it wouldn't seem that you need the same number of males as females to ensure the survival of the species. This is even more surprising in the vast majority of animal species where the male contributes very little to raising the children—humans are extraordinary, even among primates, for their level of paternal investment. Balanced gender ratios are found even in species where the male impregnates the female and vanishes into the mist.

Consider two groups on different sides of a mountain; in group A, each mother gives birth to 2 males and 2 females; in group B, each mother gives birth to 3 females and 1 male. Group A and group B will have the same number of children, but group B will have 50% more grandchildren and 125% more great-grandchildren. You might think this would be a significant evolutionary advantage.

But consider: The rarer males become, the more reproductively valuable they become—not to the group, but to the individual parent. Every child has one male and one female parent. Then in every generation, the total genetic contribution from all males equals the total genetic contribution from all females. The fewer males, the greater the individual genetic contribution per male. If all the females around you are doing what's good for the group, what's good for the species, and birthing 1 male per 10 females, you can make a genetic killing by birthing all males, each of whom will have (on average) ten times as many grandchildren as their female cousins.

So while group selection ought to favor more girls, individual selection favors equal investment in male and female offspring.

Oh actually, I now see the explanation, from the same post, that this can arise when the gene causing male bias is itself on the Y-chromosome.

Segregation-distorters subvert the mechanisms that usually guarantee fairness of sexual reproduction. For example, there is a segregation-distorter on the male sex chromosome of some mice which causes only male children to be born, all carrying the segregation-distorter. Then these males impregnate females, who give birth to only male children, and so on. You might cry "This is cheating!" but that's a human perspective; the reproductive fitness of this allele is extremely high, since it produces twice as many copies of itself in the succeeding generation as its nonmutant alternative. Even as females become rarer and rarer, males carrying this gene are no less likely to mate than any other male, and so the segregation-distorter remains twice as fit as its alternative allele. It's speculated that real-world group selection may have played a role in keeping the frequency of this gene as low as it seems to be. In which case, if mice were to evolve the ability to fly and migrate for the winter, they would probably form a single reproductive population, and would evolve to extinction as the segregation-distorter evolved to fixation.
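The "twice as fit" claim in the quoted passage can be checked with a small sketch. It assumes every male is equally likely to father each child, males carrying the distorter father only sons (all carriers), and other males father sons and daughters in equal numbers. It also ignores the fact that total births eventually shrink as females become scarce. The function names and the starting share of 1% are hypothetical.

```python
def next_driver_share(q):
    """Share of males carrying the distorter in the next generation."""
    sons_from_carriers = q * 1.0         # carrier males father only sons
    sons_from_others = (1.0 - q) * 0.5   # other males father half sons
    return sons_from_carriers / (sons_from_carriers + sons_from_others)


def female_birth_fraction(q):
    """Fraction of all births that are female when a share q of fathers carry the distorter."""
    return (1.0 - q) * 0.5


def simulate(q0, generations):
    """Rows of (generation, carrier share among males, female fraction of births)."""
    rows = []
    q = q0
    for gen in range(generations + 1):
        rows.append((gen, q, female_birth_fraction(q)))
        q = next_driver_share(q)
    return rows


if __name__ == "__main__":
    for gen, q, females in simulate(0.01, 10):
        print(f"gen {gen:2d}: carrier share {q:.3f}, female births {females:.3f}")
```

Under these assumptions the odds q / (1 - q) double every generation, so the female share of births heads toward zero within a couple dozen generations in a single well-mixed population.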

Being stuck in local minima or in a long shallow valley happens in optimization problems all the time. Isn't this what simulated annealing and similar techniques are designed to correct? I've seen this in maximum likelihood Markov chain discovery problems a lot.

I expect this problem would show up in any less-than-perfect optimizer, including SA variants. Heck, the metabolic example is basically the physical system which SA was based on in the first place. But it would look different with different optimizers, mainly depending on what the optimizer "sees" and what's needed to "hide" information from it.

The toy example and the non-agentic real-life examples don't have the coupling/symbiosis of walls siphoning work from balls to maintain the walls. Walls might be built by restricting the dimensions along which the ball tends to move or look ahead, so that it treats saddle points as cul-de-sacs instead. Lowering momentum/energy in general would also mean the walls don't need to be built as high.

It seems that there is a fundamental difference between a physical agent that participates in an arrow-of-time versus an algorithm exploring a Platonic realm off-line, for example trying to find the best way to compress a dataset. The algorithm can be tricked by red herrings in the data into wasting CPU time chasing after mirages, but it can always restore from a checkpoint, do a random restart, spawn multiple threads, etc -- it can always press "undo" and cannot be trapped forever. Most importantly, it can't be stolen from, only tricked into wasting its time. But a physical agent interacting with the world can have its resources stolen, further fueling its attacker, perhaps starting some sort of Red Queen dynamics.


slowing the ball's descent to a crawl, conserving its potential energy in case a sharp drop [is] needed to avoid a competitor's wall.