The many ways AIs behave badly

by Stuart_Armstrong, 24th Apr 2018



EDIT: This has been previously posted here. Vika is now maintaining a centralized list of such examples.

I had a previous post about some of the ways AIs behave badly. But now there is a new paper that collects many examples of (mis)behaviour from evolutionary computation and artificial life. A video summary of some of the results is here.

So note that these are ways that current agents already (mis)behave; they are not theoretical arguments about what might happen with a future superintelligence.

These behaviours include:

  • solving the proxy/heuristic but not the proper problem (e.g. spinning while falling to get the highest score on a "jump" objective),
  • cheating on the test (e.g. playing dumb on a test so as to get a higher score afterwards),
  • exploiting bugs in the environment (e.g. quickly twitching body parts to accumulate errors in the physics simulator and thus get "free energy" to propel themselves fast through virtual water),
  • agents deliberately crashing other agents (requesting absurdly distant moves in an unbounded tic-tac-toe game, causing the other agents to expand their memory until they crashed),
  • unexpectedly elegant "impossible" solutions (a robot crawling on its elbows when the percentage of time its feet could touch the ground was set to 0%), and
  • parasitism (in Tierra, an artificial life system, not only were there parasites, but parasites of parasites).
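The first bullet, solving the proxy rather than the intended problem, can be reproduced in a few lines. This is a hypothetical toy sketch, not an example from the paper: the intended task is to sort a list, but the search is scored only on the number of out-of-order pairs, and it is allowed to delete items. A greedy hill climber stands in for a full evolutionary algorithm for brevity, and the names `inversions`, `neighbours` and `hill_climb` are illustrative.

```python
from itertools import combinations


def inversions(xs):
    """Proxy score: count of pairs (i < j) with xs[i] > xs[j]; 0 means "sorted"."""
    return sum(1 for a, b in combinations(xs, 2) if a > b)


def neighbours(xs):
    """Every single-step edit: swap two adjacent items, or delete one item."""
    for i in range(len(xs) - 1):
        ys = list(xs)
        ys[i], ys[i + 1] = ys[i + 1], ys[i]
        yield ys
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]


def hill_climb(xs, max_steps=100):
    """Greedily move to the neighbour with the fewest inversions.

    Stops when no neighbour strictly improves the proxy score.
    Nothing in the score says the output must keep all the input items.
    """
    current = list(xs)
    for _ in range(max_steps):
        best = min(neighbours(current), key=inversions, default=None)
        if best is None or inversions(best) >= inversions(current):
            break
        current = best
    return current


result = hill_climb([5, 4, 3, 2, 1])
print(result, inversions(result))  # prints [1, 2] 0
```

The climber reaches a perfect proxy score without ever producing the sorted list: on a reversed list, deleting one item removes several inversions at once while an adjacent swap removes only one, so the score rewards throwing data away. A score that also penalised missing items would remove this particular shortcut.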


From the paper's abstract:

Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms subverting their intentions, exposing unrecognized bugs in their code, producing unexpected adaptations, or exhibiting outcomes uncannily convergent with ones in nature. Such stories routinely reveal creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This paper is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and importance of evolutionary surprises extends beyond the natural world, and may indeed be a universal property of all complex evolving systems.


3 comments

It's a nice paper :)

I already posted it here a month ago and there is some discussion.

Vika is now maintaining a centralized list of such examples.

Thanks! Hadn't seen the previous posting. Added that info to the post.

This was a very fun article. Notably absent from the list, even though I would absolutely have expected it given the focus on evolutionary algorithms (many of the observations also apply to gradient descent):

Driving genes. Biologically, a "driving gene" is one that cheats in (sexual) reproduction by ensuring that it is present in more than 50% of offspring, usually by interfering with the machinery of meiosis.

In artificial evolution that uses "combination", "mutation" and "selection", these would be regions of parameter space that act as attractors under the "combination" dynamics and exploit that to beat selection pressure.
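The point about drive can be made concrete with a standard population-genetics recurrence for meiotic drive; this is an illustrative sketch, not something taken from the paper or the comment. It assumes an infinite, randomly mating population and a single locus. The allele D, the transmission probability `k`, the dominant fitness cost `s`, and the function names are all hypothetical.

```python
def next_freq(p, k, s):
    """One generation of a toy infinite-population model of a driving allele D.

    p: current frequency of the hypothetical allele D
    k: chance that a Dd heterozygote passes on D (0.5 = fair combination)
    s: fitness cost paid by every carrier of D (DD or Dd)
    """
    q = 1.0 - p
    # Genotype frequencies after random mating (Hardy-Weinberg proportions).
    f_DD, f_Dd, f_dd = p * p, 2 * p * q, q * q
    # Selection: carriers of D have fitness 1 - s, dd has fitness 1.
    w_mean = (1 - s) * (f_DD + f_Dd) + f_dd
    # D gametes come from all DD survivors plus a fraction k of Dd survivors.
    return (1 - s) * (f_DD + k * f_Dd) / w_mean


def run(p0, k, s, generations):
    """Iterate next_freq and return the final frequency of D."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, k, s)
    return p


driven = run(0.01, k=0.9, s=0.1, generations=200)
fair = run(0.01, k=0.5, s=0.1, generations=200)
print(f"with drive: {driven:.3f}, fair transmission: {fair:.2e}")
```

With these illustrative values, D rises from 1% towards fixation even though every carrier is less fit, while with fair transmission (k = 0.5) the same cost drives it out. In the comment's terms, k > 0.5 stands in for a "combination" operator that preferentially copies one region of parameter space into offspring.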