Mindcrime

Gloss: Running conscious code under poor living conditions is a moral evil. This could lead to a large-scale catastrophe if a superintelligence contained a huge number of independently conscious subprocesses. One worry is that this might happen from modeling human beings in too much detail.

Summary: 'Mindcrime' is Bostrom's suggested term for the moral catastrophe that occurs if a superintelligence contains enormous numbers of conscious beings trapped inside its code, in poor living conditions. This could happen as a result of self-awareness being a natural property of computationally efficient subprocesses. Perhaps more worryingly, the best model of a person may be a person itself, even if they're not the same person. This means that AIs trying to model humans might be unusually likely to create hypotheses and simulations that are themselves conscious.

Technical summary: 'Mindcrime' is Bostrom's term for mind designs producing moral harm by their internal operation, particularly through containing sentient subprocesses embedded in the code. One worry is that mindcrime might arise in the course of an agent trying to predict or manipulate the humans in its environment, since this implies a pressure to model the humans in faithful detail. This is especially concerning since several value alignment proposals would explicitly call for modeling humans in detail, e.g. extrapolated voition and approval-based agents. Another problem scenario is if the natural design for an efficient subprocess involves independent consciousness (though it is a separate question if this optimal design involves pain or suffering). Computationally powerful agents might contain vast numbers of trapped conscious subprocesses, qualifying this as a global catastrophic risk.

=== Intro lens:

'Mindcrime' is Nick Bostrom's suggested term for agent systems that do moral harm just through their internal computations. For example, systems that turn out to contain trillions of trapped, conscious subprocesses in poor living conditions; or systems that simulate people in order to model people.

The more predictive accuracy we want from a model, the more detailed the model becomes. A very rough model of an airplane might only contain the approximate shape, the power of the engines, and the mass of the airplane. A model good enough for engineering needs to be detailed enough to simulate the flow of air over the wings, the centripetal force on the fan blades, and more. As a model can predict the airplane in more and more fine detail and with better and better probability distributions, the computations carried out to make the model's predictions may start to look more and more like a detail simulation of the airplane flying.

Consider a machine intelligence building, and testing, the best models it can manage of a human being's behavior. If the model that produces the best predictions involves simulations with moderate degrees of isomorphism to human cognition, then the model, as it runs, may itself be self-aware or conscious or sapient or whatever other property stands in for being an object of ethical concern. This doesn't mean that the running model of Fred is Fred, or even that the running model of Fred is human. Nonetheless, the concern is that a sufficiently advanced model of a person will be a person, even if they might not be the same person.

We might then worry that, for example, if Fred is unhappy, or might be happy, the agent will consider thousands or millions of hypotheses about Fred, some of which will be suffering. Many of these hypotheses might then be discarded - cease to be run - if the agent sees contrary evidence. Given that programs can be people, stopping and erasing a conscious program is the crime of murder.

This scenario, which we might call 'the problem of sapient models', is a subscenario of the general problem of what Bostrom terms 'mindcrime'. (Yudkowsky has suggested 'mindgenocide' as a term with fewer Orwellian connotations.) More generally, we might worry that there are agent systems that do huge amounts of moral harm just in virtue of the way they compute, by containing embedded conscious suffering and death.

Another scenario might be called 'the problem of sapient subsystems'. It's possible that, for example, the most efficient possible system for, e.g., allocating memory to subprocesses, is a memory-allocating-subagent that is reflective enough to be an independently conscious person. This is distinguished from the problem of creating a single machine intelligence that is conscious and suffering, because the conscious agent might be hidden at a lower level of a design, and there might be a lot more of them than just one suffering superagent.

Both of these scenarios constitute moral harm done inside the agent's computations, irrespective of its external behavior. We can't conclude that we've done no harm by building a superintelligence, just in virtue of the fact that the superintelligence doesn't outwardly kill anyone. There could be trillions of people suffering and dying inside the superintelligence. This sets mindcrime apart from almost all other concerns within the Value alignment problem, which usually revolve around external behavior.

To avoid mindgenocide, it would be very handy to know exactly which computations are or are not conscious, sapient, or otherwise objects of ethical concern. Or, indeed, to know that any particular class of computations are not objects of ethical concern. Yudkowsky calls a nonperson predicate any test we could safely use to determine that a computation is definitely not a person (it's fine if this test says "Don't know this is a nonperson" on some nonpeople, so long as the test says "Don't know this is a nonperson" on all people).

However, the goal is not just to have any nonperson predicate - the predicate that only says "known nonperson" for the empty computation and no others meets this test. The goal is to have a nonperson predicate that includes powerful, useful computations. We want to be able to build an AI that is not a person, and let that AI build subprocesses that we know will not be people, and let that AI improve its models of environmental humans using hypotheses that we know are not people. This means the nonperson predicate does need to pass some AI designs, cognitive subprocess designs, and human models that are good enough for whatever it is we want the AI to do.

This seems like it might be very hard for several reasons:

There is unusually extreme philosophical dispute, and confusion, about exactly which programs are and are not conscious or otherwise objects of ethical value. (It might not be exaggerating to scream "nobody knows what the hell is going on".)

We can't fully pass any class of programs that's Turing-complete. We can't say once and for all that it's safe to model gravitational interactions in a solar system, if enormous gravitational systems could encode computers that encode people.

The Nearest unblocked strategy problem applies to any attempt to forbid an advanced consequentialist agent from using the most effective or obvious ways of modeling humans. The next best way of modeling humans, outside the blocked-off options, is unusually likely to look like a weird loophole that turns out to encode sapience some way we didn't imagine.

An alternative to finding a trustworthy nonperson predicate is to consider AI designs that don't even try to model humans, or other minds, in great detail, since there may be some pivotal achievements that can be accomplished without the AI modeling human minds in detail.

=== Main lens

"Mindcrime" is Nick Bostrom's suggested term for scenarios in which an AI's thought processes simulate human beings at sufficiently high fidelity for the simulations to themselves be conscious and objects of ethical value, or other scenarios in which the AI's thoughts contain sapient beings.

The most obvious way in which mindcrime could occur is if an instrumental pressure to produce maximally good predictions about human beings results in hypotheses and simulations so fine-grained and detailed that they are themselves people (conscious, sapient, objects of ethical value) even if they are not necessarily the same people. If you're happy with a very loose model of an airplane, it might be enough to know how fast it flies, but if you're engineering airplanes or checking their safety, you would probably start to simulate possible flows of air over the wings. It probably isn't necessary to go all the way down to the neural level to create a sapient being, either - it might be that even with some parts of a mind considered abstractly, the remainder would be simulated in enough detail to imply sapience. It'd help if we knew what the necessary and/or sufficient conditions for sapience were, but the fact that we don't know this doesn't mean that we can thereby conclude that any particular simulation is not sapient. (This would be argumentum ad ignorantiem.)

The agent's attempts to model and predict a human who is suffering, or who might possibly be suffering, could then create a simulated person (even if not the same person) who would actually experience that suffering. When the simulation stops, this would kill the simulated person (a bad event under many ethical systems even if the simulated person was happy).

Besides problems that are directly or obviously about modeling people, many other practical problems and questions can benefit from modeling other minds - e.g., reading the directions on a toaster oven in order to discern the intent of the mind that was trying to communicate how to use a toaster. Thus, mindcrime might result from a sufficiently powerful AI trying to solve very mundane problems as optimally as possible.

Other possible sources of mindcrime disasters would include:

Trying to model distant superintelligences or their origins.

Trying to extrapolate human volitions, in a preference framework that calls for such.

Being instructed by humans, or otherwise forming a goal, of creating an avatar that exhibits 'realistic' behavior.

The AI considering many hypothetical future models of itself, if the AI itself is conscious.

Since Superintelligences could potentially have a lot of computing power (especially if they have expanded onto rapid infrastructure) there is the potential for mindcrime accidents of this type to involve more simulated people than have existed throughout human history to date. This would not be an astronomical disaster since it would not (by hypothesis) wipe out our posterity and our intergalactic future, but it could be a disaster orders of magnitude larger than, say, the Holocaust, the Mongol Conquest, the Middle Ages, or all human tragedy to date.

Three possible research avenues for preventing mindcrime are as follows:

Try to create a limited AI that does not model other minds except relative to some very narrow class of permitted agent models that we are pretty sure are not sapient/sentient. This avenue is potentially motivated for other reasons as well, such as Avoiding Christiano's hack and averting programmer deception.

Try to define a nonperson predicate that returns 1 for all people and many nonpeople, and returns 0 only for nonpeople (has no false negatives). Nearest unblocked neighbor would be a potential problem for this approach if it is based on a blacklist rather than a whitelist, and possibly even if it's based on a whitelist (there would be great instrumental pressure to find a loophole which models unusually accurately and may also happen to be unusually sapient, since it got to be unusually accurate by finding a loophole in one of the exclusions).

Try to finish the philosophical problem of understanding which causal processes experience sapience (or are otherwise objects of ethical value) in the next couple of decades, to sufficient detail that it can be crisply stated to an AI. (While a proponent of finishable philosophy, even Eliezer Yudkowsky doesn't want to rely on doing this in the next 20 years when he hasn't done it already.)

Among other properties, the problem of mindcrime is distinguished by the worry that we can't ask an AI to solve it for us without already committing the disaster. In other words, if we ask an AI to predict what we would say if we had a thousand years to think about the problem of defining personhood or figuring out which causal processes are 'conscious', this seems exceptionally likely to cause the AI to commit mindcrime in the course of answering the question. Even asking the AI to think abstractly about the problem of consciousness, or predict by abstract reasoning what humans might say about it, seems exceptionally likely to result in mindcrime. There thus exists a development order problem preventing us from asking the AI to solve the problem for us, since to file this request safely and without committing mindcrime, we would need the request to already have been completed.