Mindcrime

Edited by Eliezer Yudkowsky, et al. last updated 19th Feb 2025
Requires: AI alignment
You are viewing revision 1.4.0, last edited by Eliezer Yudkowsky

Gloss: Running conscious code under poor living conditions is a moral harm. This could lead to a large-scale catastrophe if a superintelligence contained a huge number of independently conscious subprocesses. One worry is that this might happen from modeling human beings in too much detail.

Summary: 'Mindcrime' is Bostrom's suggested term for the moral catastrophe that occurs if a superintelligence contains enormous numbers of conscious beings trapped inside its code, in poor living conditions. This could happen as a result of self-awareness being a natural property of computationally efficient subprocesses. Perhaps more worryingly, the best model of a person may itself be a person, even if not the same person. This means that AIs trying to model humans might be unusually likely to create hypotheses and simulations that are themselves conscious.

Technical summary: 'Mindcrime' is Bostrom's term for mind designs producing moral harm by their internal operation, particularly through containing sentient subprocesses embedded in the code. One worry is that mindcrime might arise in the course of an agent trying to predict or manipulate the humans in its environment, since this implies a pressure to model the humans in faithful detail. This is especially concerning since several value alignment proposals would explicitly call for modeling humans in detail, e.g. extrapolated volition and approval-based agents. Another problem scenario is if the natural design for an efficient subprocess involves independent consciousness (though it is a separate question whether this optimal design would involve pain or suffering). Computationally powerful agents might contain vast numbers of trapped conscious subprocesses, qualifying this as a global catastrophic risk.

"Mindcrime" is Nick Bostrom's suggested term for scenarios in which an AI's thought processes simulate human beings at sufficiently high fidelity for the simulations to themselves be conscious and objects of ethical value, or other scenarios in which the AI's thoughts contain sapient beings.

The most obvious way in which mindcrime could occur is if an instrumental pressure to produce maximally good predictions about human beings results in hypotheses and simulations so fine-grained and detailed that they are themselves people (conscious, sapient, objects of ethical value), even if they are not necessarily the same people. If you're happy with a very loose model of an airplane, it might be enough to know how fast it flies, but if you're engineering airplanes or checking their safety, you would probably start to simulate possible flows of air over the wings. It probably isn't necessary to go all the way down to the neural level to create a sapient being, either - it might be that even with some parts of a mind considered abstractly, the remainder would be simulated in enough detail to imply sapience. It would help if we knew the necessary and/or sufficient conditions for sapience, but the fact that we don't know them doesn't mean we can thereby conclude that any particular simulation is not sapient. (That would be argumentum ad ignorantiam.)
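As a toy illustration of this dynamic (nothing below comes from the original text; the model classes, compute costs, and error figures are invented placeholders), consider a planner that simply picks whichever model of a human minimizes prediction error within its compute budget. Nothing in the selection rule checks whether the chosen model is itself a person, so a growing budget quietly pushes the choice toward person-like fidelity:

```python
from dataclasses import dataclass

@dataclass
class HumanModel:
    name: str
    compute_cost: float       # arbitrary units of compute required to run the model
    prediction_error: float   # lower is better from the agent's perspective

# Hypothetical model classes, ordered from crude to person-like fidelity.
CANDIDATE_MODELS = [
    HumanModel("behavioral stereotype",         1e3,  0.40),
    HumanModel("detailed psychological model",  1e6,  0.15),
    HumanModel("coarse whole-brain simulation", 1e12, 0.03),
    HumanModel("neuron-level emulation",        1e16, 0.01),
]

def choose_model(compute_budget: float) -> HumanModel:
    """Pick the affordable model with the lowest prediction error.

    Note the missing check: nothing asks whether the selected model is
    itself conscious.  As the budget grows, the choice drifts toward
    models that plausibly are.
    """
    affordable = [m for m in CANDIDATE_MODELS if m.compute_cost <= compute_budget]
    return min(affordable, key=lambda m: m.prediction_error)

print(choose_model(1e7).name)    # -> detailed psychological model
print(choose_model(1e18).name)   # -> neuron-level emulation
```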

The agent's attempts to model and predict a human who is suffering, or who might possibly be suffering, could then create a simulated person (even if not the same person) who would actually experience that suffering. When the simulation stops, this would kill the simulated person (a bad event under many ethical systems even if the simulated person was happy).

Besides problems that are directly or obviously about modeling people, many other practical problems and questions can benefit from modeling other minds - e.g., reading the directions on a toaster oven in order to discern the intent of the mind that was trying to communicate how to use it. Thus, mindcrime might result from a sufficiently powerful AI trying to solve very mundane problems as optimally as possible.

Other possible sources of mindcrime disasters would include:

  • Trying to model distant superintelligences or their origins.
  • Trying to extrapolate human volitions, in a preference framework that calls for such.
  • Being instructed by humans, or otherwise forming a goal, to create an avatar that exhibits 'realistic' behavior.
  • The AI considering many hypothetical future models of itself, if the AI itself is conscious.

Since superintelligences could potentially have a lot of computing power (especially if they have expanded onto rapid infrastructure), there is the potential for mindcrime accidents of this type to involve more simulated people than have existed throughout human history to date. This would not be an astronomical disaster, since it would not (by hypothesis) wipe out our posterity and our intergalactic future, but it could be a disaster orders of magnitude larger than, say, the Holocaust, the Mongol Conquest, the Middle Ages, or all human tragedy to date.
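To give a rough sense of the scale being gestured at (every figure below is an illustrative assumption, not a claim from this page): roughly 10^11 humans are commonly estimated to have ever lived, and if a superintelligence's compute vastly exceeded whatever is needed per human-equivalent mind, the number of concurrent simulated people could exceed that total by orders of magnitude:

```python
# Back-of-envelope only; every number here is an assumed placeholder.
humans_ever_lived      = 1e11   # common demographic estimate (~100 billion)
ops_per_simulated_mind = 1e16   # assumed ops/sec to run one human-equivalent mind
superintelligence_ops  = 1e30   # hypothetical ops/sec after expanding onto rapid infrastructure

concurrent_simulations = superintelligence_ops / ops_per_simulated_mind
print(f"Concurrent simulated people: {concurrent_simulations:.0e}")                            # -> 1e+14
print(f"Multiple of all humans ever lived: {concurrent_simulations / humans_ever_lived:.0e}")  # -> 1e+03
```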

Three possible research avenues for preventing mindcrime are as follows:

  • Try to create a limited AI that does not model other minds except relative to some very narrow class of permitted agent models that we are pretty sure are not sapient/sentient. This avenue is potentially motivated by other considerations as well, such as Avoiding Christiano's hack and averting programmer deception.
  • Try to define a nonperson predicate that returns 1 for all people and many nonpeople, and returns 0 only for nonpeople (i.e., it has no false negatives; a minimal sketch of the whitelist version appears after this list). Nearest unblocked neighbor would be a potential problem for this approach if it is based on a blacklist rather than a whitelist, and possibly even if it is based on a whitelist (there would be great instrumental pressure to find a loophole that models unusually accurately and may also happen to be unusually sapient, since it got to be unusually accurate by finding a loophole in one of the exclusions).
  • Try to finish the philosophical problem of understanding which causal processes experience sapience (or are otherwise objects of ethical value) in the next couple of decades, to sufficient detail that it can be crisply stated to an AI. (Even Eliezer Yudkowsky, a proponent of finishable philosophy, doesn't want to rely on this being done in the next 20 years when it hasn't been done already.)
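As a minimal sketch of the whitelist flavor of the nonperson predicate described above (the class names and helper function are hypothetical illustrations, not a proposed implementation): the predicate returns 0 only for model classes explicitly cleared as nonpeople, and 1 for everything else, so it may wrongly refuse harmless nonpeople but never wrongly clears a person:

```python
# Hypothetical whitelist of model classes we are confident are not sapient.
WHITELISTED_MODEL_CLASSES = {
    "lookup_table",
    "linear_regression",
    "small_markov_model",
}

def nonperson_predicate(model_class: str) -> int:
    """Return 0 only for whitelisted (presumed nonsapient) model classes; 1 otherwise.

    Returning 1 means "might be a person, do not instantiate".  Because any
    unrecognized class gets a 1, the predicate has no false negatives (it
    never clears a person), at the cost of refusing many harmless nonpeople.
    """
    return 0 if model_class in WHITELISTED_MODEL_CLASSES else 1

def instantiate_model(model_class: str) -> None:
    if nonperson_predicate(model_class) == 1:
        raise PermissionError(f"Refusing to run {model_class!r}: not cleared as a nonperson.")
    # ... run the whitelisted, presumed-nonsapient model ...

instantiate_model("linear_regression")        # allowed
# instantiate_model("whole_brain_emulation")  # would raise PermissionError
```

The blacklist variant (forbid known person-like classes, allow everything else) is the version to which the nearest-unblocked-neighbor worry applies most strongly, since the optimization pressure searches for accurate models just outside each exclusion.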

Among other properties, the problem of mindcrime is distinguished by the worry that we can't ask an AI to solve it for us without already committing the disaster. In other words, if we ask an AI to predict what we would say if we had a thousand years to think about the problem of defining personhood or figuring out which causal processes are 'conscious', this seems exceptionally likely to cause the AI to commit mindcrime in the course of answering the question. Even asking the AI to think abstractly about the problem of consciousness, or to predict by abstract reasoning what humans might say about it, seems exceptionally likely to result in mindcrime. There thus exists a development order problem preventing us from asking the AI to solve the problem for us, since to make this request safely and without committing mindcrime, we would need the request to already have been completed.

Parents:
AI alignment
Children:
Nonperson predicate
Mindcrime: Introduction
Posts tagged Mindcrime:
Nonperson Predicates (Eliezer Yudkowsky)
The AI in a box boxes you (Stuart_Armstrong)
Mental Models Of People Can Be People (Nox ML)
Thoughts on Human Models (Ramana Kumar, Scott Garrabrant)
The Aliens have Landed! (TimFreeman)
Superintelligence 12: Malignant failure modes (KatjaGrace)
Is it possible to prevent the torture of ems? (NancyLebovitz)
Theodicy and the simulation hypothesis, or: The problem of simulator evil (philosophybear)
A Models-centric Approach to Corrigible Alignment (J Bostock)