"Mindcrime"Mindcrime" is Nick Bostrom's suggested term for scenarios in which an AI's cognitive processes are intrinsically doing moral harm, for example because the AI contains trillions of suffering conscious beings inside it.
An unrealistic example of this would be Solomonoff induction, where predictions are made by means that include running many possible simulations of the environment and seeing which ones best correspond to reality. Among current machine learning algorithms, particle filters and Monte Carlo algorithms similarly involve running many possible simulated versions of a system.
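For concreteness, here is a minimal, purely illustrative sketch (not taken from the source) of a bootstrap particle filter in Python: it maintains many simulated versions of a hidden system and keeps the ones that best correspond to incoming observations. Nothing in this toy model is conscious; it only illustrates the "run many simulations and compare them to reality" pattern described above.

```python
import math
import random

def particle_filter(observations, n_particles=1000,
                    process_noise=1.0, obs_noise=1.0):
    """Bootstrap particle filter for a 1-D random walk observed with noise.

    Each particle is an independently simulated version of the hidden system;
    particles whose predictions best match the observations get higher weight
    and are resampled more often.
    """
    particles = [random.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Step each simulated version of the system forward in time.
        particles = [x + random.gauss(0.0, process_noise) for x in particles]
        # Weight each simulation by how well it explains the new observation.
        weights = [math.exp(-0.5 * ((z - x) / obs_noise) ** 2) for x in particles]
        if sum(weights) == 0.0:
            # All simulations badly mismatch reality; fall back to uniform weights.
            weights = [1.0] * n_particles
        # Resample: simulations that correspond best to reality survive and multiply.
        particles = random.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates

if __name__ == "__main__":
    true_state, observations = 0.0, []
    for _ in range(50):
        true_state += random.gauss(0.0, 1.0)
        observations.append(true_state + random.gauss(0.0, 1.0))
    print("estimate:", round(particle_filter(observations)[-1], 2),
          "true state:", round(true_state, 2))
```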
Gloss: A huge amount of harm could occur if a machine intelligence turns out to contain lots of conscious subprograms enduring poor living conditions. One worry is that this might happen if an AI models humans in too much detail.
Technical summary: 'Mindcrime' is Bostrom's term for mind designs producing moral harm by their internal operation, particularly through containing sentient subprocesses embedded in the code. One worry is that mindcrime might arise in the course of an agent trying to predict or manipulate the humans in its environment, since this implies a pressure to model the humans in faithful detail. This is especially concerning since several value alignment proposals would explicitly call for modeling humans in detail, e.g. extrapolated volition and approval-based agents. Another problem scenario is if the natural design for an efficient subprocess involves independent consciousness (though it is a separate question whether this optimal design involves pain or suffering). Computationally powerful agents might contain vast numbers of trapped conscious subprocesses, qualifying this as a global catastrophic risk.
"Mindcrime" is Nick Bostrom'Bostrom's suggested term for scenarios in which an AI's cognitive processes are intrinsically doing moral harm, for example because the AI contains trillions of suffering conscious beings inside it.
Three possible origins of a convergent instrumental pressure to consider intelligent civilizations in great detail:
-
(Eliezer Yudkowsky has advocated that we shouldn't let any AI short of extreme levels of safety and robustness assurance consider distant civilizations in lots of detail in any case, since this means our AI might embed (a model of) a hostile superintelligence.)
Yudkowsky terms a nonperson predicate any effective test that we (or better, an AI) can use to determine that some computer program is definitely not a person. In principle, a nonperson predicate needs only two possible outputs, "Don't know" and "Definitely not a person". It's acceptable for many actually-nonperson programs to be labeled "don't know", so long as no people are labeled "definitely not a person".
If the above were the only requirement, one simple nonperson predicate would be to label the program "return 0" as "definitely not a person" and label all other programs "don't know". The implicit difficulty is that the nonperson predicate must also pass some programs of high complexity that do things like "acceptably model humans" or "acceptably model future versions of the AI".
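As an illustration of the interface just described, here is a minimal Python sketch of the trivial predicate (the names and the whitelist are hypothetical, not a proposed implementation):

```python
from enum import Enum

class Verdict(Enum):
    DEFINITELY_NOT_A_PERSON = "definitely not a person"
    DONT_KNOW = "don't know"

# Hypothetical whitelist of programs we are confident contain no person.
TRIVIAL_WHITELIST = {"return 0"}

def trivial_nonperson_predicate(program_source: str) -> Verdict:
    """The 'easy' direction of a nonperson predicate: it never labels a person
    'definitely not a person', but only because it approves almost nothing."""
    if program_source in TRIVIAL_WHITELIST:
        return Verdict.DEFINITELY_NOT_A_PERSON
    return Verdict.DONT_KNOW

assert trivial_nonperson_predicate("return 0") is Verdict.DEFINITELY_NOT_A_PERSON
assert trivial_nonperson_predicate("simulate_human()") is Verdict.DONT_KNOW
```

The hard part, as noted above, is a predicate whose whitelist is rich enough to approve complex programs that acceptably model humans, while still never mislabeling anything that is in fact a person.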
Yudkowsky is currently pessimistic about relying on being able to find a nonperson predicate in time, though he considers it something to keep poking at.
Possible approaches to avoiding or mitigating mindcrime include:
- Behaviorism (mindblind genie): Try to create a limited AI that does not model other minds, or possibly even itself, except using some narrow class of agent models that we are pretty sure will not be sentient. This avenue is potentially motivated for other reasons as well, such as avoiding Christiano's hack and averting programmer manipulation.
- Try to define a nonperson predicate that whitelists enough programs to carry out some pivotal achievement.
- Try for an AI that can bootstrap our understanding of consciousness and tell us what we would define as a person, while committing a relatively small amount of mindcrime, with all computed possible-people being stored rather than discarded, and the modeled agents being entirely happy, mostly happy, or non-suffering. E.g., put a happy person at the center of the approval-directed agent, and try to oversee the AI's algorithms and ask it not to use Monte Carlo simulations if possible (see the sketch after this list).
- Ignore the problem in all pre-interstellar stages, because it is still relatively small compared to astronomical stakes and therefore not worth significant losses in success probability. (This may backfire under some versions of the Simulation Hypothesis.)
- Try to finish the philosophical problem of understanding which causal processes experience sapience (or are otherwise objects of ethical value), in the next couple of decades, in sufficient detail that it can be crisply stated to an AI, with sufficiently complete coverage that it is not subject to the Nearest Neighbor problem.
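The approval-directed proposal mentioned above reduces, at its core, to choosing whichever action a model of an overseer is predicted to approve of most. The following minimal Python sketch (all names hypothetical, not Christiano's actual formalism) shows that skeleton, and why the overseer model is exactly where detailed person-modeling, and hence mindcrime risk, concentrates.

```python
from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")

def approval_directed_step(candidate_actions: Iterable[Action],
                           predicted_approval: Callable[[Action], float]) -> Action:
    """Pick the action the modeled overseer is predicted to approve of most.

    `predicted_approval` stands in for a learned model of the overseer; the
    mindcrime worry is that a faithful version of this model might itself be a
    person, which is why the approach above asks for that modeled person to be
    a happy one and for the model not to be run as many Monte Carlo copies.
    """
    return max(candidate_actions, key=predicted_approval)

# Toy usage: candidate actions scored by a stand-in approval model.
best = approval_directed_step(["fetch coffee", "rewrite own code"],
                              lambda a: {"fetch coffee": 0.9,
                                         "rewrite own code": 0.1}[a])
assert best == "fetch coffee"
```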
With respect to the latter two possibilities, note that the AI does not need to be considering possibilities in which the whole Earth as we know it is a simulation. The AI only needs to consider that, among the possible explanations of the AI's current sense data and internal data, there are scenarios in which the AI is embedded in some world other than the most 'obvious' one implied by the sense data. See also Christiano's hack (the concern that distant superintelligences can coerce the most probable environment of your AI) for a related hazard of the AI considering possibilities in which it is being simulated.
[Summary: 'Mindcrime' is Bostrom's suggested term for the moral catastrophe that occurs if a [2c machine intelligence] contains enormous numbers of conscious beings trapped inside its code, in poor living conditions. This could happen as a result of self-awareness being a natural property of computationally efficient subprocesses. Perhaps more worryingly, the best model of a person may be a person itself, even if they're not the same person. This means that AIs trying to model humans might be unusually likely to create hypotheses and simulations that are themselves conscious.]