A behaviorist genie is an AI that has been averted from modeling minds in more detail than some whitelisted class of models. (An older name for this concept, 'Butlerian AI', comes from Frank Herbert's Dune, in which the Orange Catholic Bible contains the injunction, "Thou shalt not make a machine in the likeness of a human mind.")
This is possibly a good idea because many possible difficulties seem to be associated with the AI having a sufficiently advanced model of human minds or AI minds, including:
...and yet an AI that is extremely good at understanding material objects and technology (just not other minds) would still be capable of some important classes of pivotal achievement.
A behaviorist genie would still require most of genie theory and corrigibility to be solved. But it's plausible that the restrictions away from modeling humans, programmers, and some types of reflectivity would collectively make it significantly easier to make a safe form of this genie.
Thus, a behaviorist genie is one of fairly few open candidates for "AI that is restricted in a way that actually makes it safer to build, without it being so restricted as to be incapable of game-changing achievements".
Nonetheless, limiting the degree to which the AI can understand cognitive science, other minds, its own programmers, and itself is a very severe restriction that would prevent a number of obvious ways to make progress on the AGI subproblem and the value identification problem, even for commands given to Task AGIs (Genies). Furthermore, there could perhaps be easier types of genies to build, or there might be grave difficulties in restricting the model class to some space that is useful without being dangerous. Behaviorist genies are being considered tentatively, rather than as a settled research path.
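The phrase "restricting the model class" can be caricatured as a filter over the hypotheses a planner is ever allowed to consider. The toy Python sketch below is purely illustrative: the class names, the boolean mind-model flag, and the numeric "detail level" proxy are all invented for this example, and nothing here is a proposal for how such a restriction could actually be implemented (which is exactly the hard open problem noted above). It only shows the abstract shape of a whitelist over model classes:

```python
# Purely illustrative toy: a hypothesis space filtered by a whitelist
# before any planning occurs. All names and the detail-level proxy are
# hypothetical stand-ins, not a real design.
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    name: str
    models_minds: bool   # does this hypothesis model another agent's cognition?
    detail_level: int    # crude stand-in for modeling fidelity

DETAIL_CAP = 2  # mind-models above this fidelity are disallowed

def in_whitelist(h: Hypothesis) -> bool:
    """Admissible if it models no minds, or models them only coarsely
    (e.g. as behavioral input-output black boxes)."""
    return (not h.models_minds) or (h.detail_level <= DETAIL_CAP)

def admissible_hypotheses(hypotheses):
    """Filter the hypothesis space before the planner ever sees it."""
    return [h for h in hypotheses if in_whitelist(h)]

candidates = [
    Hypothesis("orbital mechanics", models_minds=False, detail_level=9),
    Hypothesis("coarse operator model", models_minds=True, detail_level=1),
    Hypothesis("detailed operator psychology", models_minds=True, detail_level=8),
]

print([h.name for h in admissible_hypotheses(candidates)])
# → ['orbital mechanics', 'coarse operator model']
```

Note that material-object hypotheses pass at any fidelity, while mind-models pass only below the cap; a real version would need the whitelist itself to be robust against the AI routing detailed mind-modeling through nominally "material" hypotheses, which this toy does nothing to address.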
Broadly speaking, two possible clusters of behaviorist-genie design are:
Breaking the first case down into more detail, the potential desiderata for a behavioristic design are:
These are different goals, but with some overlap between them. Some of the things we might need: