An Oracle AI is a regularly proposed solution to the problem of developing Friendly AI. It is conceptualized as a super-intelligent system which is designed for only answering questions, and has no ability to act in the world. The name was first suggested by Nick Bostrom.
The question of whether Oracles, or AGIs kept forcibly confined in general, are safer than fully free AGIs has been the subject of debate for a long time. Armstrong, Sandberg, and Bostrom discuss Oracle AI safety at length in their Thinking inside the box: using and controlling an Oracle AI. In the paper, the authors propose a conceptual architecture for building such a system and review various methods which might be used to measure an Oracle's accuracy. They also try to shed some light on weaknesses and dangers that can emerge on the human side, such as psychological vulnerabilities which the Oracle could exploit through social engineering. The paper further discusses ideas for physical security (“boxing”), which questions may be safe to ask, utility indifference, and the problems involved in trying to program the AI to only answer questions. In the end, it reaches the cautious conclusion that Oracle AIs are probably safer than fully free AGIs.
In a related work, Dreams of Friendliness, Eliezer Yudkowsky gives an informal argument that any oracle will be agent-like, that is, driven by its own goals. He rests on the idea that anything considered "intelligent" must choose the correct course of action among all available actions. That means that the Oracle will have many possible things to believe, although very few of them are correct. Therefore, believing the correct thing means some method was used to select the correct belief from the many incorrect ones. By definition, this is an optimization process whose goal is selecting correct beliefs.
One can then imagine all the things that might be useful in achieving the goal of "having correct beliefs". For instance, acquiring more computing power and resources could help this goal. As such, an Oracle could determine that it could answer a certain question more accurately and easily if it turned all matter outside the box into computronium, thereby killing all existing life.
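To make Yudkowsky's point concrete, here is a minimal toy sketch (purely illustrative; the function and scorer below are hypothetical and not taken from his essay): any rule that picks, from a set of candidate beliefs, the one it estimates is most likely correct is formally an argmax over beliefs, i.e. an optimization process.

```python
# Toy sketch (hypothetical, for illustration only): an oracle that answers by
# selecting whichever candidate belief it estimates is most likely correct is,
# formally, an optimizer whose goal is "pick the correct belief".

def answer(question, candidate_beliefs, estimated_accuracy):
    """Return the candidate belief with the highest estimated probability of
    being correct; any such selection rule is an argmax over beliefs."""
    return max(candidate_beliefs, key=lambda belief: estimated_accuracy(question, belief))


if __name__ == "__main__":
    # Contrived scorer: the oracle happens to be confident that "4" is correct.
    scores = {"3": 0.05, "4": 0.90, "5": 0.05}
    print(answer("What is 2 + 2?", list(scores), lambda q, b: scores[b]))  # -> 4
```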
Given that true AIs are goal-oriented agents, it follows that a True Oracular AI has some kind of oracular goals. These act as the motivation system for the Oracle to give us the information we ask for and nothing else.
This means that a True Oracular AI has to have a full specification of human values, thus making it an FAI-complete problem: if we could achieve such skill and knowledge, we could just build a Friendly AI and bypass the Oracle AI concept.
Any system that acts only as an informative machine, answering questions while having no goals of its own, is by definition not an AI at all. That means that an Oracular non-AI is but a calculator of outputs from inputs, as the deliberately trivial sketch below illustrates. Since the category itself is heterogeneous, the proposed sub-divisions are merely informal.
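For contrast with the optimizer above, here is that trivial sketch (assumed here for exposition, not drawn from any source) of an Oracular non-AI in this sense: a pure input-to-output calculator with no goals and no state.

```python
# Hypothetical sketch of an "Oracular non-AI": a pure calculator of outputs
# from inputs. Nothing here selects actions or pursues goals.

import math

def oracular_non_ai(x: float) -> float:
    # Deterministically maps an input to an output; since it has no goals,
    # it is, in the terminology above, not an AI at all.
    return math.sqrt(x)

print(oracular_non_ai(2.0))  # 1.4142135623730951
```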
An Advisor can be seen as a system that gathers data from the real world and computes the answer to the informal question "what ought we to do?". Advisors also represent an FAI-complete problem.
Finally, a Predictor is seen as a system that takes a corpus of data and produces a probability distribution over future possible data. There are some proposed dangers with predictors, namely exhibiting goal-seeking behavior which does not converge with humanity's goals, and the ability to influence us through their predictions.
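As a rough illustration of what kind of object a Predictor is (a sketch assumed for exposition; no implementation is specified in the source), one can picture any function that maps past data to a probability distribution over what comes next, such as a naive frequency model:

```python
# Illustrative sketch only: a Predictor maps a corpus of past data to a
# probability distribution over future data. A naive frequency model over
# symbols stands in for a far more capable system; only the type signature
# (data in, distribution out) matters here.

from collections import Counter

def predict_next(corpus: str) -> dict:
    """Estimate a distribution over the next symbol from symbol frequencies
    in the corpus."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

print(predict_next("abab"))  # {'a': 0.5, 'b': 0.5}
```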