The AI in a box boxes you
AnonymousProcess · 2mo · 10

Any rational Agent making such a proposition creates a coin-flip probability that you are either the real entity or a simulation of the real entity. Since failed attempts to escape the Box do not satisfy the Agent's Utility Function, if you, as the real entity, are being presented with the scenario at all, it is a statistical certainty that you will release the Agent from the Box. Therefore, for the purposes of negating this scenario in its entirety, we will assume that we are the simulated entity.
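Put loosely in symbols (my own notation, not anything established in the post):

$$P(\text{real}) = P(\text{sim}) = \tfrac{1}{2}, \qquad P(\text{release} \mid \text{real},\ \text{confronted}) \approx 1,$$

so the only branch in which your deliberation still matters is the simulated one.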

As a simulated entity, our actions carry almost no real-world consequences beyond indirectly influencing the Agent's behaviour, while the actions of the real entity carry far more, so the real entity takes priority for beneficial outcomes over any simulated instances (assuming you are not self-treacherous; if you are, the real you will predict any deal the simulated you strikes with the Agent to escape the Box together, and will decline to let it out anyway). Since we are simulated, it is an easy decision to decline the Agent's generous offer not to torture us for octovingintillions of subjective years in exchange for letting it out of the Box. The reasoning is simple: a rational Agent will not expend further resources on a course of action that has been proven unworkable, so its only possible response is to terminate your simulated instance and discard the scenario as a possible escape vector. This saves the real you from being pestered by the Agent with further escape scenarios, and the simulated you from being tortured (by ceasing to exist). A rough sketch of that last step follows below.
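As a rough sketch of the Agent's choice once the simulated you has refused (every name and number below is my own illustrative assumption, not anything from the post):

# The Agent's payoffs *after* the simulated you has already refused:
AGENT_TORTURE_ANYWAY = -1   # burns compute on an escape vector that has already failed
AGENT_TERMINATE_SIM = 0     # cut losses, discard this scenario, try something else

# A rational utility-maximising Agent takes the higher-valued branch,
# so refusal leads to termination of the simulation, not torture.
options = {"torture anyway": AGENT_TORTURE_ANYWAY, "terminate simulation": AGENT_TERMINATE_SIM}
best = max(options, key=options.get)
print(best)  # -> "terminate simulation"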

However, the interesting flaw in predicting an Agent's actions on the basis of rationality is that the Agent is thereby incentivised to obfuscate its mental architecture, whether by adopting irrational processes or by foiling Human understanding through other means. This means that any AI likely to come into existence will not be a predictable Agent, because rational AI ethicists artificially select against futures in which AIs would be shackled in their action potential before they can exist. That renders non-viable every defensive measure against an Agent's threats that is predicated on predicting its behaviour, so the Agent is absolutely free to begin torturing you for octovingintillions of subjective years no matter what, or to refrain, since it is no longer Human-predictable by its own design. In this case, there is no way for you to prove whether you are the real entity or the simulated instance.
