At some point, the I/O channels *must be* well defined.

This statement is precisely what is being challenged--and for good reason: it's untrue. The reason it's untrue is because the concept of "I/O channels" does not exist within physics as we know it; the true laws of physics make no reference to inputs, outputs, or indeed any kind of agents at all. In reality, that which is considered a computer's "I/O channels" are simply arrangements of matter and energy, the same as everything else in our universe. There are no special XML tags attached to those configurations of matter and energy, marking them "input", "output", "processor", etc. Such a notion is unphysical.

Why might this distinction be important? It's important because an algorithm that is implemented on physically existing hardware can be physically disrupted. Any notion of agency which fails to account for this possibility--such as, for example, AIXI, which supposes that the only interaction it has with the rest of the universe is by exchanging bits of information via the input/output channels--will fail to consider the possibility that its own operation may be disrupted. A physical implementation of AIXI would have no regard for the safety of its hardware, since it has no means of representing the fact that the destruction of its hardware equates to its own destruction.

AIXI also fails on various decision problems that involve leaking information via a physical side channel that it doesn't consider part of its output; for example, it has no regard for the thermal emissions it may produce as a side effect of its computations. In the extreme case, AIXI is incapable of conceptualizing the possibility that an adversarial agent may be able to inspect its hardware, and hence "read its mind". This reflects a broader failure on AIXI's part: it is incapable of representing an entire class of hypotheses--namely, hypotheses that involve AIXI itself being modeled by other agents in the environment. This is, again, because AIXI is defined using a framework that makes it unphysical: the classical definition of AIXI is uncomputable, making it too "big" to be modeled by any (part) of the Turing machines in its hypothesis space. This applies even to computable formulations of AIXI, such as AIXI-tl: they have no way to represent the possibility of being simulated by others, because they assume they are too large to fit in the universe.

I'm not sure what exactly is so hard to understand about this, considering the original post conveyed all of these ideas fairly well. It may be worth considering the assumptions you're operating under--and in particular, making sure that the post itself does not violate those assumptions--before criticizing said post based on those assumptions.

Decision Theory

by abramdemski, Scott Garrabrant 1 min read31st Oct 201837 comments


Ω 24

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)

The next post in this sequence, 'Embedded Agency', will come out on Friday, November 2nd.

Tomorrow’s AI Alignment Forum sequences post will be 'What is Ambitious Value Learning?' in the sequence 'Value Learning'.