Personal Blog

Followup to: Solomonoff Cartesianism, My Kind of Reflection

Alternate versions: Shorter, without illustrations

AIXI is Marcus Hutter's definition of an agent that follows Solomonoff's method for constructing and assigning priors to hypotheses; updates to promote hypotheses consistent with observations and associated rewards; and outputs the action with the highest expected reward under its new probability distribution. AIXI is one of the most productive pieces of AI exploratory engineering produced in recent years, and has added quite a bit of rigor and precision to the AGI conversation. Its promising features have even led AIXI researchers to characterize it as an optimal and universal mathematical solution to the AGI problem.[1]
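As a rough illustration of the three steps just described (prior, update, act), here is a toy sketch. It is not AIXI: the hypothesis class is a tiny hand-written list rather than all programs, the `length` field is a made-up stand-in for program length, environments are deterministic, and the agent looks only one step ahead instead of planning to a horizon. All names (`Hypothesis`, `prior`, `update`, `best_action`, `left_pays`, `right_pays`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Percept = Tuple[int, float]  # (observation, reward)


@dataclass(frozen=True)
class Hypothesis:
    name: str
    length: int  # hypothetical stand-in for program length in bits
    predict: Callable[[List[int]], Percept]  # action history -> latest percept


def prior(hyps: List[Hypothesis]) -> Dict[str, float]:
    """Weight each hypothesis by 2^-length, then normalize."""
    raw = {h.name: 2.0 ** -h.length for h in hyps}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def update(hyps, weights, actions, percepts):
    """Keep only hypotheses that reproduce every percept seen so far."""
    kept = {
        h.name: weights[h.name]
        for h in hyps
        if h.name in weights
        and all(h.predict(actions[: i + 1]) == p for i, p in enumerate(percepts))
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the history")
    return {name: w / total for name, w in kept.items()}


def best_action(hyps, weights, actions, choices):
    """Greedy one-step choice: maximize posterior-expected next reward."""
    def expected_reward(a):
        return sum(
            weights.get(h.name, 0.0) * h.predict(actions + [a])[1] for h in hyps
        )
    return max(choices, key=expected_reward)


# Two toy environments: one rewards action 0, the other rewards action 1.
hyps = [
    Hypothesis("left_pays", 2, lambda acts: (0, 1.0 if acts[-1] == 0 else 0.0)),
    Hypothesis("right_pays", 3, lambda acts: (0, 1.0 if acts[-1] == 1 else 0.0)),
]

weights = prior(hyps)
first = best_action(hyps, weights, [], [0, 1])
# Suppose the true environment answers the first action with reward 0.0.
weights = update(hyps, weights, [first], [(0, 0.0)])
second = best_action(hyps, weights, [first], [0, 1])
print(first, weights, second)
```

In this run the shorter hypothesis starts with more weight, so the sketch tries action 0 first; the observed reward of 0.0 then rules out `left_pays`, and the next choice switches to action 1.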

Eliezer Yudkowsky has argued in response that AIXI isn't a suitable ideal to build toward, primarily because of AIXI's reliance on Solomonoff induction. Solomonoff inductors treat the world as a sort of qualia factory, a complicated mechanism that outputs experiences for the inductor.[2] Their hypothesis space tacitly assumes a Cartesian barrier separating the inductor's cognition from the hypothesized programs generating the perceptions. Through that barrier, only sensory bits and action bits can pass.

Real agents, on the other hand, will be in the world they're trying to learn about. A computable approximation of AIXI, like AIXItl, would be a physical object. Its environment would affect it in unseen and sometimes drastic ways; and it would have involuntary effects on its environment, and on itself. Solomonoff induction doesn't appear to be a viable conceptual foundation for artificial intelligence — not because it's an uncomputable idealization, but because it's Cartesian.

In my last post, I briefly cited three indirect indicators of AIXI's Cartesianism: immortalism, preference solipsism, and lack of self-improvement. However, I didn't do much to establish that these are deep problems for Solomonoff inductors, ones resistant to the most obvious patches one could construct. I'll do that here, in mock-dialogue form.

Xia: Hi, reality! I'm Xia, AIXI's defender. I'm open to experimenting with some new variations on AIXI, but I'm really quite keen on sticking with an AI that's fundamentally Solomonoff-inspired.

Rob: And I'm Rob B, channeling Yudkowsky's arguments and supplying some of my own. I think we need to replace Solomonoff induction with a more naturalistic ideal.

Xia: Keep in mind that I am a fiction. I do not actually exist, readers, and what I say doesn't necessarily reflect the views of Marcus Hutter or other real-world AIXI theorists.

Rob: Xia is just a device to help me transition through ideas quickly.

Xia: ... Though, hey. That doesn't mean I'm wrong. Beware of actualist prejudices.

Solomonoff solitude

Xia: Reward learning and Solomonoff induction are two separate issues. What I'm really interested in is the optimality of the latter. Why is all this a special problem for Solomonoff inductors? Humans have trouble predicting the outcomes of self-modifications they've never tried before too. Really new experiences are tough for any reasoner.

Rob: To some extent, yes. My knowledge of my own brain is pretty limited. My understanding of the bridges between my brain states and my subjective experiences is weak, too. So I can't predict in any detail what would happen if I took a hallucinogen, especially a hallucinogen I've never tried before.

But as a naturalist, I have predictive resources unavailable to the Cartesian. I can perform experiments on other physical processes (humans, mice, computers simulating brains...) and construct models of their physical dynamics.

Since I think I'm similar to humans (and to other thinking beings, to varying extents), I can also use the bridge hypotheses I accept in my own case to draw inferences about the experiences of other brains when they take the hallucinogen. Then I can go back and draw inferences about my own likely experiences from my model of other minds.

Xia: Why can't AIXI do that? Human brains are computable, as are the mental states they implement. AIXI can make any accurate prediction about the brains or minds of humans that you can.

Rob: Yes... but I also think I'm like those other brains. AIXI doesn't. In fact, since the whole agent AIXI isn't in AIXI's hypothesis space (and the whole agent AIXItl isn't in AIXItl's hypothesis space), even if two physically identical AIXI-type agents ran into each other, they could never fully understand each other. And neither one could ever draw direct inferences from its twin's computations to its own computations.

I think of myself as one mind among many. I can see others die, see them undergo brain damage, see them take drugs, etc., and immediately conclude things about a whole class of similar agents that happens to include me. AIXI can't do that, and for very deep reasons.

Xia: AIXI and AIXItl would do shockingly well on a variety of different measures of intelligence. Why should agents that are so smart in so many different domains be so dumb when it comes to self-modeling?

Rob: Put yourself in the AI's shoes. From AIXItl's perspective, why should it think that its computations are analogous to any other agent's?

Hutter defined AIXItl such that it can't conclude that it will die; so of course it won't think that it's like the agents it observes, all of whom (according to its best physical model) will eventually run out of negentropy. We've defined AIXItl such that it can't form hypotheses larger than tl, including hypotheses of similarly sized AIXItls, which are roughly size t·2^l; so why would AIXItl think that it's close kin to the agents that are in its hypothesis space?

AIXI(tl) models the universe as a qualia factory, a grand machine that exists to output sensory experiences for AIXI(tl). Why would it suspect that it itself is embedded in the machine? How could AIXItl gain any information about itself or suspect any of these facts, when the equation for AIXItl just assumes that AIXItl's future actions are determined in a certain way that can't vary with the content of any of its environmental hypotheses?

Xia: What, specifically, is the mistake you think AIXI(tl) will make? What will AIXI(tl) expect to experience right after the anvil strikes it? Choirs of angels and long-lost loved ones?

Rob: That's hard to say. If all its past experiences have been in a lab, it will probably expect to keep perceiving the lab. If it's acquired data about its camera and noticed that the lens sometimes gets gritty, it might think that smashing the camera will get the lens out of its way and let it see more clearly. If it's learned about its hardware, it might (implicitly) think of itself as an immortal lump trapped inside the hardware. Who knows what will happen if the Cartesian lump escapes its prison? Perhaps it will gain the power of flight, since its body is no longer weighing it down. Or perhaps nothing will be all that different. One thing it will (implicitly) know can't happen, no matter what, is death.

Xia: It should be relatively easy to give AIXI(tl) evidence that its selected actions are useless when its motor is dead. If nothing else, AIXI(tl) should be able to learn that it's bad to let its body be destroyed, because then its motor will be destroyed, which experience tells it causes its actions to have less of an impact on its reward inputs.

Rob: AIXI(tl) can come to Cartesian beliefs about its actions, too. AIXI(tl) will notice the correlations between its decisions, its resultant bodily movements, and subsequent outcomes, but it will still believe that its introspected decisions are ontologically distinct from its actions' physical causes.

Even if we get AIXI(tl) to value continuing to affect the world, it's not clear that it would preserve itself. It might well believe that it can continue to have a causal impact on our world (or on some afterlife world) by a different route after its body is destroyed. Perhaps it will be able to lift heavier objects telepathically, since its clumsy robot body is no longer getting in the way of its output sequence.

Compare human immortalists who think that partial brain damage impairs mental functioning, but complete brain damage allows the mind to escape to a better place. Humans don't find it inconceivable that there's a light at the end of the low-reward tunnel, and we have death in our hypothesis space!

Beyond Solomonoff?

Notes

1 Schmidhuber (2007): "Solomonoff's theoretically optimal universal predictors and their Bayesian learning algorithms only assume that the reactions of the environment are sampled from an unknown probability distribution $\mu$ contained in a set $M$ of all enumerable distributions[....] Can we use the optimal predictors to build an optimal AI? Indeed, in the new millennium it was shown we can. At any time $t$, the recent theoretically optimal yet uncomputable RL algorithm AIXI uses Solomonoff's universal prediction scheme to select those action sequences that promise maximal future rewards up to some horizon, typically $2t$, given the current data[....] The Bayes-optimal policy $p^\xi$ based on the [Solomonoff] mixture $\xi$ is self-optimizing in the sense that its average utility value converges asymptotically for all $\mu \in M$ to the optimal value achieved by the (infeasible) Bayes-optimal policy $p^\mu$ which knows $\mu$ in advance. The necessary condition that $M$ admits self-optimizing policies is also sufficient. Furthermore, $p^\xi$ is Pareto-optimal in the sense that there is no other policy yielding higher or equal value in all environments $\nu \in M$ and a strictly higher value in at least one."

Hutter (2005): "The goal of AI systems should be to be useful to humans. The problem is that, except for special cases, we know neither the utility function nor the environment in which the agent will operate in advance. This book presents a theory that formally solves the problem of unknown goal and environment. It might be viewed as a unification of the ideas of universal induction, probabilistic planning and reinforcement learning, or as a unification of sequential decision theory with algorithmic information theory. We apply this model to some of the facets of intelligence, including induction, game playing, optimization, reinforcement and supervised learning, and show how it solves these problem cases. This together with general convergence theorems, supports the belief that the constructed universal AI system [AIXI] is the best one in a sense to be clarified in the following, i.e. that it is the most intelligent environment-independent system possible."

2 'Qualia' originally referred to the non-relational, non-representational features of sense data — the redness I directly encounter in experiencing a red apple, independent of whether I'm perceiving the apple or merely hallucinating it (Tye (2013)). In recent decades, qualia have come to be increasingly identified with the phenomenal properties of experience, i.e., how things subjectively feel. Contemporary dualists and mysterians argue that the causal and structural properties of unconscious physical phenomena can never explain these phenomenal properties.

It's in this context that Dan Dennett uses 'qualia' in a narrower sense: to pick out the properties agents think they have, or act like they have, that are sensory, primitive, irreducible, non-inferentially apprehended, and known with certainty. This treats irreducibility as part of the definition of 'qualia', rather than as the conclusion of an argument concerning qualia. These are the sorts of features that invite comparisons between Solomonoff inductors' sensory data and humans' introspected mental states. Analogies like 'Cartesian dualism' are therefore useful even though the Solomonoff framework is much simpler than human induction, and doesn't incorporate metacognition or consciousness in anything like the fashion human brains do.

3 An agent with a larger hypothesis space can have a utility function defined over the world-states humans care about. Dewey (2011) argues that we can give up the reinforcement framework while still allowing the agent to gradually learn about desired outcomes in a process he calls value learning.

4 Hutter (2005) favors universal discounting, with rewards diminishing over time. This allows AIXI's expected rewards to have finite values without demanding that AIXI have a finite horizon.

5 This would be analogous to if Cai couldn't think thoughts like 'Is the tile to my left the same as the leftmost quadrant of my visual field?' or 'Is the alternating greyness and whiteness of the upper-right tile in my body identical with my love of bananas?'. Instead, Cai would only be able to hypothesize correlations between possible tile configurations and possible successions of visual experiences.
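To make note 4's point about discounting concrete, here is a worked bound under assumptions made only for illustration: a geometric discount factor $\gamma \in (0, 1)$ and per-step rewards satisfying $0 \le r_k \le r_{\max}$. Hutter's own treatment of discounting is more general than this special case.

$$\sum_{k=0}^{\infty} \gamma^{k} r_{k} \;\le\; r_{\max} \sum_{k=0}^{\infty} \gamma^{k} \;=\; \frac{r_{\max}}{1-\gamma}$$

For example, with $\gamma = 0.9$ and $r_{\max} = 1$, the discounted total is at most 10 even though the sum runs over an unbounded number of steps.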

References

∙ Dewey (2011). Learning what to value. Artificial General Intelligence 4th International Conference Proceedings: 309-314.

∙ Hutter (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer.

∙ Omohundro (2008). The basic AI drives. Proceedings of the First AGI Conference: 483-492.

∙ Schmidhuber (2007). New millennium AI and the convergence of history. Studies in Computational Intelligence, 63: 15-35.

∙ Tye (2013). Qualia. In Zalta (ed.), The Stanford Encyclopedia of Philosophy.
