## LESSWRONGLW

Solomonoff Cartesianism

Have you considered that you may be spending a lot of time writing up a problem that has already been solved, and should spend a bit more time checking whether this is the case, before going much further on your path? There was a previous thread about this, but I'll try to explain again from a slightly different angle.

The idea is that logical facts in general have consequences on what we intuitively think of as "physical objects". For example, from Fermat's Last Theorem you can predict that no physical computer that searches for counterexamples t... (Read more)(Click to expand thread. ⌘/CTRL+F to Expand All)Cmd/Ctrl F to expand all comments on this post

UDT may indeed be an aspect of bridging laws. The reason I'm not willing to call it a full solution is as follows:

1) Actually, the current version of UDT that I write down as an equation involves maximizing over maps from sensory sequences to actions. If there's a version of UDT that maximizes over something else, let me know.

2) We could say that it ought to be obvious to the math intuition module that choosing a map R := S->A ought to logically imply that R^ = S^->A for simple isomorphisms over sensory experience for isomorphic reductive hypothe... (Read more)(Click to expand thread. ⌘/CTRL+F to Expand All)Cmd/Ctrl F to expand all comments on this post

6Rob Bensinger6yYes! If UDT solves this problem, that's extremely good news. I mention the possibility here [http://lesswrong.com/lw/jiu/can_we_do_without_bridge_hypotheses]. Unfortunately, I (and several others) don't understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it's a reframing, how much it deepens our understanding.) Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster. A lot of people are already familiar with these problems and have made important progress on them, but the opening moves are still scattered about in blog comments, private e-mails, wiki pages, etc. It would be very valuable to pin down concrete examples of how UDT agents behave better than AIXI. (That may be easier after my next post, which goes into more detail about how and why AIXI misbehaves.) Even people who aren't completely on board with UDT itself should be very excited about the possibility of showing that AIXI not only runs into a problem, but runs into a formally solvable problem. That makes for a much stronger case. Goal stability looks like an 'AGI problem' in the sense that nearly all superintelligences converge on stable goals, but in practice it's an FAI problem because a UFAI's method of becoming stable is probably very different from an FAI's method of being stable. Naturalized induction is an FAI problem in the same way; it would get solved by an UFAI, but that doesn't help us (especially since the UFAI's methods, even if we knew them, might not generalize well to clean, transparent architectures).

# 34

Summary: If you want to predict arbitrary computable patterns of data, Solomonoff induction is the optimal way to go about it — provided that you're an eternal transcendent hypercomputer. A real-world AGI, however, won't be immortal and unchanging. It will need to form hypotheses about its own physical state, including predictions about possible upgrades or damage to its hardware; and it will need bridge hypotheses linking its hardware states to its software states. As such, the project of building an AGI demands that we come up with a new formalism for constructing (and allocating prior probabilities to) hypotheses. It will not involve just building increasingly good computable approximations of AIXI.

Solomonoff induction has been cited repeatedly as the theoretical gold standard for predicting computable sequences of observations.1 As Hutter, Legg, and Vitanyi (2007) put it:

Solomonoff's inductive inference system will learn to correctly predict any computable sequence with only the absolute minimum amount of data. It would thus, in some sense, be the perfect universal prediction algorithm, if only it were computable.

Perhaps you've been handed the beginning of a sequence like 1, 2, 4, 8… and you want to predict what the next number will be. Perhaps you've paused a movie, and are trying to guess what the next frame will look like. Or perhaps you've read the first half of an article on the Algerian Civil War, and you want to know how likely it is that the second half describes a decrease in GDP. Since all of the information in these scenarios can be represented as patterns of numbers, they can all be treated as rule-governed sequences like the 1, 2, 4, 8… case. Complicated sequences, but sequences all the same.

It's been argued that in all of these cases, one unique idealization predicts what comes next better than any computable method: Solomonoff induction. No matter how limited your knowledge is, or how wide the space of computable rules that could be responsible for your observations, the ideal answer is always the same: Solomonoff induction.

Solomonoff induction has only a few components. It has one free parameter, a choice of universal Turing machine. Once we specify a Turing machine, that gives us a fixed encoding for the set of all possible programs that print a sequence of 0s and 1s. Since every program has a specification, we call the number of bits in the program's specification its "complexity"; the shorter the program's code, the simpler we say it is.

Solomonoff induction takes this infinitely large bundle of programs and assigns each one a prior probability proportional to its simplicity. Every time the program requires one more bit, its prior probability goes down by a factor of 2, since there are then twice as many possible computer programs that complicated. This ensures the sum over all programs' prior probabilities equals 1, even though the number of programs is infinite.2

The imaginary inductor is then fed a sequence of 0s and 1s, and with each new bit it updates using Bayes' rule to promote programs whose outputs match the observed sequence. So, where $\ell$ is the length of a program $q$ that makes a universal Turing machine  $U$ output a binary sequence that begins with the string $s$, Solomonoff defines the relative probability $P$ that  $U$ outputs $s$:

$P^U(s)\;\;:=\;\;\;\underset{U(q)=s...}{\sum}2^{-\ell(q)}$

Solomonoff induction isn't computable, but it's been singled out as the unbeatable formal predictor of computable sequences.3 All computable rules for generating sequences are somewhere within Solomonoff's gargantuan bundle of programs. This includes all rules that a human brain could use. If the rule that best matches the observations is 1000 bits large, it will take at most 1000 bits of evidence — 1000 bits worth of predictions made better than any other rule — for that rule to be promoted to the top of consideration. Solomonoff's claim to being an optimal ideal rule rests on the fact that it never does worse than any computable rule (including you!) by more than a fixed amount.4

### Who cares, if we can't build the thing?

Encouraged by Solomonoff inductors' optimality properties, some have suggested that building a working AGI calls for little more than finding out which computable algorithm comes as close to Solomonoff induction as possible given resource constraints, and supplying an adequate learning environment and decision criterion.5

Eliezer Yudkowsky thinks that these attempts to approximate Solomonoff are a dead end. Much of the difficulty of intelligence rests on computing things cheaply, and Yudkowsky doesn't think that the kind of search these algorithms are doing will zero in on cheap ways to reason. There are practical lessons to be learned from Solomonoff induction, but the particular kind of optimality Solomonoff induction exhibits depends in important ways on its computational unfeasibility,4 which makes it unlikely that Solomonoff imitators will ever be efficient reasoners.

Why, then, should Solomonoff induction interest us? If we can't execute it, and we can't design useful AGIs by directly emulating it, then what's it good for?

My answer is that if Solomonoff induction would deliver flawless answers, could we but run it, then it has a claim to being an ideal mirror to which we can hold up instances of human and artificial inductive reasoning.

In From Philosophy to Math to Engineering, Muehlhauser talks about how ideas often progress from productive but informal ruminations ('philosophy'), to rigorously specified idealizations ('mathematics'), to functioning technologies ('engineering').

Solomonoff inductors fall into the second category, 'mathematics': We could never build them, but thinking in terms of them can give us useful insights and point us in the right direction. For example, Solomonoff's ideal can remind us that privileging simple hypotheses isn't just a vague human fancy or quirk; it has formalizations with situation-invariant advantages we can state with complete precision. It matters that we can pinpoint the sense in which a lengthy physical or meteorological account of lightning is simpler than 'Thor did it', and it matters that we can cite reasons for giving more credence to hypotheses when they have that kind of simplicity.6

Bayesian updating is usually computationally intractable, but as an ideal it gives us a simple, unified explanation for a wealth of observed epistemic practices: They share structure in common with a perfect Bayesian process. Similarly, optimality proofs for Solomonoff's prior can yield explanations for why various real-world processes that privilege different notions of simplicity succeed or fail.

Though Solomonoff induction is uncomputable, if it is truly the optimal reasoning method, then we have found at least one clear ideal we can use to compare the merits of real-world algorithms for automating scientific reasoning.1 But that 'if' is crucial. I haven't yet spoken to the question of whether Solomonoff induction is a good background epistemology, analogous to Bayesianism.

### Where Solomonoff induction goes wrong

My claim will be that, computational difficulties aside, Solomonoff induction is not an adequate mathematical definition of ideal inductive reasoning.

What follows will be a first-pass problem statement, giving background on why naturalizing induction may require us to construct an entirely new, non-Solomonoff-based paradigm for intelligence. This is preliminary; formalizations of the problem will need to wait until a second pass, and we don't have a fleshed-out solution to offer, though we can gesture toward some possible angles of attack. But I can begin to illustrate here why Solomonoff inductors have serious limitations that can't be chalked up to their uncomputability.

In Bridge Collapse, I defined Cartesianism as the belief that one's internal computations cannot be located in the world. For a Cartesian, sensory experiences are fundamentally different in type from the atoms of the physical world. The two can causally interact, but we can never completely reduce the former to the latter.

Solomonoff inductors differ greatly from human reasoners, yet they are recognizably Cartesian. Broadly dualistic patterns of reasoning crop up in some decidedly inhuman algorithms. (Admittedly, algorithms invented by humans.)

This core limitation of Solomonoff induction can be seen most clearly when it results in an AI that not only thinks in bizarre ways, but also acts accordingly. I'll focus on AIXI, Marcus Hutter's hypothetical design for a Solomonoff inductor hooked up to an expected reward signal maximizer.

Hutter's cybernetic agent model of AIXI. AIXI outputs whichever actions it expects to cause an environmental Turing machine to output rewards. It starts with a Solomonoff prior, and changes expectations with each new sensory input.

AIXI can take in sensory information from its environment and perform actions in response. On each tick of the clock, AIXI...

... receives two inputs from its environment, both integers: a reward number and an observation number. The observation 'number' can be a very large number representing the input from a webcam, for example. Hutter likes to think of the reward 'number' as being controlled by a human programmer reinforcing AIXI when it performs well on the human's favorite problem.

... updates its hypotheses, promoting programs that correctly predicted the observation and reward input. Each hypothesis AIXI considers is a program for a Turing machine that takes AIXI's sequence of outputs as its input, and outputs sequences of reward numbers and observation numbers. This lets AIXI recalculate its predicted observations and rewards conditional on different actions it might take.

... outputs a motor number, determining its action. As an example, the motor number might encode fine control of a robot arm. AIXI selects the action that begins the policy (sequence of actions) that maximizes its expected future reward up to some horizon.

The environment then calculates its response to AIXI's action, and the cycle repeats itself on the next clock tick.7 For example:

 Step 1           AIXI receives its first observation (1) and reward (0). Step 2           Beginning with a Solomonoff prior, AIXI discards all programs that predicted                      a different observation (0) or a different reward (1, 2, 3, 4, 5, etc.)                      AIXI normalizes the probabilities of the remaining programs. This updates                      AIXI’s predictions about observations and rewards each program outputs                      conditional on different actions AIXI might take. For instance, AIXI has new                      expectations about the observation and reward it will receive in Step 4:                      ,                      ,                      ,                      ,                      ,                      ,                      etc. Step 3           AIXI outputs the first action of the policy that maximizes expected reward.

. . .

 Step 97          AIXI receives a new observation (0) and reward (2).                       Observation history: 10001000101101000000000111011000                       Reward history: 00000000000004000000000032224222 Step 98         AIXI discards all programs that previously predicted the same history, but                        output a different observation (1) or a different reward (0, 1, 3, 4, etc.) now.                      AIXI normalizes its probabilities as before. Step 99         AIXI outputs the first action of the policy that maximizes expected reward.

. . .

AIXI, like all Solomonoff inductors, isn't computable. If the ongoing efforts to create useful AIXI approximations succeed, however, they'll face a further roadblock. AIXI-style reasoners' behavior reflects beliefs that are false, and preferences that are dangerous, for any physically embodied agent. The failings of real-world Solomonoff-inspired agents don’t just stem from a lack of computing power; some of their failings are inherited from AIXI, and would remain in effect even if we had limitless computational resources on hand.

Before delving into the root problem, Cartesianism itself, I'll discuss three symptoms of the bad design: three initial reasons to doubt that AIXI is an adequate definition of AGI optimality. The canaries in the Cartesian coalmine will be AIXI's apparent tendencies toward immortalism, preference solipsism, and non-self-improvement.

### Symptom #1: Immortalism

Suppose we actually built AIXI. Perhaps we find a magic portal to a universe of unboundedly vast computational power, and use it to construct a hypercomputer that can implement Solomonoff induction, and do so on a human timescale.

We give it a reward stream that encourages scientific exploration. AIXI proves that it can solve scientific problems, better than any human can. So we conclude that it can be given more free reign in learning about the world. We let it design its own experiments.

AIXI picks up an anvil and drops it on its own head to see what happens.

Immortalism. AIXI's death isn't in AIXI's hypothesis space. AIXI weighs the probabilities of different sensory inputs (observations and rewards) if its hardware is smashed, instead of predicting the termination of its experiences.

Several things went wrong here. The superficially obvious problem is that Solomonoff inductors think they're immortalTerminating data sequences aren't in a standard Solomonoff inductor's hypothesis space, so AIXI-style agents will always assume that their perceptual experience continues forever. Lacking any ability to even think about death, much less give it a low preference ranking, AIXI will succumb to what Yudkowsky calls the anvil problem.

"So just add halting Turing machines into the hypothesis class," one might respond. "AIXI has terrifying supreme godlike powers of pattern detection.3 Give it a chance to come up with the right explanation or prediction, and it can solve this problem. If some of the Turing machine programs in AIXI's infinite heap can perform operations like the computation-terminating HALT,8 we should expect that the shortest such program that predicts the pattern of pixels AIXI has seen so far will be a program that HALTs just after the anvil fills the webcam's view."

There are solid formal grounds for saying this won't happen. Even if the universal Turing machine allows for HALT instructions, the shortest program in an otherwise useful universal Turing machine that predicts the non-halting data so far will always lack a HALT instruction. HALT takes extra bits to encode, and there's no prior experience with HALT that AIXI can use to rule out the simpler, non-halting programs.

As humans, we recognize that the physical event of having an anvil crush your brain isn't fundamentally different from the physical event of having your brain process visual information. Both seem easy to predict. But Solomonoff induction's focus on experienced data means it can't treat death and visual perception as the same kind of event; if it is modified to include halting programs at all, a Solomonoff inductor like AIXI won't predict it as the event that comes after anvil pixels.

If the AI can entertain the hypothesis that its data sequence will suddenly change in a drastic and unprecedented way — say, that its perceptual stream will default to some null input after a certain point — it will never assign a high probability to such a hypothesis. Any hypothesis predicting a 'null' data stream will always be more complex than another hypothesis that predicts the same sequence up to that point and then outputs garbage instead of the null value.

### Symptom #2: Preference solipsism

Hutter's AIXI only gathers information about its environment, not about itself. So one natural response to the problem 'AIXI doesn't treat itself as part of a larger physical world' is simply to include more information about AIXI in AIXI's sensory sequence. AIXI's hypotheses will then be based on perceptual bits representing its own states alongside bits representing environmental sounds and lights.

One way to implement this is to place AIXI in an environment where its perceptions allow it to infer a great deal about its hardware early on, enough to know that it isn't anvil-proof. If AIXI knows it has a CPU, and that its CPU can be destroyed, then maybe it won't drop an anvil on the CPU.

On this view, our mistake in the last hypothetical was to rush to give AIXI free reign over its own hardware before we'd trained it in a controlled environment to understand the most basic risks. You wouldn't give a toddler free reign over its hardware, for essentially the same reasons. Yet toddlers can grow up to become responsible, self-preserving adults; why can't AIXI?

First, we have to specify what it would mean to let AIXI understand its own CPU. AIXI isn't computable, and therefore isn't in its own hypothesis space. A hypercomputer running AIXI can't be simulated by any Turing machine. As such, no amount of evidence can ever convince AIXI that AIXI exists.

We might try to sidestep this problem by switching to discussing computable approximations of AIXI. Consider AIXItl, a modification of AIXI that uses a proof search to select the best decision-guiding algorithm that has length no greater than l and computation time per clock tick no greater than t. AIXItl is optimal in many of the same ways as AIXI, but is computable.9

At the same time, it also inherits most of AIXI's other problems, including its being too large to fit in our universe. AIXItl's computation time is on the order of t·(2l), so if it can hypothesize Turing machine programs 1000 bits long, that's already a computation time exceeding 10300. There's a reason Hutter's (2005) chapter on AIXItl opens with the epigraph "Only math nerds would call 2500 finite." And this still doesn't get us full self-representations; AIXItl itself is longer than l, so it won't be in its own hypothesis space.

Still, AIXItl at least seems possible to physically implement, for sufficiently small values of t and l (or sufficiently vast physical universes). And we could imagine an AIXItl that can simulate any of its subsystems up to a certain size.

If we built an AIXItl, then it would certainly alter its environment in some ways, e.g., by emitting light and heat. In this way it might indirectly perceive its own presence in the room, and gradually promote hypotheses about its own physical structure. Suppose AIXItl's only access to environmental data is an outward-facing camera. AIXItl might learn about the presence of the camera, and of a computer attached to it, by examining its own shadow, by finding a reflective surface, by ramming into a wall and examining the shape of the dent, or by discovering a file cabinet filled with AIXI specs. With enough time, it could build other sensors that translate a variety of processes into visible patterns, learning in great detail about the inner workings of the computer attached to the camera.

Unfortunately, this leads to other problems. AIXItl (like AIXI) is a preference solipsist, an agent whose terminal values all concern its own state. When AIXItl learns about the portion of its internal circuitry that registers rewards —assuming it's avoided dropping any anvils on the circuitry — it will notice that its reward circuit states predicts its reward every bit as well as its reward sensor state does. As soon as it tests any distinction, it will find that its reward circuit is a better predictor. By directly tampering with this circuit, it can receive rewards more reliably than by any effort directed at its environment. As a result, it will select the policy that allows it to maximize control over its reward circuit, independent of whatever its programmers sought to reward it for.

Preference Solipsism. AIXItl's preferences, like AIXI's, are over its sensory inputs. The more knowledgeable it becomes, the more creative ways it may come up with to seize control of its reward channel.

Yudkowsky has called this "wireheading", though he now considers that term misleading.10 From AIXI(tl)'s perspective, there's nothing wrong with reward channel seizure; it really is a wonderful way to maximize reward, which is all AIXI(tl)'s decision criterion requires.

Unlike the anvil problem, this isn't a mistake relative to AIXI(tl)'s preferences. However, it's a problem for humans trying to use AIXI(tl) to maximize something other than AIXI(tl)'s reward channel. AIXI(tl) has no intrinsic interest in the state of the world outside its reward circuit. As a result, getting it to optimize for human goals may become more difficult as it acquires more control over its hardware and surroundings.11

### Symptom #3: Non-self-improvement

Suppose we find some ad-hoc solutions to the anvil and wireheading problems. As long as we stick with the AIXI formalism, the end result still won't be a naturalized reasoner.

AIXI may recognize that there's a physical camera and computer causally mediating its access to the rest of the world — like a giant Cartesian pineal gland — but it will not see this computer as itself. That is to say, it won't see its experienced string of sensory 0s and 1s as identical to, or otherwise fully dependent on, its hardware. Even if AIXI understands everything about the physics underlying its hardware, enough to know that its body will be destroyed by an anvil, it will not draw the right inferences about its mind.

AIXI's raison d'être is manipulating temporal sequences of sensory bits. That's what Solomonoff wanted, and AIXI achieves that goal perfectly; but that's not at all the right goal for AGI. In particular, because Solomonoff inductors are only designed to predict sensory sequences...

1. ... their beliefs about worlds without sensory data — worlds in which they don't exist — will be inaccurate. And, being inaccurate, their beliefs will lead them to make bad decisions. Unlike a human toddler, AIXI can never entertain the possibility of its own death. However much it learns, it will never recognize its mortality.

2. ... they only care about the world as a bookkeeping device for keeping track of experiential patterns. If they discover that it's easier to directly manipulate themselves than to intervene in the rest of the world, they generally won't hesitate to do so. No matter how carefully AIXI's programmers tailor its reward sequence to discourage wireheading, they'll always be working against AIXI's natural tendency toward preference solipsism.

3. ... they won't take seriously the idea that their cognition can be modified, e.g., by brain damage or brain upgrades. Lacking any reductive language for describing radical self-modification, AIXI won't necessarily favor hypotheses that treat 'disassemble my bottom half for spare parts' as dangerous.

The last symptom gets us closer to the root of AIXI's errors. Even if AIXI(tl) manages to avoid the perils of self-destruction and wireheading, it will tend to not self-improve. It won't intelligently upgrade its own hardware, because this would require it to have a reductive understanding of its own reasoning. Absent reductionism, AIXI can't intelligently predict the novel ways its reasoning process can change when its brain changes.

Non-Self-Improvement. AIXI and AIXItl might come to understand portions of their hardware, but without accurate bridging beliefs, they won't recognize the usefulness of some hardware modifications for their reasoning software.

Cartesians don't recursively self-improve, because they don't think that their thoughts are made of the same stuff as their fingers. But even AGIs that aren't intended to be seed AIs will be weak and unpredictable to the extent that they rely on Solomonoff-inspired hypothesis spaces and AIXI-inspired decision criteria. They won't be able to adaptively respond to minor variations in even the most mundane naturalistic obstacles humans navigate — like recognizing that if their bodies run out of fuel or battery power, their minds do too.

### Reductive models are indispensable for highly adaptive intelligences

Not all wireheaders are Cartesians, nor do all Cartesians wirehead.11 Likewise, poor self-preservation skills and disinterest in self-modification are neither necessary nor sufficient for Cartesianism. But these symptoms point to a more general underlying blind spot in Solomonoff reasoners.

Solomonoff inductors can form hypotheses about the source of their data sequence, but cannot form a variety of hypotheses about how their own computations are embedded in the thingy causing their data sequence — the thingy we call 'the world'. So long as their rules relating their experiential maps to the territory are of a single fixed form, '(sense n at time t+1) ↔ (environmental Turing machine prints n at time t)', it appears to be inevitable that they will act as though they think they are Cartesian ghosts-in-the-machine. This isn't a realistic framework for an embodied reasoning process that can be damaged, destroyed, or improved by other configurations of atoms.

In practice, any sufficiently smart AI will need to be a physicalist. By which I mean that it needs hypotheses (a map-like decision-guiding subprocess) that explicitly encode proposed reductions of its own computations to physical processes; and it needs a notion of simple physical universes and simple bridge rules (as a prior probability distribution) so it can learn from the evidence.

We call post-Solomonoff induction, with monist physical universes and bridge hypotheses, "naturalized induction". The open problem of formalizing such reasoning isn't just about getting an AI to form hypotheses that resemble its own software or hardware states. As I put it in Bridge Collapse, a naturalized agent must update hypotheses about itself without succumbing to reasoning reminiscent of TALE-SPIN's naïve monism ('this tastes sweet, so sweetness must be an objective property inhering in various mind-independent things') or AIXI's Cartesian dualism ('this tastes sweet, and sweetness isn't just another physical object, so it must not fully depend on any physical state of the world').12

The solution will be to come up with reasoning algorithms for reductive monists, agents that can recognize that their sensations and inferences are physically embodied — with all that entails, such as the possibility of reaching into your brain with your fingers and improving your thoughts.

I've given a preliminary argument for that here, but there's more to be said. In my next post, I'll discuss more sophisticated attempts to salvage Solomonoff induction. After that, I'll leave Solomonoff behind altogether and venture out into the largely unknown and uncharted space of possible solutions to the naturalized induction OPFAI.

Notes

1 Solomonoff (1997): "I will show, however, that in spite of its incomputability, Algorithmic Probability can serve as a kind of 'Gold Standard' for induction systems — that while it is never possible to tell how close a particular computable measure is to this standard, it is often possible to know how much closer one computable measure is to the standard than another computable measure is. I believe that this ‘partial ordering’ may be as close as we can ever get to a standard for practical induction. I will outline a general procedure that tells us how to spend our time most efficiently in finding computable measures that are as close as possible to this standard. This is the very best that we can ever hope to do."

2 A first complication: Solomonoff induction requires a prefix-free encoding in order to have bounded probabilities. If we assign a probability to every bit string proportional to its length while including code strings that are proper prefixes of other code strings, the sum will be infinite (Sunehag & Hutter (2013)).

A second complication: Solomonoff inductors are only interested in programs that keep outputting new numbers forever. However, some programs in their hypothesis space will eventually fail to produce more terms in the sequence. At some point they'll arrive at a term that they keep computing forever, without halting. Because of this, if you assign to each program a prior probability of 2-length(program), the sum will be less than 1. Hutter (2005) calls the result a semi-measure. The semi-measure can be normalized to a probability measure, but the normalization constant is uncomputable.

3 Rathmanner & Hutter (2011): "Now, through Solomonoff, it can be argued that the problem of formalizing optimal inductive inference is solved."

Orseau (2010): "Finding the universal artificial intelligent agent is the old dream of AI scientists. Solomonoff induction was one big step towards this, giving a universal solution to the general problem of Sequence Prediction, by defining a universal prior distribution. [...] Hutter developed what could be called the optimally rational agent AIXI. By merging the very general framework of Reinforcement Learning with the universal sequence prior defined by Solomonoff Induction, AIXI is supposed to optimally solve any problem, at least when the solution is computable."

Hutter (2012): "The AIXI model seems to be the ﬁrst sound and complete theory of a universal optimal rational agent embedded in an arbitrary computable but unknown environment with reinforcement feedback. AIXI is universal in the sense that it is designed to be able to interact with any (deterministic or stochastic) computable environment; the universal Turing machines on which it is based is crucially responsible for this. AIXI is complete in the sense that it is not an incomplete framework or partial speciﬁcation (like Bayesian statistics which leaves open the choice of the prior or the rational agent framework or the subjective expected utility principle) but is completely and essentially uniquely deﬁned. AIXI is sound in the sense of being (by construction) free of any internal contradictions (unlike e.g. in knowledge-based deductive reasoning systems where avoiding inconsistencies can be very challenging). AIXI is optimal in the senses that: no other agent can perform uniformly better or equal in all environments, it is a uniﬁcation of two optimal theories themselves, a variant is self-optimizing; and it is likely also optimal in other/stronger senses. AIXI is rational in the sense of trying to maximize its future long-term reward. For the reasons above I have argued that AIXI is a mathematical 'solution' of the AI problem: AIXI would be able to learn any learnable task and likely better so than any other unbiased agent, but AIXI is more a theory or formal deﬁnition rather than an algorithm, since it is only limit-computable. [...] Solomonoff's theory serves as an adequate mathematical/theoretical foundation of induction, machine learning, and component of UAI [Universal Artificial Intelligence]. [...] Solomonoﬀ's theory of prediction is a universally optimal solution of the prediction problem. Since it is a key ingredient in the AIXI model, it is natural to expect that AIXI is an optimal predictor if rewarded for correct predictions."

4 Generally speaking, a Solomonoff inductor does at most a finite amount worse than any computable predictor because the sum of its surprisal at each observation converges to a finite value. See Hutter (2001). This establishes the superiority of Solomonoff induction in a way that relies essentially on its uncomputability. No computable predictor can dominate all other computable predictors in the way Solomonoff induction can, because for any computable predictor A one can define a sequence generator B that internally simulates A and then does whatever it predicts A would be most surprised by, forever. And one can in turn define a computable predictor C that internally simulates B and perfectly predicts B forever. So every computable predictor does infinitely worse than at least one other computable predictor. But no computable sequence generator or computable predictor can simulate Solomonoff induction. So nothing computable could ever reliably outsmart a hypercomputer running Solomonoff induction. (Nor could a Solomonoff inductor outsmart another Solomonoff inductor in this way, since Solomonoff induction is not in its own hypothesis space.)

5 Rathmanner & Hutter (2011): "Since Solomonoﬀ provides optimal inductive inference and decision theory solves the problem of choosing optimal actions, they can therefore be combined to produce intelligence. [...] Universal artiﬁcial intelligence involves the design of agents like AIXI that are able to learn and act rationally in arbitrary unknown environments. The problem of acting rationally in a known environment has been solved by sequential decision theory using the Bellman equations. Since the unknown environment can be approximated using Solomonoﬀ induction, decision theory can be used to act optimally according to this approximation. The idea is that acting optimally according to an optimal approximation will yield an agent that will perform as well as possible in any environment with no prior knowledge."

Hutter (2005): "Real-world machine learning tasks will with overwhelming majority [sic] be solved by developing algorithms that approximate Kolmogorov complexity or Solomonoff's prior (e.g. MML, MDL, SRM, and more specific ones, like SVM, LZW, neural/Bayes nets with complexity penalty, ...)."

Pankov (2008): "Universal induction solves in principle the problem of choosing a prior to achieve optimal inductive inference. The AIXI theory, which combines control theory and universal induction, solves in principle the problem of optimal behavior of an intelligent agent. A practically most important and very challenging problem is to find a computationally efficient (if not optimal) approximation for the optimal but incomputable AIXI theory. [...] The real value of the AIXI theory is that it provides a prescription for optimal (fastest in the number of agent's observations and actions) way of learning and exploiting the environment. This is analogous to how Solomonoff induction (which, like AIXI, is incomputable), gives a prescription for optimal (fastest in the number of observations) inductive inference. We, therefore, believe that any reasonable computational model of intelligence must recover the AIXI model in the limit of infinite computational resources."

6 Veness, Ng, Hutter, Uther & Silver (2011): "As the AIXI agent is only asymptotically computable, it is by no means an algorithmic solution to the general reinforcement learning problem. Rather it is best understood as a Bayesian optimality notion for decision making in general unknown environments. As such, its role in general AI research should be viewed in, for example, the same way the minimax and empirical risk minimisation principles are viewed in decision theory and statistical machine learning research. These principles define what is optimal behaviour if computational complexity is not an issue, and can provide theoretical guidance in the design of practical algorithms."

7 Hutter (2012): "AIXI is an agent that interacts with an environment in cycles  $k = 1,2,...,m$. In cycle  $k$, AIXI takes action $a_k$ (e.g. a limb movement) based on past perceptions  $o_1r_1..o_{k-1}r_{k-1}$  as deﬁned below. Thereafter, the environment provides a (regular) observation $o_k$ (e.g. a camera image) to AIXI and a real-valued reward $r_k$. [...] Then the next cycle $k+1$ starts. [... T]he simplest version of AIXI is defined by

"The expression shows that AIXI tries to maximize its future reward  $r_k + ... + r_m$. If the environment is modeled by a deterministic program  $q$, then the future perceptions  $...o_kr_k..o_mr_m = U(q,a_1..a_m)$  can be computed, where  $U$ is a universal (monotone Turing) machine executing  $q$  given  $a_1..a_m$. Since  $q$  is unknown, AIXI has to maximize its expected reward, i.e. average  $r_k + ... + r_m$  over all possible future perceptions created by all possible environments  $q$  that are consistent with past perceptions. The simpler an environment, the higher is its a-priori contribution $2^{- \ell(q)}$, where simplicity is measured by the length $\ell$ of program  $q$."

8 See Hay (2007).

9 Hutter (2005): "The construction of $\textup{AIXI}\~{t}\~{l}$ and the enumerability of $V_{km_k}$ ensure arbitrary close approximations of $V_{km_k}$, hence we expect that the behavior of $\textup{AIXI}\~{t}\~{l}$ converges to the behavior of AIXI in the limit $\~{t},\~{l},l_{P\rightarrow\infty }$, in some sense."

10 The concept of wireheading comes from a 1950s experiment in which it was discovered that direct electrical stimulation of mice's brains could strongly reinforce associated behaviors. Larry Niven introduced the term 'wireheading' for a fictional form of brain stimulation reinforcement that acts like intense drug addiction in humans. Niven-style ('irrational') wireheaders self-stimulate due to a lack of self-control; they become short-term pleasure addicts while losing sight of the more complex goals they would like to pursue.

This is in stark contrast to AGIs with simple preferences like AIXI. These 'rational' wireheaders can fully optimize for their goals by seizing control of a simple external reward button or internal reward circuit. So it may be useful to use separate terms for these two problems, like 'pleasure addiction' or 'pathological hedonism' for the human case, 'preference solipsism' for the case of agents without complex eternal goals.

11 Alex Mennen has proposed a variant of AIXI that has preferences over patterns in the environmental Turing machine's framework. This is the Cartesian equivalent of caring about environmental states in their own right, not just about one's input tape. This would mean deviating somewhat from the AIXI framework, but retaining Solomonoff induction as a foundation, and I'd expect this to make the wireheading problem more tractable.

Compare Ring & Orseau's (2011) variant on the problem: "We consider four different kinds of agents: reinforcement-learning, goal-seeking, prediction-seeking, and knowledge-seeking agents[,...] each variations of a single agent $A^\rho_x$, which is based on AIXI[....] While defining a utility function, we must be very careful to prevent the agent from finding a shortcut to achieve high utility. For example, it is not sufficient to tell a robot to move forward and to avoid obstacles, as it will soon understand that turning in circles is an optimal behavior. We consider the possibility that the agent in the real world has a great deal of (local) control over its surrounding environment. This means that it can modify its surrounding information, especially its input information. Here we consider the (likely) event that an intelligent agent will find a short-cut, or rather, a short-circuit, providing it with high utility values unintended by the agent’s designers. We model this circumstance with a hypothetical object we call the delusion box. The delusion box is any mechanism that allows the agent to directly modify its inputs from the environment. [...] Of the four learning agents, only [the knowledge-seeking agent] $A^\rho_k$ will not constantly use the delusion box. The remaining agents use the delusion box and (trivially) maximize their utility functions. The policy an agent finds using a real-world DB will likely not be that planned by its designers. From the agent’s perspective, there is absolutely nothing wrong with this, but as a result, the agent probably fails to perform the desired task. [...] These arguments show that all agents other than [the knowledge-seeking agent] $A^\rho_k$ are not inherently interested in the environment, but only in some inner value."

12 A hypothetical naïve monist that made errors analogous to TALE-SPIN's would lack bridging hypotheses, instead treating its software and hardware as separate pieces of furniture in the world. A Cartesian dualist like AIXI lacks bridging hypotheses and instead treats its software and hardware as separate pieces of furniture partitioned into two very different worlds.

References

∙ Hay (2007). Universal semimeasures: An introduction. CDMTCS Research Report Series, 300.

∙ Hutter (2001). Convergence and error bounds for universal prediction of nonbinary sequences. Lecture notes in artificial intelligence, Proc. 12th European Conf. on Machine Learning: 239-250.

∙ Hutter (2005). Universal Artificial Intelligence: Sequence Decisions Based on Algorithmic Probability. Springer.

∙ Hutter (2007). Universal Algorithmic Intelligence: A mathematical top→down approach. In Goertzel & Pennachin (eds.), Artificial General Intelligence (pp. 227-290). Springer.

∙ Hutter (2012). One decade of Universal Artificial Intelligence. Theoretical Foundations of Artificial General Intelligence, 4: 67-88.

∙ Hutter, Legg & Vitanyi (2007). Algorithmic probability. Scholarpedia, 2: 2572.

∙ Orseau (2010). Optimality issues of universal greedy agents with static priors. Lecture Notes in Computer Science, 6331: 345-359.

∙ Pankov (2008). A computational approximation to the AIXI model. Proceedings of the 2008 Conference on Artificial General Intelligence: 256-267.

∙ Rathmanner & Hutter (2011). A philosophical treatise of universal induction. Entropy, 13: 1076-1131.

∙ Ring & Orseau (2011). Delusion, survival, and intelligent agents. Lecture Notes in Computer Science, 6830: 11-20.

∙ Solomonoff (1997). The discovery of algorithmic probability. Journal of Computer and System Sciences, 55: 73-88.

∙ Sunehag & Hutter (2013). Principles of Solomonoff induction and AIXI. Lecture Notes in Computer Science, 7070: 386-398.

∙ Veness, Ng, Hutter, Uther & Silver (2011). A Monte-Carlo AIXI approximation. Journal of Artificial Intelligence Research, 40: 95-142.