Save the princess: A tale of AIXI and utility functions

[-]pengvado13y70

IIUC this addresses the ontology problem in AIXI by assuming that the domain of our utility function already covers every possible computable ontology, so that whichever one turns out to be correct, we already know what to do with it. If I take the AIXI formalism to be a literal description of the universe (i.e. not just dualism, but also that the AI is running on a hypercomputer, the environment is running on a turing computer, and the utility function cares only about the environment, not the AI's internals), then I think the proposal works.

But under the more reasonable assumption of an environment at least as computationally complex as the AI itself (whether AIXI in an uncomputable world or AIXI-tl in an exptime world or whatever), Solomonoff Induction still beats any computable sensory prediction, but the proposed method of dealing with utility functions fails: The Solomonoff mixture doesn't contain a simulation of the true state of the environment, so the utility function is never evaluated with the true state of the environment as input. It's evaluated on various approximations of the environment, but those approximations are selected so as to correctly predict sensory inputs, not to correctly predict utilities. If the proposal was supposed to work for arbitrary utility functions, then there's no reason to expect those two approximation problems to be at all related even before Goodhart's law comes into the picture; or if you intended to assume some smoothness-under-approximation constraint on the utility function, then that's exactly the part of the problem that you handwaved.

[-]Alex_Altair13y30

Possible model of semi-dualism:

The agent could have in its map that it is a computer box subject to physical laws, but only above the level where information processing occurs. That is, it could know that its memory was in a RAM stick, which was subject to material breaking and melting, but not have any predictions about inspecting individual bits of that stick. It could know that that box was itself by TDT-type analysis. Unfortunately this model isn't enough for hardware improvements; it would know that adding RAM was theoretically possible, but it wouldn't know the information-theoretic implications of how its memory protocols would react to added RAM.

Now that I think about it, that's kind of what humans do. Except, to get to the point where we can learn that we are a thing that can be damaged, we need parents and pain-mechanisms to keep us safe.

[-]Squark13y20

I believe I have a workable solution for the duality problem, which is essentially a special case of the Orseau-Ring framework, viewed slightly differently. Consider a specific computer architecture M, equipped with an input channel for receiving inputs ostensibly from the environment (although the environment doesn't appear explicitly in the formalism) and possibly special instructions for self-reprogramming (although the latter is semi-redundant as will become clear in the following). This architecture has a state space Sigma (typically M is a universal Turing machine so Sigma is countable but it also possible to consider models with finite RAM in which case M is a finite state automaton), with some state transitions s1 -> s2 being "legitimate" while other not (note that s1 doesn't determine s2 uniquely since the input from the environment can be arbitrary). Consider also U a utility function defined on arbitrary (possibly "illegitimate") infinite histories of M i.e. functions N -> Sigma. Then an "agent" is simply an initial state of M: s in Sigma regarded as a "program". The intelligence of s is defined to be its expected utility assuming the dynamics of M to be described by a certain stochastic process. If we stop here, without specifying this stochastic process, we get more or less an equivalent formulation of the Orseau-Ring framework. By analogy with Legg-Hutter it is natural to assume this stochastic process is governed by the Solomonoff semi-measure. But if we do this we probably won't be able to get any meaningful near-optimal agents since we need to write a program without knowing how the computer works. My suggestion is deforming the Solomonoff semi-measure by assigning weight 0 < p < 1 to state transitions s1 -> s2 which are illegal in terms of M. This should make the near-optimal agents sophisticated since p < 1 so they can rely to some extent on our computer architecture M. On the other hand p > 0 so these agents have to take possible wire-heading into account. In particular they can make positive use of wire-heading to reprogram themselves even if the basic architecture M doesn't allow it, assuming of course they are placed in a universe in which it is possible.

[-]Anja13y00

I think you are proposing to have some hypotheses privileged in the beginning of Solomonoff induction, but not too much because the uncertainty helps fight wireheading by means of providing knowledge about the existence of an idealized, "true" utility function and world model. I that a correct summary? (Just trying to test whether I understand what you mean.)

In particular they can make positive use of wire-heading to reprogram themselves even if the basic architecture M doesn't allow it

Can you explain this more?

[-]Squark13y10

Yes, I think you got it more or less right. For p=0 we would just get a version of Legg-Hutter (AIXI) with limited computing resources (but duality problem preserved). For p > 0, no hypothesis is completely ruled out and the agent should be able to find the correct hypothesis given sufficient evidence, in particular it should be able to correct her assumptions regarding how her own mind works. Of course this requires the correct hypothesis to be sufficiently aligned with M's architecture for the agent to work at all. The utility function is actually built in from the starters, however if we like we can choose it to be something like a sum of external input bits with decaying weights (in order to ensure convergence), which would be in the spirit of the Legg-Hutter "reinforcement learning" approach.

In particular the agent can discover that the true "physics" allow for reprogramming the agent, even though the initially assumed architecture M didn't allow it. In this case she can use it to reprogram herself for her own benefit. To draw a parallel, a human can perform brain surgery on herself because of her acquired knowledge about the physics of the universe and her brain and in principle she can use it to change the functioning of her brain in ways that are incompatible with her "intuitive" initial assumptions about her own mind

[-]Squark13y00

I made some improvements to the formalism, see http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/8fjb

There I consider a stochastic model M and here a non-deterministic model, but the same principle can be applied here. Namely, we consider a Solomonoff process starting t0 time before formation of agent A, conditioned by observance of M's rules in the time before A's formation and by A's existence at time of its formation. The expected utility is computed with respect to the resulting distribution

[-]AlexMennen13y10

Great post!

But I'm not quite sure how I feel about the reformulation in terms of semimeasures instead of deterministic programs. Part of my motivation for the environment-specific utility was to avoid privileging observations over unobservable events in the environment in terms of what the agent can care about. So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that's happening in the environment.

I really dislike the discounting approach, because it doesn't respect the given utility function and makes the agent miss out on potentially infinite amounts of utility.

If we're going to allow infinite episodic utilities, we'll need some way of comparing how big different nonconvergent series are. At that point, the utility calculation will not look like an infinite sum in the conventional sense. In a sense, I agree that discounting is inelegant because it treats different time steps differently, but on the other hand, I think asymmetry between different time steps is somewhat inevitable. For instance, presumably we want the agent to value getting a reward of 1 on even steps and 0 on odd steps higher than getting a reward of 1 on steps that are divisible by three and 0 on non-multiples of three. But those reward sequences can be placed in 1-to-1 correspondence. This fact causes me to grudgingly give up on time-symmetric utility functions.

one has to be careful to only use the computable subset of the set of all infinite strings $ao_{1:\infty}$

Why?

[-]Anja13y00

So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that's happening in the environment.

Can you make an example of a situation in which that would not be the case? I think the semimeasure AIXI and deterministic programs AIXI are pretty much equivalent, am I overlooking something here?

If we're going to allow infinite episodic utilities, we'll need some way of comparing how big different nonconvergent series are.

I think we need that even without infinite episodic utilities. I still think there might be possibilities involving surreal numbers, but I haven't found the time yet to develop this idea further.

Why?

Because otherwise we definitely end up with an unenumerable utility function and every approximation will be blind between infinitely many futures with infinitely large utility differences, I think. The set of all binary strings of infinite length is uncountable and how would we feed that into an enumerable/computable function? Your approach avoids that via the use of policies p and q that are by definition computable.

[-]AlexMennen13y00

Can you make an example of a situation in which that would not be the case? I think the semimeasure AIXI and deterministic programs AIXI are pretty much equivalent, am I overlooking something here?

The nice thing about using programs is that a program not only determines what your observation will be, but also the entire program state at each time. That way you can care about, for instance, the head of the Turing machine printing 0s to a region of the tape that you can't see (making assumptions about how the UTM is implemented). I'm not sure how semimeasures are usually talked about in this context; if it's something like a deterministic program plus a noisy observation channel, then there's no problem, but if a semimeasure doesn't tell you what the program state history is, or doesn't even mention a program state, then a utility function defined over semimeasures doesn't give you a way to care about the program state history (aka events in the environment).

I think we need that even without infinite episodic utilities.

I don't understand. If all the series we care about converge, then why would we need to be able to compare convergent series?

I still think there might be possibilities involving surreal numbers, but I haven't found the time yet to develop this idea further.

That might end up being fairly limited. Academian points out that if you define a "weak preference" of X over Y as a preference such that X is preferred over Y but there exist outcomes Z and W such that for all probabilities p>0, pZ + (1-p)Y is preferred over pW + (1-p)X, and a "strong preference" as a preference that is not a weak preference, then strong preferences are archimedian by construction, so by the VNM utility theorem, a real-valued utility function describes your strong preferences even if you omit the archimedian axiom (i.e. u(X) > u(Y) means a strong preference for X over Y, and u(X) = u(Y) means either indifference or a weak preference one way or the other). Exact ties between utilities of different outcomes should be rare, and resolving them correctly is infinitely less important than resolving strong preferences correctly. The problem with this that I just thought of is that conceivably there could be no strong preferences (i.e. for any preference, there is some other preference that is infinitely stronger). I suspect that this would cause something to go horribly wrong, but I don't have a proof of that.

Because otherwise we definitely end up with an unenumerable utility function and every approximation will be blind between infinitely many futures with infinitely large utility differences, I think. The set of all binary strings of infinite length is uncountable and how would we feed that into an enumerable/computable function? Your approach avoids that via the use of policies p and q that are by definition computable.

I'm not sure if this fits the definition of enumerable, but the utility function could accept a (possibly uncomputable) stream of actions. The utility function could be set up such that for any desired degree of precision, it only has to look at a finite number of elements. (I think that's equivalent to the continuity assumption I made in my original post.)

[-]Manfred13y00

If it's specified to have a physical implementation, I think infinite-computation AIXI could actually get around dualism by predicting the behavior of its own physical implementation. That is, it computes outcomes as if the output channel (or similar complete output-determiner) is manipulated magically at the next time step, but it computes them using a causal model that has the the physical implementation still working into the future.

So it wouldn't drop a rock on its head, because even though it thinks it could send the command magically, it can correctly predict the subsequent, causal behavior of the output channel, i.e. silence after getting smooshed by a rock.

This behavior actually does require a non-limit infinity, since the amount of simulation required always grows faster than the simulating power. But I think you can pull it off with approximation schemes - certainly it works in the case of replacing exact simulation of future self-behavior with heuristics like "always silent if rock dropped on head" :D