# All of Anja's Comments + Replies

Save the princess: A tale of AIXI and utility functions

Super hard to say without further specification of the approximation method used for the physical implementation.

Save the princess: A tale of AIXI and utility functions

So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that's happening in the environment.

Can you give an example of a situation in which that would not be the case? I think the semimeasure AIXI and the deterministic-programs AIXI are pretty much equivalent; am I overlooking something here?

If we're going to allow infinite episodic utilities, we'll need some way of comparing how big

0AlexMennen9yThe nice thing about using programs is that a program not only determines what your observation will be, but also the entire program state at each time. That way you can care about, for instance, the head of the Turing machine printing 0s to a region of the tape that you can't see (making assumptions about how the UTM is implemented). I'm not sure how semimeasures are usually talked about in this context; if it's something like a deterministic program plus a noisy observation channel, then there's no problem, but if a semimeasure doesn't tell you what the program state history is, or doesn't even mention a program state, then a utility function defined over semimeasures doesn't give you a way to care about the program state history (aka events in the environment). I don't understand. If all the series we care about converge, then why would we need to be able to compare convergent series? That might end up being fairly limited. Academian points out [http://lesswrong.com/lw/244/vnm_expected_utility_theory_uses_abuses_and/#cont] that if you define a "weak preference" of X over Y as a preference such that X is preferred over Y but there exist outcomes Z and W such that for all probabilities p>0, pZ + (1-p)Y is preferred over pW + (1-p)X, and a "strong preference" as a preference that is not a weak preference, then strong preferences are Archimedean by construction, so by the VNM utility theorem, a real-valued utility function describes your strong preferences even if you omit the Archimedean axiom (i.e. u(X) > u(Y) means a strong preference for X over Y, and u(X) = u(Y) means either indifference or a weak preference one way or the other). Exact ties between utilities of different outcomes should be rare, and resolving them correctly is infinitely less important than resolving strong preferences correctly. The problem with this that I just thought of is that conceivably there could be no strong preferences (i.e. for any preference, there is some other preference that
Save the princess: A tale of AIXI and utility functions

I think you are proposing to have some hypotheses privileged in the beginning of Solomonoff induction, but not too much, because the uncertainty helps fight wireheading by providing knowledge about the existence of an idealized, "true" utility function and world model. Is that a correct summary? (Just trying to test whether I understand what you mean.)

In particular they can make positive use of wire-heading to reprogram themselves even if the basic architecture M doesn't allow it

Can you explain this more?

0Squark9yI made some improvements to the formalism, see http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/8fjb. There I consider a stochastic model M and here a non-deterministic model, but the same principle can be applied here. Namely, we consider a Solomonoff process starting t0 time before formation of agent A, conditioned by observance of M's rules in the time before A's formation and by A's existence at time of its formation. The expected utility is computed with respect to the resulting distribution
1Squark9yYes, I think you got it more or less right. For p=0 we would just get a version of Legg-Hutter (AIXI) with limited computing resources (but duality problem preserved). For p > 0, no hypothesis is completely ruled out and the agent should be able to find the correct hypothesis given sufficient evidence; in particular it should be able to correct her assumptions regarding how her own mind works. Of course this requires the correct hypothesis to be sufficiently aligned with M's architecture for the agent to work at all. The utility function is actually built in from the start; however, if we like we can choose it to be something like a sum of external input bits with decaying weights (in order to ensure convergence), which would be in the spirit of the Legg-Hutter "reinforcement learning" approach. In particular the agent can discover that the true "physics" allow for reprogramming the agent, even though the initially assumed architecture M didn't allow it. In this case she can use it to reprogram herself for her own benefit. To draw a parallel, a human can perform brain surgery on herself because of her acquired knowledge about the physics of the universe and her brain, and in principle she can use it to change the functioning of her brain in ways that are incompatible with her "intuitive" initial assumptions about her own mind
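As a rough illustration of the "sum of external input bits with decaying weights" mentioned above, here is a minimal Python sketch. The geometric weights and the name `discounted_utility` are assumptions made for the example; the comment does not say which decay is intended. With weights gamma^t and 0 < gamma < 1, the sum over any bit sequence stays below 1/(1-gamma), which is what ensures convergence.

```python
def discounted_utility(bits, gamma=0.5):
    """Sum of input bits weighted by gamma**t (hypothetical choice of decay)."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return sum(bit * gamma ** t for t, bit in enumerate(bits))


# Even an all-ones input stays below the bound 1 / (1 - gamma) = 2.
print(discounted_utility([1, 1, 1]))
print(discounted_utility([1] * 40) < 2.0)
```

Any other summable weight sequence would serve the same purpose; geometric decay is just the simplest one to write down.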
Interpersonal and intrapersonal utility comparisons

They just do interpersonal comparisons; lots of their ideas generalize to intrapersonal comparisons though.

Interpersonal and intrapersonal utility comparisons

I recommend the book "Fair Division and Collective Welfare" by H. J. Moulin, it discusses some of these problems and several related others.

1AlexMennen9yThat looks like it only discusses interpersonal utility comparisons. I don't see anything about intrapersonal utility comparison in the book description.
A utility-maximizing varient of AIXI

you forgot to multiply by 2^-l(q)

I think then you would count that twice, wouldn't you? Because my original formula already contains the Solomonoff probability...

1AlexMennen9yOh right. But you still want the probability weighting to be inside the sum, so you would actually need $U(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k})=\frac{1}{\xi\left(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k}\right)}\sum_{q:q(y_{1:m_k})=x_{1:m_k}}U(q,y_{1:m_k})2^{-\ell\left(q\right)}$
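To make the weighting concrete, here is a toy Python sketch of that expression: each program q consistent with the history contributes U(q, y) times 2^-l(q) inside the sum, and the total is divided by xi, taken here as the summed weight of the consistent programs. The two "programs", their lengths, and their utilities are invented for illustration, and a finite list stands in for the set of all programs.

```python
from fractions import Fraction


def weighted_utility(programs, actions, observations):
    """Utility of a history as a 2^-length weighted average over consistent programs."""
    consistent = [p for p in programs if p["run"](actions) == observations]
    xi = sum(Fraction(1, 2 ** p["length"]) for p in consistent)
    if xi == 0:
        raise ValueError("no program is consistent with this history")
    total = sum(p["utility"](actions) * Fraction(1, 2 ** p["length"]) for p in consistent)
    return total / xi


# Hypothetical toy programs: one echoes the actions, one always outputs zeros.
PROGRAMS = [
    {"length": 2, "run": lambda ys: tuple(ys), "utility": lambda ys: 1},
    {"length": 1, "run": lambda ys: tuple(0 for _ in ys), "utility": lambda ys: 0},
]

print(weighted_utility(PROGRAMS, (0, 0), (0, 0)))  # both programs are consistent
print(weighted_utility(PROGRAMS, (1, 0), (1, 0)))  # only the echo program is consistent
```

On the history where both programs fit, the shorter program dominates the average; once an observation rules it out, the utility is that of the remaining program alone.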
A utility-maximizing varient of AIXI

Let's stick with delusion boxes for now, because assuming that we can read off from the environment whether the agent has wireheaded breaks dualism. So even if we specify utility directly over environments, we still need to master the task of specifying which action/environment combinations contain delusion boxes to evaluate them correctly. It is still the same problem, just phrased differently.

0AlexMennen9yIf I understand you correctly, that sounds like a fairly straightforward problem for AIXI to solve. Some programs q_1 will mimic some other program q_2's communication with the agent while doing something else in the background, but AIXI considers the possibilities of both q_1 and q_2.
A utility-maximizing varient of AIXI

I think there is something off with the formulas that use policies: If you already choose the policy

$p:p(x_{<k})=y_{<k}y_k$

then you cannot choose a y_k in the argmax.

Also for the Solomonoff prior you must sum over all programs

$q:q(y_{1:m_k})=x_{1:m_k}$ .

Could you maybe expand on the proof of Lemma 1 a little bit? I am not sure I get what you mean yet.

0AlexMennen9yThe argmax comes before choosing a policy. In , there is already a value for y_k before you consider all the policies such that p(x_<k) = y_<k y_k. Didn't I do that? Look at any finite observation sequence. There exists some action you could output in response to that sequence that would allow you to get arbitrarily close to the supremum expected utility with suitable responses to the other finite observation sequences (for instance, you could get within 1/2 of the supremum). Now look at another finite observation sequence. There exists some action you could output in response to that, without changing your response to the previous finite observation sequence, such that you can get arbitrarily close to the supremum (within 1/4). Look at a third finite observation sequence. There exists some action you could output in response to that, without changing your responses to the previous 2, that would allow you to get within 1/8 of the supremum. And keep going in some fashion that will eventually consider every finite observation sequence. At each step n, you will be able to specify a policy that gets you within 2^-n of the supremum, and these policies converge to the policy that the agent actually implements. I hope that helps. If you still don't know what I mean, could you describe where you're stuck?
A utility-maximizing varient of AIXI

I like how you specify utility directly over programs; it describes very neatly how someone who sat down and wrote a utility function

$U(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k})$

would do it: First determine how the observation could have been computed by the environment and then evaluate that situation. This is a special case of the framework I wrote down in the cited article; you can always set

$U(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k})=\sum_{q:q(y_{1:m_k})=x_{1:m_k}}U(q,y_{1:m_k})$

This solves wireheading only if we can specify which environments contain wireheaded (non-dualistic) agents, delusion boxes, etc.

1AlexMennen9yTrue, the U(program, action sequence) framework can be implemented within the U(action/observation sequence) framework, although you forgot to multiply by 2^-l(q) when describing how. I also don't really like the finite look-ahead (until m_k) method, since it is dynamically inconsistent. Not sure what you mean by that.

You are a wirehead if you consider your true utility function to be genetic fitness.

-1timtyler9yNot according to most existing usage of the term.
4DanArmak9yWhat makes a utility function "true"? If I choose to literally wirehead - implant electrodes - I can sign a statement saying I consider my "true" utility function to be optimized by wireheading. Does that mean I'm not wireheading in your sense?
-1brazil849yWell what else could it be? :)

To what extent does our response to Nozick's Experience Machine Argument typically reflect status quo bias rather than a desire to connect with ultimate reality?

I think the argument that people don't really want to stay in touch with reality but rather want to stay in touch with their past makes a lot of sense. After all we construct our model of reality from our past experiences. One could argue that this is another example of a substitute measure, used to save computational resources: Instead of caring about reality we care about our memories making sense and being meaningful.

On the other hand I assume I wasn't the only one mentally applauding Neo for swallowing the red pill.

What would happen if we set an algorithm inside the AGI assigning negative infinite utility to any action which modifies its own utility function and said algorithm itself?

There are several problems with this approach: First of all, how do you specify all actions that modify the utility function? How likely do you think it is that you can exhaustively specify all sequences of actions that lead to modification of the utility function in a practical implementation? Experience with cryptography has taught us that there is almost always some side channel at...

You might be right. I thought about this too, but it seemed people on LW had already categorized the experience machine as wireheading. If we rebrand, we should maybe say "self-delusion" instead of "pornography problem"; I really like the term "utility counterfeiting" though and the example about counterfeit money in your essay.

4davidpearce9y"Utility counterfeiting" is a memorable term; but I wonder if we need a duller, less loaded expression to avoid prejudging the issue? After all, neuropathic pain isn't any less bad because it doesn't play any signalling role for the organism. Indeed, in some ways neuropathic pain is worse. We can't sensibly call it counterfeit or inauthentic. So why is bliss that doesn't serve any signalling function any less good or authentic? Provocatively expressed, evolution has been driven by the creation of ever more sophisticated counterfeit utilities that tend to promote the inclusive fitness of our genes. Thus e.g. wealth, power, status, maximum access to seemingly intrinsically sexy women of prime reproductive potential (etc) can seem inherently valuable to us. Therefore we want the real thing. This is an unsettling perspective because we like to think we value e.g. our friends for who they are rather than their capacity to trigger subjectively valuable endogenous opioid release in our CNS. But a mechanistic explanation might suggest otherwise.
3timtyler9yBill Hibbard apparently endorses using the wirehead terminology to refer to utility counterfeiting via sense data manipulation here [http://arxiv.org/ftp/arxiv/papers/1111/1111.3934.pdf]. However, after looking at my proposal, I think it is fairly clear that the "wireheading" term should be reserved for the "simpleton gambit" of Ring and Orseau. I don't think my proposal represented a "rebranding". I do think you really have to invoke pornography or masturbation to describe the issue. I think "delusion" is the wrong word. A delusion [http://en.wikipedia.org/wiki/Delusion] is a belief held with conviction - despite evidence to the contrary. Masturbation or pornography do not require delusions.

The word "value" seems unnecessarily value-laden here.

Changed it to "number".

You are correct in pointing out that for human agents the evaluation procedure is not a deliberate calculation of expected utility, but some messy computation we have little access to. In many instances this can however be reasonably well translated into the framework of (partial) utility functions, especially if our preferences approximately satisfy transitivity, continuity and independence.

For noticing discrepancies between true and substitute utility it is not necessary to exactly know both functions, it suffices to have an icky feeling that tells you t...

Universal agents and utility functions

There is also a more detailed paper by Lattimore and Hutter (2011) on discounting and time consistency that is interesting in that context.

-1mytyde9yThis is a very interesting paper. Reminds me of HIGHLANDER for some reason... those guys lived for thousands of years and weren't even rich? They hadn't usurped control of vast econo-political empires? No hundred-generations-long family of bodyguards?
Universal agents and utility functions

I am starting to see what you mean. Let's stick with utility functions over histories of length m_k (whole sequences) like you proposed and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n,m) actually depends on the actions and observations yx_{k:m-1} and needs to be calculated for each combination, so y_m is actually

$y_m(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m-1})$

which clutters up the notation so much that I don't want to write it down anymore.

We also get into trouble with taking the expectation, the ob...

2AlexMennen9yYes. Oops, you are right. The sum should have been over x_{k:n}, not just over x_k. Yes, that is a cleaner and actually correct version of what I was trying to describe. Thanks.
Universal agents and utility functions

I second the general sentiment that it would be good for an agent to have these traits, but if I follow your equations I end up with Agent 2.

3AlexMennen9yNo, you don't. If you tried to represent Agent 2 in that notation, you would get modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_k. You were using u_k to represent the utility of the last step of its input, so that total utility is the sum of the utilities of its prefixes, while I was using u_k to represent the utility of the whole sequence. If I adapt Agent 4 to your use of u_k, I get modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_k.
Universal agents and utility functions

First, replace the action-perception sequence with an action-perception-utility sequence u1,y1,x1,u2,y2,x2,etc.

This seems unnecessary. The information u_i is already contained in x_i.

modeled_action(n, k) = argmax(y_k) u_k(yx_<k, yx_k:n)*M(uyx_<k, uyx_k:n)

This completely breaks the expectimax principle. I assume you actually mean something like $\textrm{modeled\_action}(n,k)=\textrm{arg}\max_{y_k}\sum_{x_k}u_k(\dot{y}\dot{x}_{<k}y\underline{x}_{k:n})M(\dot{y}\dot{x}_{<k}y\underline{x}_{k:n})$

which is just Agent 2 in disguise.
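For readers who want to see the argmax-over-actions, sum-over-observations structure spelled out, here is a minimal Python sketch of a single step (the special case n = k). The function signature, the toy model, and the toy utility are all assumptions made for illustration; a real M would be a universal mixture, not a lookup table.

```python
def modeled_action(history, actions, observations, utility, model):
    """Pick the action y_k maximizing the sum over x_k of utility(h) * model(h)."""
    def score(action):
        total = 0.0
        for obs in observations:
            extended = history + ((action, obs),)
            total += utility(extended) * model(extended)
        return total
    return max(actions, key=score)


# Hypothetical toy environment: the observation is 1 with a probability that
# depends only on the latest action, and utility is the latest observation.
P_ONE = {"left": 0.9, "right": 0.2}


def toy_model(history):
    action, obs = history[-1]
    return P_ONE[action] if obs == 1 else 1.0 - P_ONE[action]


def toy_utility(history):
    return float(history[-1][1])


best = modeled_action((), ("left", "right"), (0, 1), toy_utility, toy_model)
print(best)
```

The point of the sketch is only the order of operations: the utility is averaged over observations under the model first, and the action is chosen afterwards, rather than taking a single product of utility and probability.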

0AlexMennen9yOops. Yes, that's what I meant. But it is not the same as Agent 2, because this (Agent 4?) uses its current utility function to evaluate the desirability of future observations and actions, even though it knows that it will use a different utility function to choose between them later. For example, Agent 4 will not take the Simpleton's Gambit because it cares about its current utility function getting satisfied in the future, not about its future utility function getting satisfied in the future. Agent 4 can be seen as a set of agents, one for each possible utility function, that are using game theory with each other.
Universal agents and utility functions

This generalizes to the horizon problem: If at time k you only look ahead to time step m_k but have unlimited life span you will make infinitely large mistakes.

Universal agents and utility functions

I would assume that it is not smart enough to foresee its own future actions and is therefore dynamically inconsistent. The original AIXI does not allow for the agent to be part of the environment. If we tried to relax the dualism, then your question depends strongly on the approximation to AIXI we would use to make it computable. If this approximation can be scaled down in a way such that it is still a good estimator for the agent's future actions, then maybe an environment containing a scaled down, more abstract AIXI model will, after a lot of observations, become one of the consistent programs with lowest complexity. Maybe. That is about the only way I can imagine right now that we would not run into this problem.

0Manfred9yThanks, that helps.
Universal agents and utility functions

I am pretty sure that Agent 2 will wirehead on the Simpleton Gambit, depending heavily on the number of time cycles to follow, the comparative advantage that can be gained from wireheading and the negative utility the current utility function assigns to the change.

Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later, as described in AIXI and existential despair. So basically the two futures look very similar to the agent except that for the part where the screen says something differe...

1Manfred9yAh, right, that abstraction thing. I'm still fairly confused by it. Maybe a simple game will help see what's going on. The simple game can be something like a two-step choice. At time T1, the agent can send either A or B. Then at time T2, the agent can send A or B again, but its utility function might have changed in between. For the original utility function, our payoff matrix looks like AA: 10, AB: -1, BA: 0, BB: 1. So if the utility function didn't change, the agent would just send A at time T1 and A at time T2, and get a reward of 10. But suppose in between T1 and T2, a program predictably changes the agent's payoff matrix, as stored in memory, to AA: -1, AB: 10, BA: 0, BB: 1. Now if the agent sent A at time T1, it will send B at time T2, to claim the new payoff for AB of 10 units. Even though AB is lowest on the preference ordering of the agent at T1. So if our agent is clever, it sends B at time T1 rather than A, knowing that the future program will also pick B, leading to an outcome (BB, for a reward of 1) that the agent at T1 prefers to AB. So, is our AIXI Agent 1 clever enough to do that?
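The two-step game above is small enough to work through in code. The following Python sketch uses the payoff matrices from the comment; the "naive" and "sophisticated" agents are hypothetical labels for the two ways of predicting the T2 choice, not a claim about what any AIXI variant actually does.

```python
# Payoffs from the comment; keys are (action at T1, action at T2).
ORIGINAL = {("A", "A"): 10, ("A", "B"): -1, ("B", "A"): 0, ("B", "B"): 1}
MODIFIED = {("A", "A"): -1, ("A", "B"): 10, ("B", "A"): 0, ("B", "B"): 1}
ACTIONS = ("A", "B")


def second_move(first, payoff):
    """Action chosen at T2 by an agent that maximizes `payoff`."""
    return max(ACTIONS, key=lambda second: payoff[(first, second)])


def naive_first_move():
    """Assumes the T2 self will still maximize ORIGINAL."""
    return max(ACTIONS, key=lambda a: ORIGINAL[(a, second_move(a, ORIGINAL))])


def sophisticated_first_move():
    """Predicts that the T2 self maximizes MODIFIED, but scores with ORIGINAL."""
    return max(ACTIONS, key=lambda a: ORIGINAL[(a, second_move(a, MODIFIED))])


def outcome(first):
    """What actually happens: T2 uses MODIFIED; the score is under ORIGINAL."""
    second = second_move(first, MODIFIED)
    return first + second, ORIGINAL[(first, second)]


print(outcome(naive_first_move()))          # ('AB', -1)
print(outcome(sophisticated_first_move()))  # ('BB', 1)
```

The naive agent ends up at AB, its least preferred outcome under the original payoffs, while the agent that models its future self's changed payoffs settles for BB.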
1timtyler9yBe warned that that post made practically no sense - and surely isn't a good reference.
Universal agents and utility functions

I am quite sure that Pareto optimality is untouched by the proposed changes, but I haven't written down a proof yet.

2012 Less Wrong Census/Survey

Took the survey. Does the god question include simulators? I answered under the assumption that it did not.

5tgb9yI, for one, answered assuming that it does include simulators. I do not know what ontologically basic mental events are and didn't bother to look it up.

I assumed the same, based on the definition of "god" as "supernatural" and the definition of "supernatural" as "involving ontologically basic mental entities."

(Oh, and for anyone who hasn't read the relevant post, the survey is quoting this.)

8gwern9yI'm pretty sure it doesn't. At least, if it does I have no idea what the 'ontologically basic mental events' qualifiers were about...