Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Reflective AIXI and Anthropics

I think the framework of RO-AIXI can be modified pretty simply to include memory-tampering.

Here's how you do it. Say you have an environment E and an RO-AIXI A running in it. You have run the AIXI for a number of steps, and it has a history of observations O. Now we want to alter its memory to have a history of observations O'. This can be implemented in the environment as follows:

1. Create a new AIXI A', with the same reward function as the original and no memories. Feed it the sequence of observations O'.

2. Run A' in place of A for the remainder of E. In the course of this execution, A' will accumulate total reward R. Terminate A'.

3. Give the original AIXI reward R, then terminate it.

This basically captures what it means for AIXI's memory to be erased. Two AIXIs are only differentiated from each other by their observations and reward function, so creating a new AIXI which shares a reward function with the original is equivalent to changing the first AIXI's observations. The new AIXI, A', will also be able to reason about the possibility that it was produced by such a 'memory-tampering program', as this is just another possible RO-Turing machine. In other words, it will be able to reason about the possibility that its memory has been altered.
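The three steps above can be sketched as a toy loop. This is only an illustration under strong assumptions: the `Policy` and `EnvStep` types, the function `run_after_tampering`, and the echo environment are all hypothetical stand-ins, and a fixed policy is nothing like an actual RO-AIXI. The only point is the bookkeeping: the clone starts from O', runs for the rest of E, and its total reward R is what the original gets credited with.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins (not from the post): a policy maps an observation
# history to an action; an environment step maps an action to
# (next observation, reward).
Policy = Callable[[Sequence[int]], int]
EnvStep = Callable[[int], Tuple[int, float]]


def run_after_tampering(policy: Policy, tampered_history: Sequence[int],
                        env_step: EnvStep, remaining_steps: int) -> float:
    """Return the reward R that gets credited to the original agent."""
    # Step 1: a fresh agent with the same policy (standing in for "same
    # reward function"), whose only memories are the tampered observations O'.
    clone_history: List[int] = list(tampered_history)
    total_reward = 0.0
    # Step 2: the clone acts in place of the original for the rest of E.
    for _ in range(remaining_steps):
        action = policy(clone_history)
        observation, reward = env_step(action)
        clone_history.append(observation)
        total_reward += reward
    # Step 3: the original agent is handed R and then terminated.
    return total_reward


def repeat_last(history: Sequence[int]) -> int:
    # Toy policy: repeat the most recent observation as the action.
    return history[-1] if history else 0


def echo_env(action: int) -> Tuple[int, float]:
    # Toy environment: echo the action back, reward 1 for action 1.
    return action, float(action == 1)


reward_untampered = run_after_tampering(repeat_last, [0, 0], echo_env, 3)
reward_tampered = run_after_tampering(repeat_last, [0, 1], echo_env, 3)
```

In this toy, changing a single remembered observation changes what the clone does for the rest of the run, and therefore the reward passed back to the original.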

[EDITED: My original comment falsely stated that AIXI-RO avoids dutch-booking, but I no longer think it does. I've edited my reasoning below]

As applied to the Sleeping Beauty problem from the paper, I think this WILL be dutch-booked. If we assume it takes one bit to specify heads/tails, and one to specify which day one wakes on, then the agent will have probabilities

1/2 Heads,

1/4 Tails, wake on Monday

1/4 Tails, wake on Tuesday

Since memory-erasure has the effect of creating a new AIXI with no memories, the betting scenario of the paper (in section 3.2) has the structure of either a single AIXI choosing to take a bet, or two copies of the same AIXI playing a two-person game. RO-AIXI plays Nash equilibria in such scenarios. Say the AIXI has taken bet 9. From the perspective of the current AIXI, let p be the probability that it takes bet 10, and let q be the probability that its clone takes bet 10.

E[u] = 1/2 * ( (-15 + 2eps) + p (10 + eps)) + 1/2 * ((15 + eps) + p*q *(-20 + 2eps) + p(1 - q)(-10 + eps) + q(1 - p) * (-10 + eps))

= 3/2 eps + 1/2 * (p * 2 * eps + q(-10 + eps))

This has the structure of a prisoner's dilemma. In particular, the expected utility of the current AIXI is increasing in p for any fixed q, so it is maximized at p = 1. So both AIXIs will take the bet and incur a sure loss. On the other hand, for this reason the original AIXI A would not take bet 9 on Sunday, if given the choice.
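As a sanity check on the algebra above, here is a small numerical sketch. The function names are my own; the payoffs are copied from the expression for E[u], and `eps` is treated as a small positive number (an assumption about the paper's setup).

```python
import math


def expected_utility(p: float, q: float, eps: float) -> float:
    # Heads world (probability 1/2): bet 9 payoff plus bet 10 if taken.
    heads = (-15 + 2 * eps) + p * (10 + eps)
    # Tails world (probability 1/2): bet 9 payoff plus bet 10 taken by
    # both copies, only this copy, or only the clone.
    tails = ((15 + eps)
             + p * q * (-20 + 2 * eps)
             + p * (1 - q) * (-10 + eps)
             + q * (1 - p) * (-10 + eps))
    return 0.5 * heads + 0.5 * tails


def simplified(p: float, q: float, eps: float) -> float:
    # The simplified form from the comment.
    return 1.5 * eps + 0.5 * (p * 2 * eps + q * (-10 + eps))


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
forms_agree = all(
    math.isclose(expected_utility(p, q, 0.1), simplified(p, q, 0.1), abs_tol=1e-9)
    for p in grid for q in grid
)
# Taking the bet (p = 1) beats declining (p = 0) whatever the clone does.
p_dominates = all(
    expected_utility(1.0, q, 0.1) > expected_utility(0.0, q, 0.1) for q in grid
)
# When both copies take the bet, the result is a loss of 5 - 3*eps.
both_take = expected_utility(1.0, 1.0, 0.1)
```

The gain from raising p is only eps per unit, while each copy's choice costs the other roughly 5, which is what gives the game its prisoner's dilemma shape.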

Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+eps in the heads world and -20+2eps in the tails world. This is just a given. I'll take this +15/-15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world."

Something else feels slightly off, but I can't quite pinpoint it at this point. Still, I guess this solves my question as originally stated, so I'll PM you for payout. Well done!

(btw, you can highlight a string of text and hit ctrl+4 to turn it into math-mode)

I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operation.

Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it's in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.

Well, it COULD be the case that the K-complexity of the memory-erased AIXI environment is lower, even when it learns that this happened. The reason for this is that there could be many possible past AIXIs who have their memory erased/altered and end up in the same subjective situation. Then the memory-erasure hypothesis can use the lowest K-complexity AIXI who ends up with these memories. As the AIXI learns more, it can gradually piece together which of the potential past AIXIs it actually was, and the K-complexity will go back up again.

EDIT: Oh, I see you were talking about actually having a RANDOM memory in the sense of a random sequence of 1s and 0s. Yeah, but this is no different than AIXI thinking that any random process is high K-complexity. In general, and discounting merging, the memory-altering subroutine will increase the complexity of the environment by a constant plus the complexity of whatever transformation you want to apply to the memories.

Incidentally, you can use the same idea to have RO-AIXI do anthropic reasoning/bargaining about observers that are in a broader reference class than 'exact same sense data', by making the mapping O -> O' some sort of coarse-graining.

I don't think AIXI needs any special sauce to understand memory wiping. There is a Turing machine that writes the memory-wiped contents to tape all in one pass. It's just going to be a simulation of the universe with a slightly more complicated "bridging law." Different programs that write the right thing to the tape are on equal footing, no matter whether they describe different universes or different parts of the same universe.

So we might expect an assignment more like P(HM)=0.49, P(TM)=0.49, P(TT)=0.02 (EDIT: fixed) (not sure whether we should expect it to be harder to describe the normal Tails branch). And then AIXI will act on this using whatever action it predicts will get the greatest signal in the reward channel.

" P(HM)=0.49, P(TM)=0.49, P(TT)=0.2 " -- Are these supposed to be mutually exclusive probabilities?

" There is a turing machine that writes the memory-wiped contents to tape all in one pass. " - Yes, this is basically what I said. ('environment' above could include 'the world' + bridging laws). But you also need to alter the reward structure a bit to make it match our usual intuition of what 'memory-wiping' means, and this has significance for decision theory.

Consider: if your own memory was erased, you would probably still be concerned about what was going to happen to you later. But a regular AIXI won't care about what happens to its memory-wiped clone (i.e. another AIXI inducting on the 'memory-wiped' stream), because they don't share an input channel. So to fix this you give the original AIXI all of the rewards that its clone ends up getting.

Oops, that should have been 0.02 :)

Good point about caring for yourself even if you expect to lose the memory of (e.g.) the current hour. AIXI only cares about the tapes that are successors of the current one. Maybe expand the tape from write-only to also have some erasing operations?

I think there are probably some other toy problems that illustrate this issue a lot better than Sleeping Beauty, where AIXI equating memory loss with death might not actually change its decisions much in the bet.

I still don't see how you're getting those probabilities. Say it takes 1 bit to describe the outcome of the coin toss, and assume it's easy to find all the copies of yourself (i.e. your memories) in different worlds. Then you need:

1 bit to specify if the coin landed heads or tails

If the coin landed tails, you need 1 more bit to specify if it's Monday or Tuesday.

So AIXI would give these scenarios P(HM)=0.50, P(TM)=0.25, P(TT)=0.25.

I'm not thinking of it like specifying parts of the toy problem. I'm thinking of it as if for each of HM, TM, and TT, the observer is about to receive 2 bits that describe which situation they're in, and the only object that matters for the probability of each is the shortest program that reproduces all past observations plus the next 2 bits.

If we assume Sleeping Beauty has lots of information, we might expect that the shortest matching program will look like a simulation of physical law, plus a "bridging law" that, given this simulation, tells you what symbols get written to the tape. It is in this context that it seems like HM and TM are equally complex - you're simulating the same chunk of universe and have what seems like (naively) a similar bridging law. It's only for Tuesday that you obviously need a different program to reproduce the data.

If we assume Sleeping Beauty has lots of information, we might expect that the shortest matching program will look like a simulation of physical law plus a "bridging law" that, given this simulation, tells you what symbols get written to the tape

I agree. I still think that the probabilities would be closer to 1/2, 1/4, 1/4. The bridging law could look like this: search over the universe for compact encodings of my memories so far, then see what is written next onto this encoding. In this case, it would take no more bits to specify waking up on Tuesday, because the memories are identical, in the same format, and just slightly later temporally.

In a naturalized setting, it seems like the tricky part would be getting the AIXI on Monday to care what happens after it goes to sleep. It 'knows' that it's going to lose consciousness (it can see that its current memory encoding is going to be overwritten), so its next prediction is undetermined by its world-model. There is one program that will give it the reward of its successor and then terminate, as I described above, but it's not clear why the AIXI would favour that hypothesis. Maybe if it has been in situations involving memory-wiping before, or has observed other RO-AIXIs in such situations.

I don't find these intuitive arguments reliable. In particular, I doubt it is meaningful to say that reflective oracle AIXI takes the complexity of its own counterfactual actions into account when weighing decisions. This is not how its prior works or interacts with its action choice. I don't fully understand your intuition, and perhaps you're discussing how it reasons about other agents in the environment, but this is much more complicated than you imply, and probably depends on the choice of reflective oracle. (I am doing a PhD related to this).

Hm. I recently wrote a post where I said this idea was probably not original - do you remember a source for this instance of it?

I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.

The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover desired behavior.

It's possible to define a version of Solomonoff Induction with Reflective Oracles, that allows an AIXI-like agent to consider hypotheses that include itself or other equally powerful agents, going partway towards addressing naturalized induction issues.

So then a natural question is "what does this partial answer seem to point to for anthropics?"

To figure this out, we'll be going over a few of the thought experiments in Bostrom's book about anthropic reasoning, and seeing what Reflective-Oracle AIXI has to say about them.

The following conclusions are very dependent on how many extra bits it takes to encode "same environment, but I'm that other agent over there", so I'll be making a lot of assumptions that I can't prove, such as the most efficient way of encoding an environment being to specify an environment, and then specifying a place in there that the agent interfaces with. This seems unavoidable so far, so I'll at least make an effort to list out all the implicit assumptions that go into setting up the problems.

As a quick refresher, SSA (self-sampling assumption) and SIA (self-indication assumption) work as follows: SSA takes the probability of a world as given and evenly distributes probability mass to being everything in "your reference class" in that particular world. SIA reweights the probability of a world by the number of instances of "things in your reference class" that it contains. In short, SIA has a strong bias in favor of possible worlds/hypotheses/Turing machines with many instances of you, while SSA doesn't care about how many instances of you are present in a possible world.

Thought Experiment 1: Incubator

This will be modeled as a machine that represents the environment, that has a bit that is used to determine how the coinflip comes up. Also, in the second case, because there are two possible places where the agent can be hooked up to the environment, another bit is required to specify where the agent is "attached" to the environment. These three cases have minimum description lengths of |E|+1, |E|+2, and |E|+2 bits respectively (where |E| is the description length of the environment), so by the universal semimeasure, they have (relative) probability mass of 50%, 25% and 25% respectively.
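The step from description lengths to relative masses is just a weight of 2^(-length) per hypothesis followed by normalization. A minimal sketch (the function name `relative_masses` is my own), using exact fractions and dropping the shared |E| bits since they cancel:

```python
from fractions import Fraction
from typing import List, Sequence


def relative_masses(extra_bits: Sequence[int]) -> List[Fraction]:
    """Normalize 2^-bits weights over a set of competing hypotheses.

    `extra_bits` holds the bits each hypothesis needs beyond the shared
    |E| bits; the common factor 2^-|E| cancels in the normalization.
    """
    weights = [Fraction(1, 2 ** bits) for bits in extra_bits]
    total = sum(weights)
    return [weight / total for weight in weights]


# The three cases above: |E|+1, |E|+2 and |E|+2 bits.
incubator = relative_masses([1, 2, 2])
```

This yields 1/2, 1/4 and 1/4 for the three cases, matching the 50%, 25%, 25% split.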

So, assuming the problem setup actually works this way, the answers are 50% and 67%, respectively. This seems to point towards Reflective-Oracle Solomonoff Induction (RO-SI) doing something like SSA. The intuitive reason why is that a hypothesis with a bunch of copies of you requires a bunch of extra bits to specify which copy of you the input data stream is coming from, and this cancels out with the increased number of hypotheses where you are in the well-populated world. There may be $2^{50}$ copies of you in a "world", but because it requires 50 bits to specify "I'm that copy right there", each specific hypothesis/Turing machine of the form "I'm in that world and am also that particular copy" requires 50 extra bits to specify where in the environment the data is being read out from, and receives a probability penalty of $2^{-50}$, which, when multiplied by the large number of hypotheses of that form, recovers normality.

There are two ways where things get more interesting. One is that, for environments with many observers in your reference class (RO-SI uses as its reference class all spots in the environment that receive the exact same observation string as is present in its memory), you'll assign much higher probability to being one of the (fairly few) observers for which specifying their spot in the environment is low K-complexity. It definitely isn't a uniform distribution over observers in the possible world; it favors observers that are lower-complexity to specify where in the environment they are. A similar effect occurs in logical induction, where there tend to be peaks of trading activity of simple traders on low-K-complexity days. Sam's term for this was "Graham's crackpot": there could be a simple trader with a lot of initial mass that just bides its time until some distant low-K-complexity day and screws up the probabilities then (it can't do so infinitely often, though).

The other point of interest is what this does on the standard counterexamples to SSA.

To begin with, the Doomsday argument is valid for SSA. This doesn't seem like much of a limitation in practice, because RO-SI uses a very restrictive reference class that in most practical cases includes just the agent itself, and also, because RO-SI is about as powerful as possible when it comes to updating on data, the starting prior would very, very quickly be washed out by a maximally-detailed inside view on the probability of extinction using all data that has been acquired so far.

Thought Experiment 2: Adam and Eve

Here's where the situation gets nifty.

Assume the environment is as follows: There's the coding of the Turing machine that represents the environment (|E| bits), the 1 bit that represents "fertile or not", and the bitstring/extra data that specifies where Eve is in the environment (|L| bits, L for "location"). Eve has been wandering around the Garden of Eden for a bit, and since she's a hyper-powerful inductor, she's accumulated enough information to rule out all the other hypotheses that say she's actually not in the Garden of Eden. So it's down to two hypotheses that are both encoded by |E|+|L|+1 bits, which get equal probability. If we assume a utility function that's like "+1 reward for sex, -10 reward for creating billions of suffering beings" (if it was $-10^{10}$ for an Eve that wasn't scope-insensitive, the serpent's reasoning would fail), the expected utility of sex is $0.5 \cdot 1 + 0.5 \cdot (-9) = -4$, and Eve ignores the serpent.

The specific place that the serpent's reasoning breaks down is assuming that the probability of being Eve/difficulty of specifying Eve's place in the universe goes down/up when a decision is made that results in the world having a lot more beings in it. It doesn't work that way.

However, it gets more interesting if you assume everyone in the resulting created world has sense data such that even a hyper-powerful inductor doesn't know whether or not they are Eve before the fateful decision.

Also, assume that it takes |L′| bits to specify any particular person's location if they're not Eve. This is a sort of "equally distributed probability" assumption on the future people, that doesn't restrict things that much. Maybe it's much easier to point to Eve than some other person, maybe it's the other way around.

Also assume that everyone's utility functions are like "+1 for sex, -10 for finding out shortly after sex that you are one of the suffering future beings, or that you created billions of such."

To begin with the analysis, break the hypothesis space into:

two worlds of |E|+|L|+1 bits where Eve is fertile/infertile, and you are Eve.

and $10^{10}$ worlds of (it depends) bits where Eve was fertile, sex was had, and you are not Eve. The reason why it's tricky to say what the description length of being one of the future agents is, is that it takes fewer bits to encode a world where an agent does a thing in accordance with the laws of math than it takes to encode a world where an agent does a different thing that they wouldn't have normally done. In this particular case, it would take |S| bits (S for surgery) to specify "at this particular spot, ignore what Eve would have done and instead substitute in the action 'have sex', and then run things normally".

So, if Eve definitely has sex, it takes |E|+|L′|+1 bits to specify one of the future agents. If Eve definitely doesn't have sex, it takes |E|+|L′|+|S|+1 bits to specify one of the future agents.

Taking these two cases, we can rescale things to get a mass of 1, 1, and either $10^{10} \cdot 2^{|L|-|L'|}$ or $10^{10} \cdot 2^{|L|-|L'|-|S|}$, on the three classes of worlds, respectively. Expected utility calculations will work out the same way if we use these numbers instead of probabilities that add up to 1, because it's just a scaling on expected utility and the scaling can be moved over the utility function, which is invariant under scale-and-shift. So then, in the first case, expected utility of sex and not-sex becomes:

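$$\mathbb{E}[U(\text{sex})] = 1 \cdot 1 + 1 \cdot (-10) + \left(10^{10} \cdot 2^{|L|-|L'|}\right) \cdot (-9) = -9 \cdot \left(10^{10} \cdot 2^{|L|-|L'|} + 1\right)$$

$$\mathbb{E}[U(\text{not-sex})] = \left(10^{10} \cdot 2^{|L|-|L'|}\right) \cdot (-10)$$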

So sex will be had if $10^{10} \cdot 2^{|L|-|L'|} > 9$. The crossover point occurs approximately at a 30 bit penalty to specify a non-Eve person ($2^{-30}$ is approximately 1/billion). So, if Eve has sex, and assigns less than about a 1/10 chance to being Eve, it's a consistent state of affairs. The reasoning is "I'm probably not Eve, and so I'm probably already going to suffer (since I know in advance what my decision is in this case), might as well pick up that +1 utility".
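To make the 30-bit figure concrete, here is a short numerical sketch of the threshold. The names are my own, and `bit_penalty` stands for |L′| − |L|, the extra bits needed to point at a non-Eve person; the utilities and masses are the rescaled ones from the expressions above.

```python
import math

NUM_FUTURE_PEOPLE = 10 ** 10


def non_eve_mass(bit_penalty: float) -> float:
    # Rescaled mass on the "I am not Eve" worlds, relative to mass 1 on
    # each of the two "I am Eve" worlds.
    return NUM_FUTURE_PEOPLE * 2.0 ** (-bit_penalty)


def sex_is_preferred(bit_penalty: float) -> bool:
    mass = non_eve_mass(bit_penalty)
    eu_sex = 1 * 1 + 1 * (-10) + mass * (-9)
    eu_not_sex = mass * (-10)
    return eu_sex > eu_not_sex


# Penalty at which the non-Eve mass equals 9 exactly.
crossover_bits = math.log2(NUM_FUTURE_PEOPLE / 9)
```

The crossover comes out at just over 30 bits: with a 30-bit penalty the non-Eve mass is about 9.3 and sex is preferred, while at 31 bits it drops to about 4.7 and it is not.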

Redoing this analysis for the case where Eve doesn't have sex, we get that sex will be had if $10^{10} \cdot 2^{|L|-|L'|-|S|} > 9$, and in this case, the crossover point occurs approximately at a 30 bit penalty to specify both the non-Eve person and that particular decision intervention. (There can also be consistent solutions where the reflective oracle is perched right on the decision threshold, and randomizes accordingly, but I'll ignore those for the time being, they don't change much.)

Considering the specific case where the ratios of the probability masses for "I'm Eve" and "I'm not Eve" is less than 1:9 (in the sex case) and $1 : 9 \cdot 2^{-|S|}$ (in the non-sex case), we get a case where the decision made depends on the choice of reflective oracle! If the reflective oracle picks sex, sex is the best decision (by the reasoning "I'm probably not Eve, might as well pick up the +1 utility"). If the reflective oracle picks not-sex, not-sex is the best decision (by the reasoning "I'm likely enough to be Eve (because the non-Eve people live in a lower-probability universe where an intervention on Eve's action happened), that I won't chance it with the coinflip on fertility").
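The dependence on the oracle can be illustrated with a small consistency check: assume a prediction for Eve's action, compute the resulting masses, and see whether the best response matches the prediction. This is a sketch with made-up bit counts (25, 10, 40, 5 are illustrative values, not from the post), it ignores the randomizing solutions mentioned above, and the function names are hypothetical.

```python
from typing import List

NUM_FUTURE_PEOPLE = 10 ** 10
ACTIONS = ("sex", "not-sex")


def best_action(predicted: str, location_penalty_bits: int, surgery_bits: int) -> str:
    # location_penalty_bits stands for |L'| - |L|. If Eve is predicted not to
    # have sex, the non-Eve worlds need an extra |S| bits for the intervention.
    bits = location_penalty_bits + (0 if predicted == "sex" else surgery_bits)
    mass = NUM_FUTURE_PEOPLE * 2.0 ** (-bits)
    eu_sex = 1 * 1 + 1 * (-10) + mass * (-9)
    eu_not_sex = mass * (-10)
    return "sex" if eu_sex > eu_not_sex else "not-sex"


def consistent_predictions(location_penalty_bits: int, surgery_bits: int) -> List[str]:
    # A prediction is a fixed point if acting on it reproduces it.
    return [a for a in ACTIONS
            if best_action(a, location_penalty_bits, surgery_bits) == a]


both_fixed_points = consistent_predictions(25, 10)
only_abstain = consistent_predictions(40, 10)
only_sex = consistent_predictions(10, 5)
```

With a 25-bit location penalty and a 10-bit surgery cost, both predictions are self-confirming; pushing the location penalty far above or below the 30-bit threshold leaves only one.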

So, RO-AIXI doesn't exactly fail (as SSA is alleged to) in this case, because there's a flaw in the Serpent's reasoning where the difficulty of specifying where you are in the universe doesn't change when you make a decision that creates a bunch of other agents, and you don't think you could be those other agents you're creating.

But if there's a case where the other agents are subjectively indistinguishable from yourself, and it's bad for you to create them, but good for them to push the "create" button, there are multiple fixed-points of reasoning that are of the form "I probably press the button, I'm probably a clone, best to press the button" and "I probably don't press the button, I'm probably not a clone, best to not press the button".

Another interesting angle on this is that the choice of action has a side-effect of altering the complexity of specifying various universes in the first place, and the decision rule of RO-AIXI doesn't take this side-effect into account, it only cares about causal consequences of taking a particular action.

The arguments of Lazy Adam, Eve's Card Trick, and UN++ in Bostrom's book fail to apply to RO-AIXI by a similar line of reasoning.

Sleeping Beauty, SSA, and CDT:

There's a possible future issue where, according to this paper, it's possible to money-pump the combination of SSA and CDT (which RO-AIXI uses), in the Sleeping Beauty experiment. Looking further at this is hindered by the fact that RO-AIXI implicitly presumes that the agent has access to the entire string of past observations that it made, so it doesn't interact cleanly with any sort of problem that involves amnesia or memory-tampering. I haven't yet figured out a way around this, so I'm putting up a 500-dollar bounty on an analysis that manages to cram the framework of RO-AIXI into problems that involve amnesia or memory tampering (as a preliminary step to figure out whether the combination of SSA-like behavior and CDT gets RO-AIXI into trouble by the argument in the aforementioned paper).

Takeaways:

RO-AIXI seems to act according to SSA probabilities, although there are several interesting features of it. The first is that it assigns much more probability to embeddings of the agent in the environment that are low K-complexity; it definitely doesn't assign equal probability to all of them. The second interesting feature is that the reference class that it uses is "spots in the environment that can be interpreted as receiving my exact string of inputs", the most restrictive one possible. This opens the door to weird embeddings like "The etchings on that rock, when put through this complicated function, map onto my own sense data", but those sorts of things are rather complex to specify, so they have fairly low probability mass. The third interesting feature is that the probability of being a specific agent in the world doesn't change when you make a decision that produces a bunch of extra agents, which defuses the usual objections to SSA. The final interesting feature is that making a particular decision can affect the complexity of specifying various environments, and the standard decision procedure doesn't take this effect into account, permitting multiple fixed-points of behavior.

Also I don't know how this interacts with dutch-books on Sleeping Beauty because it's hard to say what RO-AIXI does in cases with amnesia or memory-tampering, and I'd really like to know and am willing to pay 500 dollars for an answer to that.