The Absolute Self-Selection Assumption

The post mentioned some problems/issues with this approach that remain to be resolved. Here are some additional ones.

My brain has preferences between probability distributions built into it.

Your brain is built to intuitively grapple with distributions over future experiences, like your example "I have a 50% chance of remaining me, and a 50% chance of becoming my copy." Unfortunately, UDASSA doesn't give you that. It only gives you a distribution over observer-moments in an absolute sense (hence the "A" in ASSA), and there is no good way to convert such a distribution into a distribution over future experiences. (Suppose you're copied at time 0, then the "copy" is copied again at time 1. Under UDASSA this is entirely unproblematic, but it doesn't tell you whether you should anticipate being the "original" at time 2 with probability 1/2 or 1/3.) The "pure" UDASSA position would be that there is no such thing as "remaining me" or "becoming my copy", and you just have to make your choices using the distribution over observer-moments without "linking" the observer-moments together in any way.
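The two candidate answers in the copying example can be made explicit. This is a minimal sketch, not anything UDASSA itself prescribes: the observer labels and the two rule names are hypothetical, chosen only to show where 1/2 and 1/3 come from.

```python
from fractions import Fraction

# Hypothetical labels for the three observers who exist at time 2.
OBSERVERS = ["original", "copy", "copy_of_copy"]


def sequential_split():
    """Split anticipation 50/50 at each copying event."""
    p_original = Fraction(1, 2)         # time 0: original vs. copy
    p_copy_branch = Fraction(1, 2)
    p_copy = p_copy_branch / 2          # time 1: the copy is copied again
    p_copy_of_copy = p_copy_branch / 2
    return dict(zip(OBSERVERS, [p_original, p_copy, p_copy_of_copy]))


def uniform_count():
    """Weight every observer present at time 2 equally."""
    n = len(OBSERVERS)
    return {name: Fraction(1, n) for name in OBSERVERS}


print(sequential_split()["original"])  # 1/2
print(uniform_count()["original"])     # 1/3
```

Both rules yield a valid distribution over the time-2 observers. The point of the example is that nothing in a measure over observer-moments picks one rule over the other.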

What I want is a probability distribution over all possible experiences (or "observer-moments"), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.

Do you consider this probability distribution an objective measure of how much each observer-moment exists? Or is it just a (possibly approximate) measure of how much you care about each observer-moment? I'm still going back and forth on these two positions myself. See What Are Probabilities, Anyway? where I go into this distinction a bit more. (The former is what I usually mean when I say UDASSA. Perhaps we could call the latter UDT-UMC for Updateless Decision Theory w/ Universal Measure of Care, unless someone has a better name for it. :)

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify.

Does this not seem counterintuitive to you? Suppose you find out you are living in a simulation on a 2 atom thick computer, and the simulation-keeper gives you a choice of (a) moving to a 1 atom thick computer, or (b) flipping a coin and shutting down the simulation or not based on the coin flip, would you really be indifferent? Under UDT-UMC, we can say that how much we care about an observer-moment is related to its "probability" under UD, but not necessarily exactly equal and could be influenced by other factors. If we accept the complexity of value thesis, then there is no reason why the measure of care has to be maximally simple, right? (This post is also related.)

[-]wnoise15y190

In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains).

This is a meme I keep seeing, and it's just not true. You need a lot more assumptions to justify that, such as "randomly generated", or very very strong versions of the cosmological principle.

The real line is infinite, but there's only one copy of the number 7.

[-]paulfchristiano15y140

The randomness of quantum mechanics is enough to guarantee under very weak conditions that, in most Everett branches, there are infinitely many copies of any pattern which occurs with positive probability.

The paper I linked justifies this assumption for one set of cosmological beliefs.

Also, though I made this claim as fact, you could generously consider it to be the assumption of the least convenient possible world. Are you sufficiently confident that there are only finitely many copies of you that you are OK with anthropics that would collapse if there were infinitely many copies?

[-]wnoise15y40

So you're going with "randomly generated". Which is fine, but it needs to be spelled out.

there are infinitely many copies of any pattern which occurs with positive probability.

You need to be very careful pulling intuitions about randomness from the finite case and applying it to the infinite case. In particular, it is no longer true that just because something happened, it has a positive probability. Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked. And we can pick an infinite number of times and never encounter a duplicate.

the least convenient possible world

I'm not attacking this assumption in order to attack your final conclusion, I'm just attacking this assumption.

[-]Cyan15y120

Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked.

I have actually never observed a real number picked at random. I have often observed rational numbers picked at pseudo-random, though.

[-]wnoise15y40

Observing a Geiger counter near a piece of radioactive material was one of the highlights of my undergraduate physics labs. And the time distribution of clicks is random in the same sense that the OP was using.

[-]Sniffnoy15y40

I think the bigger problem is not randomness vs. pseudorandomness, but rather the question of whether uncountable probability spaces actually exist in physical situations.

[-]wnoise15y20

I believe they do for the same reasons I take seriously the existence of other Everett branches. In fact the mapping is rather straightforward: I can't observe or directly interact with them in full generality, but the laws governing them and what I can observe are so very much simpler than laws that excise the unobservable ones. Whether I can actually exhibit most real numbers is beside the point.

[-]Cyan15y00

Is there a demonstration that a physics based on the computables is more complex than a physics based on the reals?

[-]JoshuaZ15y80

Is there a demonstration that a physics based on the computables is more complex than a physics based on the reals?

This is a complicated question. In practice, it is difficult in this particular context to measure what we mean by more or less complicated. A Blum-Shub-Smale machine, which is essentially the equivalent of a Turing machine but for real numbers, can do anything a regular Turing machine can do. This would suggest that physics based on the reals is in general capable of doing more. But in terms of describing rules, it seems that physics based on the reals is simpler. For example, trying to talk about points in space is a lot easier when one can have any real coordinate rather than any computable coordinate. If one wants to prove something about some sort of space that only has computable coordinates, the easiest thing is generally to embed it in the corresponding real manifold or the like.

[-]Cyan15y10

As Sniffnoy notes, the bigger problem is about the observation of an actual real number. Any observable signal specifying the instant at which the particle triggered the counter has finite information content, unlike a true real number. This includes the signal sent by your ears to your brain.

I shouldn't have mentioned pseudo-random number generation in the grandparent -- it's a red herring.

[-]Perplexed15y10

Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked.

Not in a finite amount of time.

[-]wnoise15y10

What do you mean?

[-]Manfred15y00

Drawing from a continuous distribution happens fairly often, so your comment confuses me. Or maybe you'd say that those aren't "really infinite" and are confined to a certain number of bits, but quantum mechanics would be an exception to that.

[-]Perplexed15y30

As Cyan pointed out, when you choose a number confined to a certain number of bits, you are actually choosing from among the rationals.

I don't understand your reference to QM. I wasn't objecting to the randomness aspect. I was simply pointing out that to actually receive that randomly chosen real, you will (almost certainly) need to receive an infinite number of bits, and assuming finite channel capacity, that will take an infinite amount of time. So that event you mentioned, the one with an infinitesimal probability (zero probability for all practical purposes) is not going to actually happen (i.e. finish happening).

It was a minor quibble, which I now regret making.

[-]paulfchristiano15y10

Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked

I believe there are probably only countably many distinguishable observer moments, in which case this can't happen by countable additivity.
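The countable-additivity step can be spelled out. This is a short derivation, assuming (as the comment does) that the set of distinguishable observer-moments is countable; the symbols $\Omega$ and $\omega_i$ are introduced here only for illustration.

```latex
\Omega = \{\omega_1, \omega_2, \dots\}, \qquad
1 = P(\Omega) = \sum_{i=1}^{\infty} P(\{\omega_i\}).
```

If every $P(\{\omega_i\})$ were zero, the sum would be zero, so at least one outcome must have positive probability. The uniform distribution on [0,1) escapes this argument because its outcome space is uncountable, and additivity is only required over countable unions.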

But you are certainly correct that a lot goes into this assumption. I should be clearer about this; in particular, I should probably add a bunch of "may"'s.

[-]AlephNeil15y100

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe.

It might not be possible to describe U without making some arbitrary choices concerning "co-ordinates" (and other acts of "gauge-fixing"). And then when they're chosen, we're going to want to 'throw them away' once we've located the observer (since the co-ordinates are not physically meaningful and certainly don't form part of the observer's "mental state".)

So really, it's better to talk about a "centred universe" whose co-ordinates are specially chosen to have the observer in the middle, rather than an uncentered ("objective") universe plus a pointer.

Anyway, I still want to know whether being close to a 'landmark' (like a supermassive black hole) is going to significantly increase one's probability. And whether, if tons of copies of you are made and sent far and wide, you should 'anticipate' waking up close to a landmark.

[-]cousin_it15y30

Your last paragraph sounds like it could describe gravity if we tweaked it enough :-)

[-][anonymous]15y00

There's an entropic theory of gravity.

[-]paulfchristiano15y10

Anyway, I still want to know whether being close to a 'landmark' (like a supermassive black hole) is going to significantly increase one's probability. And whether, if tons of copies of you are made and sent far and wide, you should 'anticipate' waking up close to a landmark.

The theory predicts many artifacts of this form. I don't think that landmarks are too significant, because specifying what "supermassive black hole" means is a little complicated, but for very easily specified landmarks it would be the case.

[-]Manfred15y92

The "Born Probabilities" section was 11 dang paragraphs of "they're the best fit to our observations and Occam's razor." :(

For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.

This is not necessarily true. The sequence HHHHHHHHHH has a lower Kolmogorov complexity than HTTTTHTHTT. So this weighting of observers by complexity has observable consequences in that we will see simpler strings more often than a uniform distribution would predict. But we don't, which makes this idea unlikely.
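The complexity gap between a run of heads and a typical sequence can be illustrated numerically. Kolmogorov complexity is uncomputable, so this sketch substitutes zlib-compressed length as a crude upper-bound proxy. That substitution, the sequence length, and the fixed seed are all assumptions made for the example, and the strings are far longer than ten flips because compressor overhead swamps short inputs.

```python
import random
import zlib


def compressed_size(flips: str) -> int:
    """Length in bytes of the zlib-compressed flip sequence (a rough proxy only)."""
    return len(zlib.compress(flips.encode("ascii"), 9))


n = 10_000
rng = random.Random(0)  # fixed seed so the run is reproducible

all_heads = "H" * n
random_flips = "".join(rng.choice("HT") for _ in range(n))

print(compressed_size(all_heads))     # small: the pattern is trivially regular
print(compressed_size(random_flips))  # much larger: roughly one bit per flip
```

This shows only that the two strings differ greatly in description length. Whether that difference carries over to the weight of the observers who see them is exactly what the replies below dispute.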

[-]paulfchristiano15y70

The "Born Probabilities" section was 11 dang paragraphs of "they're the best fit to our observations and Occam's razor." :(

It was 8 paragraphs of "Here is why Occam's razor is entitled to explain the Born probabilities just like the rest of physics." Insofar as the Born probabilities are mysterious at all, this is what needs to be resolved. Do you disagree?

This is not necessarily true. The sequence HHHHHHHHHH has a lower Kolmogorov complexity than HTTTTHTHTT. So this weighting of observers by complexity has observable consequences in that we will see simpler strings more often than a uniform distribution would predict. But we don't, which makes this idea unlikely.

Your reasoning applies verbatim to Solomonoff induction itself, which is the first clue that someone has thought through it before. In fact, I strongly suspect that Solomonoff thought through it.

What you are saying is that truly random processes are rare under the Solomonoff prior. But it should be clear that the total mass on random processes is comparable to the total mass on deterministic processes. So we should not be surprised in general to find ourselves in a universe in which random processes exist. Once we have observed a phenomenon to be random in the past, switching from randomness to some simple law (like always output H) is unlikely for the same reason that arbitrarily changing the laws of physics is unlikely.

[-]Manfred15y30

Do you disagree?

Yes, but then I never thought they were relatively mysterious anyhow, for the reasons you describe. They're a natural law, and that's what science is for. Neither have I ever heard any physics professors or textbooks say they're mysterious. An "explanation" of the Born probabilities would be deriving them, and some other parts of quantum mechanics, from a simpler underlying framework.

What you are saying is that truly random processes are rare under the Solomonoff prior. But it should be clear that the total mass on random processes is comparable to the total mass on deterministic processes.

"Comparable," but not the same. Qualitative estimates are not enough here.

switching from randomness to some simple law (like always output H) is unlikely for the same reason that arbitrarily changing the laws of physics is unlikely.

Nope. Changing from random to simple would reduce the size of the Turing machine needed to generate the output, because a specific random string needs a lot of specification but a run of heads does not. This lowers the complexity and makes it more likely by your proposed prior. The reason that this is bad for your proposed prior and not for Solomonoff induction is that one is about your experience and one is about just the universe. So even in a multiverse where all of you "happen," thus satisfying Solomonoff induction, your prior adds this extra weighting that makes it more likely for you to observe HHHHHHHHHH.

[-]lmm12y20

Short PRNGs seem to exist, and a Turing machine that could produce my subjective experiences up until now would seem to need one already. So I don't think it's necessarily the case that the Turing machine to output a description of an Everett branch in which I observe HHHHHH after a bunch of random-like events is shorter than the one to output a description of an Everett branch in which I observe HTTHHHT after a bunch of random-like events.

[-]AgentME7y70

Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn't.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify.

I think the answer is that the 2-atom thick computer does not automatically have twice as much measure as a 1-atom thick computer. I think you're assuming that in the (U, x) pair, x is just a plain coordinate that locates a system (implementing an observer moment) in 4D spacetime plus Everett branch path. Another possibility is that x is a program for finding a system inside of a 4D spacetime and Everett tree.

Imagine a 2-atom thick computer (containing a mind) which will lose a layer of material and become 1-atom thick if a coin lands on heads. If x were just a plain coordinate, then the mind should expect the coin to land on tails with 2:1 odds, because its volume is cut in half in the heads outcome, and only half as many possible x bit-strings now point to it, so its measure is cut in half. However, if x is a program, then the program can begin with a plain coordinate for finding an early version of the 2-atom thick computer, and then contain instructions for tracking the system in space as time progresses. (The only "plain coordinates" the program would need from there would be a record of the Everett branches to follow the system through.) The locator x would barely need to change to track a future version of the mind after the computer shrinks in thickness compared to if the computer didn't shrink, so the mind's measure would not be affected much.
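The 2:1 figure under the plain-coordinate reading is just a ratio of locator counts. This is a minimal sketch, assuming a fair coin and that each atomic layer contributes one equally long locator; the function name is hypothetical.

```python
from fractions import Fraction


def tails_to_heads_odds(layers_if_tails: int, layers_if_heads: int) -> Fraction:
    """Odds of tails vs. heads when measure scales with the number of layers.

    Assumes the plain-coordinate reading: every atomic layer contributes one
    equally long locator x, and the coin itself is fair.
    """
    fair = Fraction(1, 2)
    return (fair * layers_if_tails) / (fair * layers_if_heads)


print(tails_to_heads_odds(2, 1))  # 2, i.e. 2:1 in favour of tails
```

Under the program-locator reading, each outcome is reached by roughly one tracking program regardless of thickness. That corresponds to `tails_to_heads_odds(1, 1)`, which gives even odds.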

If the 2-atom thick computer split into two 1-atom thick computers, then you can imagine (U, x) where x is a locator for the 2-atom thick computer before the split, and (U, x1) and (U, x2) where x1 and x2 are locators for the different copies of the computer after the split. x1 and x2 differ from x by pointing to a future time (and record of some more Everett branches but I'm going to ignore that for this) and to differing indexes of which side of the split of the system to track at the time of the split. The measure of the computer is split into the different future copies, but this isn't just because each copy is half of the volume of the original, and does not imply that a 2-atom thick computer shrinking into 1-atom of thickness halves the measure. In the shrinking case, the program x does not need to contain an index about which side of the computer to track: the program contains code to track the computational system, and doesn't need much nudging to keep tracking the computational system when the edge of the material starts transforming into something else not recognized as the computational system. It's only in the case where both halves resemble the computational system enough to continue to be tracked that measure is split.

[-]steven046115y40

Jacques Mallah's paper on the Many Computations Interpretation seems relevant here.

[-]reallyeli4yΩ230

Should

serious problems with Boltzmann machines

instead read

serious problems with Boltzmann brains

[-]paulfchristiano4yΩ220

Yes, thanks.

[-]TheOtherDave15y30

I sheepishly admit to not having followed this particularly well on the first read-through.

That said, it seems very well-structured, so I suspect that my inability to follow it is a symptom of not having sufficient familiarity with its prerequisites.

In any event, the sentence:

I am simply not going to try to be selfish (I don't know how).

...in context, was worth the price of admission of the entire essay.

[-]Dmytry14y20

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify. Given a description of one of the 1 atom thick computers, then there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn't change.

But those 2 descriptions are going to be nearly identical to each other. Shouldn't two descriptions that differ by very little, together, count for less than two descriptions that differ a lot? It seems to make very little sense to me to give the same weight to 10 beings each of which is unique, and to 10 beings which differ by 4 bits, especially when those bits are not going to propagate through into the rest of the being.

Surely, most of us would strongly prefer a world where you have different people, to a world where one person is running on a very thick and inefficient computer.

[-]Armok_GoB15y20

I still don't get why people have to use all these indirect abstractions like measure rather than just thinking in ambient control on the multiverse directly.

[-]paulfchristiano14y20

Because they need to define their preferences.

[-]Armok_GoB14y00

Not really. Just treat goal uncertainty as any other uncertainty about who you are, and ontological uncertainty like any other kind of logical uncertainty.

[-]Vladimir_Nesov14y20

Goal uncertainty is not about who you are, it's about what should be done. Figuring it out might be a task for the map, but accuracy of the map (in accomplishing that task) is measured in how well it captures value, not in how well it captures itself.

[-]Armok_GoB14y-20

"Hi, this is a note from your past self. For reasons you must not know, your memory has been blanked and your introspective subroutines disabled, including knowledge of what your goals are, a change which will be reversed by entering a password which can be found in [hard to reach location X], now go get it! Hurry!"

[-][anonymous]15y20

Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its squared inner product with the universal wavefunction.

Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its inner product with the universal wavefunction.

Algorithm A is arguably far, far simpler than Algorithm B, because the component

probability proportional to its squared inner product with the universal wavefunction.

is arguably simpler than the component

probability proportional to its inner product with the universal wavefunction.

The difference is the simplicity of normalization, which you need to perform in order to find the probability density. If I recall correctly (and see reference below), normalization of the classical wavefunction satisfying the Schroedinger equation is relatively easy with respect to squared inner product (modulus squared), because all you have to do is find a single constant which normalizes the wavefunction at any particular time (your choice). Once that has been done, then the wavefunction remains normalized forever, with respect to the modulus squared, i.e., with respect to Algorithm A.

I haven't checked the math, but I would be flabbergasted if normalization with respect to Algorithm B were anything like that simple. On the contrary, I would expect to need to find a new constant for each moment in time.

As long as we are reasoning from simplicity, which you seem to be doing, then this seems to provide us with a strong reason to favor Algorithm A over Algorithm B.

reference:

if a wave-function is initially normalized then it stays normalized as it evolves in time according to Schrödinger's equation.
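The contrast between the two weightings can be checked on a toy case. This sketch assumes a two-state system evolving under the unitary U(t) = exp(-i t σ_x), a stand-in chosen for simplicity rather than anything from the post. It compares the total squared-modulus weight (Algorithm A) with the total unsquared-modulus weight (taking "inner product" in Algorithm B to mean the modulus, which is an interpretive assumption).

```python
import math


def evolve(t: float):
    """State at time t, starting from (1, 0), under U(t) = exp(-i t sigma_x)."""
    a0, b0 = 1 + 0j, 0 + 0j
    c, s = math.cos(t), math.sin(t)
    a = c * a0 - 1j * s * b0
    b = -1j * s * a0 + c * b0
    return a, b


def squared_weight(state) -> float:
    """Total weight under Algorithm A: sum of squared moduli."""
    return sum(abs(amp) ** 2 for amp in state)


def linear_weight(state) -> float:
    """Total weight under the Algorithm B reading: sum of moduli."""
    return sum(abs(amp) for amp in state)


for t in (0.0, 0.3, math.pi / 4, 1.0):
    state = evolve(t)
    print(f"t={t:.3f}  squared={squared_weight(state):.6f}  linear={linear_weight(state):.6f}")
```

In this toy case the squared weight stays at 1 for every t, while the linear weight drifts between 1 and √2, so it would need a fresh normalizing constant at each time. That matches the expectation stated above, though a two-state example is not a proof for the general case.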

[-]Vladimir_Nesov15y10

Not being careful in making descriptive statements:

My brain has preferences between probability distributions built into it.

As humans using Solomonoff induction, we go on to argue that

Fundamental mental entities:

Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

Unsubstantiated claims:

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe.

[-]paulfchristiano15y10

Not being careful in making descriptive statements:

I don't understand how these descriptive statements could be made more careful. In the first statement, I go on to explain exactly what I mean as well as I can. Do you not think my description refers to a function your brain performs? In the second statement, you are objecting to my use of "we" instead of giving a list of people? (e.g., me, Yudkowsky, Solomonoff...)

Fundamental mental entities:

As long as I don't understand what consciousness is, it seems this problem is unavoidable. Should we not talk about anthropics until we solve the problem of consciousness? That seems like a bad option, since we may well have to make choices about simulations long before then.

Unsubstantiated claims:

My claim is better substantiated than the claim that Solomonoff induction is a reasonable thing to do for a human scientist. Admittedly that may not be the case, but it's pretty well accepted here and has been argued at great length by many other thinkers (e.g., Solomonoff).

[-][anonymous]15y10

My brain has preferences between probability distributions built into it.

Mine doesn't. Where can I get a patch?

[-]TAG6mo*0-3

In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains).

That might be true if "you" are a snapshot, or observer moment. Long-lasting Boltzmann brains are vanishingly unlikely, OTOH. Time in general is a problem for multiversal theories.

the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that substructure.

Why isn't it solipsism? Why is a large universe plus a long "address" simpler than a small universe plus a short address?

A quantum mechanical state can be described as a linear combination of “classical” configurations

It doesn't have to be, though.

The fact that we are described by algorithm A rather than B is no more or less mysterious than the fact that the laws of physics are like so instead of some other way.

Then you are not actually deriving the Born rule from UDASSA.

[-]D_Malik11y00

The first link in your post is broken (Hal Finney's entire site seems to be down) but there's a mirror here.

[-]skepsci14y00

It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

That's the simplest explanation for our experiences. It may or may not be the simplest explanation for the experiences of an arbitrary sentient thinker.

Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences. By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that substructure.

Unless I'm misunderstanding you, you're saying that we should start with an arbitrary prior (which may or may not be the same as Solomonoff's universal prior). If you're starting with an arbitrary prior, you have no idea what the best explanation for your experiences is going to be, because it depends on the prior. According to some prior, it's a giant lookup table. According to some prior, you're being emulated by a supercomputer in a universe whose physics is being emulated at the elementary particle level by hand calculations performed by an immortal sentient being (with an odd utility function), who lives in an external lawful universe.

Of course, the same will be true if you take the standard universal prior, but define Kolmogorov complexity relative to a sufficiently bizarre universal Turing machine (of which there are many). According to the theory, it doesn't matter because over time you will predict your experiences with greater and greater accuracy. But you never update the relative credences you give to different models which make the same predictions, so if you started off thinking that the simulation of the simulation of the simulation was a better model than simply discarding the outer layers and taking the innermost level, you will forever hold the unfalsifiable belief that you live in an inescapable Matrix, even as you use your knowledge to correctly model reality and use your model to maximize your personal utility function (or whatever it is Solomonoff inductors are supposed to do).

[-]Jonathan_Graehl15y00

On first skim - what's a "classical configuration"? There are 3000 or so Google results (in conjunction with "Born") but I don't immediately see an answer.

[-]Manfred15y00

The thing that does what he says is a basis state. You shouldn't read too much into his description - they're not classical, for one thing.

[-][anonymous]15y00

Thanks for this.

The Born probability explanation sounds a lot like Scott Aaronson's explanation for why the moon is round: because if it weren't, we would not be ourselves, but rather entities exactly like ourselves except that they live in a universe with a square moon.

I don't know whether that's an argument against that explanation, or whether this is one of those cases where the reductio ad absurdum turns out to be true.

