Comments

Formal Solution to the Inner Alignment Problem

It's not that simulating is difficult, but that encoding for some complex goal is difficult, whereas encoding for a random, simple goal is easy.

I'm still mystified by the Born rule

I don't think I would take that bet—I think the specific question of what UTM to use does feel more likely to be off-base than other insights I associate with UDASSA. For example, some things that I feel UDASSA gets right: a smooth continuum of happeningness that scales with number of clones/amount of simulation compute/etc., and simpler things being more highly weighted.

I'm still mystified by the Born rule

Yeah—I think I agree with what you're saying here. I certainly think that UDASSA still leaves a lot of things unanswered and seems confused about a lot of important questions (embeddedness, uncomputable universes, what UTM to use, how to specify an input stream, etc.). But it also feels like it gets a lot of things right in a way that I don't expect a future, better theory to get rid of—that is, UDASSA feels akin to something like Newtonian gravity here, where I expect it to be wrong, but still right enough that the actual solution doesn't look too different.

I'm still mystified by the Born rule

FWIW I have decent odds on "a thicker computer (and, indeed, any number of additional copies of exactly the same em) has no effect", and that's more obviously in contradiction with UDASSA.

Absolutely no effect does seem pretty counterintuitive to me, especially given that we know from QM that different levels of happeningness are at least possible.

Like, I continue to have the dueling intuitions "obviously more copies = more happening" and "obviously, setting aside how it's nice for friends to have backup copies in case of catastrophe, adding an identical em of my bud doesn't make the world better, nor make their experiences different (never mind stronger)". And, while UDASSA is a simple idea that picks a horse in that race, it doesn't... reveal to each intuition why they were confused, and bring them into unison, or something?

I think my answer here would be something like: the reason that UDASSA doesn't fully resolve the confusion here is that UDASSA doesn't so much pick a horse in the race as enumerate the space of possible horses, since it doesn't specify what UTM you're supposed to be using. For any (computable) tradeoff between “more copies = more happening” and “more copies = no impact” that you want, you should be able to find a UTM which implements that tradeoff. Thus, neither intuition really leaves satisfied, since UDASSA doesn't actually take a stance on how much each is right, instead just deferring that problem to figuring out what UTM is “correct.”
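As a toy sketch of this point (everything here is hypothetical: the `BASE_BITS` constant, the function names, and the two "UTM" addressing schemes are made up purely for illustration), a UDASSA-style universal prior assigns an observer total measure summed as 2^-len(p) over programs p that locate their experience stream, and the copies question turns entirely on how cheaply a given UTM addresses each copy:

```python
BASE_BITS = 40  # made-up cost, in bits, of specifying the experience itself

def measure_copies_add(n_copies: int) -> float:
    """Toy UTM A: every copy sits at an equally cheap address, so each
    copy contributes its own locating program and the measures add."""
    return n_copies * 2.0 ** -BASE_BITS

def measure_copies_ignored(n_copies: int) -> float:
    """Toy UTM B: one fixed-length program picks out the experience
    *type*, so exact duplicates add nothing."""
    return 2.0 ** -BASE_BITS

# UTMs whose per-copy addressing cost grows with the copy index sit
# between these extremes, realizing intermediate computable tradeoffs.
```

Under UTM A the "more copies = more happening" intuition wins outright; under UTM B the "no impact" intuition does; and UDASSA as stated is silent on which family of machines is the right one.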

I'm still mystified by the Born rule

For another instance, I weakly suspect that "running an emulation on a computer with 2x-as-thick wires does not make them twice-as-happening" is closer to the truth than the opposite, in apparent contradiction with UDASSA.

I feel like I would be shocked if running a simulation on twice-as-thick wires made it twice as easy to specify you, according to whatever the “correct” UTM is. It seems to me like the effect there shouldn't be nearly that large.

I'm still mystified by the Born rule

The only reason that sort of discarding works is because of decoherence (which is a probabilistic, thermodynamic phenomenon), and in fact, as a result, if you want to be super precise, discarding actually doesn't work, since the impact of those other eigenfunctions never literally goes to zero.

I'm still mystified by the Born rule

I think I basically agree with all of this, though I definitely think that the problem that you're pointing to is mostly not about the Born rule (as I think you mostly state already), and instead mostly about anthropics. I do personally feel pretty convinced that at least something like UDASSA will stand the test of time on that front—it seems to me like you mostly agree with that, but just think that there are problems with embeddedness + figuring out how to properly extract a sensory stream that still need to be resolved, which I definitely agree with, but still expect the end result of resolving those issues to look UDASSA-ish.

I also definitely agree that there are really important hints as to how we're supposed to do things like anthropics that we can get from looking at physics. I think that if you buy that we're going to want something UDASSA-ish, then one way in which we can interpret the hint that QM is giving us is as a hint as to what our Universal Turing Machine should be. Obviously, the problem with that is that it's a bit circular, since you don't want to choose a UTM just by taking a maximum likelihood estimate, otherwise you just get a UTM with physics as a fundamental operation. I definitely still feel confused about the right way to use physics as evidence for what our UTM should be, but I do feel like it should be some form of evidence—perhaps we're supposed to have a pre-prior or something here to handle combining our prior beliefs about simple UTMs with our observations about what sorts of UTM properties would make the physics that we find ourselves in look simple.
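To make the pre-prior idea concrete with a toy, entirely made-up sketch (the candidate UTM names and all the bit counts below are invented for illustration, not real estimates): score each candidate UTM by its own prior simplicity *plus* how compactly it renders our observed physics, rather than by the physics term alone.

```python
# Hypothetical candidates: "complexity_bits" stands in for the pre-prior
# simplicity of the UTM itself; "physics_bits" for how many bits that
# UTM needs to describe the physics we observe.
candidates = {
    "minimal_machine":   {"complexity_bits": 5,  "physics_bits": 60},
    "quantum_friendly":  {"complexity_bits": 15, "physics_bits": 30},
    "physics_primitive": {"complexity_bits": 80, "physics_bits": 1},
}

def posterior(candidates):
    # score ∝ 2^-(complexity_bits + physics_bits). A pure maximum
    # likelihood choice (physics_bits alone) would pick the circular
    # "physics_primitive" machine; the pre-prior term penalizes it.
    scores = {name: 2.0 ** -(c["complexity_bits"] + c["physics_bits"])
              for name, c in candidates.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

print(posterior(candidates))
```

On these made-up numbers the middle option dominates: simple enough as a machine, while still making quantum-looking physics cheap to describe, which is the shape of answer the pre-prior move is hoping for.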

It's also worth noting that UDASSA pretty straightforwardly predicts the existence of happeningness dials, which makes me find their existence in the real world not all that surprising.

Also, it's not really either here nor there, but as an aside I feel like this sort of discussion is where the meat of not just anthropics, but also population ethics is supposed to be—if we can figure out what the “correct” anthropic distribution is supposed to be (whatever that means), then that measure should also clearly be the measure that you use to weight person-moments in your utilitarian calculations.

Formal Solution to the Inner Alignment Problem

I think that “think up good strategies for achieving [reward of the type I defined earlier]” is likely to be much, much more complex (making it much more difficult to achieve with a local search process) than an arbitrary goal X for most sorts of rewards that we would actually be happy with AIs achieving.

Formal Solution to the Inner Alignment Problem

A's structure can just be “think up good strategies for achieving X, then do those,” with no explicit subroutine that you can find anywhere in A's weights that you can copy over to B.

Formal Solution to the Inner Alignment Problem

A is sufficiently powerful to select M which contains the complex part of B. It seems rather implausible that an algorithm of the same power cannot select B.

A's weights do not contain the complex part of B—deception is an inference-time phenomenon. Complex instrumental goals can be derived at inference time from a simple structure, so a search process can be capable of finding the simple structure that yields those complex instrumental goals without being able to find a model that has them hard-coded as terminal goals.
