How would you empirically distinguish between your invisible-pink-unicorn maximizer and something that wasn't an invisible-pink-unicorn maximizer? I mean, you could look for a section of code that was interpreting sensory inputs as the number of invisible pink unicorns - except you couldn't, because no set of sensory inputs corresponds to that: invisible pink unicorns are impossible. If we're talking about counterfactuals, the counterfactual universe in which the sensory inputs that currently correspond to paperclips correspond to invisible pink unicorns seems just as valid as any other.

Well, there's certainly a set of sensory inputs that corresponds to _invisible-unicorn_, based on which one could build an invisible unicorn detector. Similarly, there's a set of sensory inputs that corresponds to _pink-unicorn_, based on which one could build a pink unicorn detector.

If I wire a pink unicorn detector up to an invisible unicorn detector such that a light goes on iff both detectors fire on the same object, have I not just constructed an invisible-pink-unicorn detector?

Granted, a detector is not the same thing as a maximizer, but the conceptual issue seems identical in both cases.

linkhyrule5: That does not follow. I'll admit my original example is mildly flawed, but let's tack on something (that's still impossible) to illustrate my point: invisible pink telekinetic unicorns. Still not a thing that can exist, if you define telekinesis as "action at a distance, not mediated through one of the four fundamental forces." But now, if you see an object floating stably in a vacuum and detect no gravitational or electromagnetic anomalies (and you're in an accelerated reference frame like the surface of the Earth, etc.), you can infer the presence of an invisible telekinetic something. More generally, an impossible object will have an impossible set of sensory inputs, but the set of corresponding sensory inputs still exists.

What makes us think _any_ of our terminal values aren't based on a misunderstanding of reality?

by bokov, 25th Sep 2013


Let's say Bob's terminal value is to travel back in time and ride a dinosaur.

It is instrumentally rational for Bob to study physics so he can learn how to build a time machine. As he learns more physics, Bob realizes that his terminal value is not only utterly impossible but meaningless. By definition, someone in Bob's past riding a dinosaur is not a future evolution of the present Bob.

There are a number of ways to create the subjective experience of having gone into the past and ridden a dinosaur. But to Bob, it's not the same because he wanted both the subjective experience and the knowledge that it corresponded to objective fact. Without the latter, he might as well have just watched a movie or played a video game.

So if we took the original, innocent-of-physics Bob and somehow calculated his coherent extrapolated volition, we would end up with a Bob who has given up on time travel. The original Bob would not want to be this Bob.

But, how do we know that _anything_ we value won't similarly dissolve under sufficiently thorough deconstruction? Let's suppose for a minute that all "human values" are dangling units; that everything we want is as possible and makes as much sense as wanting to hear the sound of blue or taste the flavor of a prime number. What is the rational course of action in such a situation?

PS: If your response resembles "keep attempting to XXX anyway", please explain what, other than your current preference, privileges XXX over any number of alternatives. Are you using some kind of precommitment strategy for a subset of your current goals? Do you now wish you had used the same strategy to precommit to the goals you had when you were a toddler?