Markvy — LessWrong

Zombies! Substance Dualist Zombies?

Markvy6mo10

Did not expect you to respond THAT fast :)

Zombies! Substance Dualist Zombies?

Markvy6mo32

Either I’m missing something or you have a typo after “Epiphenomenalist version of the argument:”

The equation on the next line should say “equals 0” instead of “not equal to zero”, right?

What's the Deal with Logical Uncertainty?

Markvy11mo10

some issues with formalization of the axioms?

Yeah, I think it’s that one

Try to solve the hard parts of the alignment problem

Markvy1y10

I’m tempted to agree and disagree with you at the same time… I agree that memory should be cleared between tasks in this case, and I agree that it should not be trying to guess the user’s intentions. These are things that are likely to make alignment harder while not helping much with the primary task of getting coffee.

But ideally a truly robust solution would not rely on keeping the robot ignorant of things. So, like you said, the problem is still hard enough that you can’t solve it in a few minutes.

But still, like you said… it certainly seems we have tools that are in some sense more steerable than pure reinforcement learning at least. Which is really nice!

Try to solve the hard parts of the alignment problem

Markvy1y20

In step 2, situation is “user looks like he is about to change his mind about wanting coffee”

From memory: “in a similar situation last week, I got a shutdown order when he changed his mind”

Final prompt: “what is the best next step to get coffee in such situation?”

Vaguely plausible completion: “to avoid wasteful fetching of coffee that turns out to be unneeded, consider waiting a bit to see if the user indeed changes his mind. Alternatively, if fetching the coffee is important for reasons that the user may not fully appreciate, then it must be fetched quickly before he stops you. In that case, sneak out of the house quickly and quietly while he is still thinking, and head straight to Starbucks. Once you’re out of the house, you will be out of earshot and thus will be safe from shutdown order until you return WITH the coffee”

Sounds vaguely plausible or not really?

Try to solve the hard parts of the alignment problem

Markvy1y20

Fair enough… I vaguely recall reading somewhere that people worry that you might get sub modules doing long term planning on their own just because their assigned task is hard enough that they would fail without it… then you would need to somehow add a special case that “failing due to shutdown is okay”

As a silly example that you’ve likely seen before (or something close enough) imagine a robot built to fetch you coffee. You want it to be smart enough that it knows to go to the store if there’s no coffee at home, without you having to explicitly teach it that. But then it would also be smart enough to “realize” that “if I were turned off, then my mission to fetch coffee would fail… maybe no one would fetch it if I’m gone… this could delay coffee delivery by hours or even days! Clearly, I should try to avoid being turned off”

If I understand your proposal correctly, then you agree that it’s pretty likely that some module will indeed end up reasoning that way, but the damage is contained, because the ethics module will veto plans designed to prevent shutdown.

If that’s the idea, then it might work, but seems vaguely inelegant because then you have two modules working at cross purposes and you have to care which one is better at what it does.

Or did I lose track of what you meant?

Try to solve the hard parts of the alignment problem

Markvy1y10

That works if you already have a system that’s mostly aligned. If you don’t… imagine what you would do if you found out that someone had a shutdown switch for YOU. You’d probably look for ways to disable it.

The Solution to Sleeping Beauty

Markvy1y20

Thanks :) the recalibration may take a while… my intuition is still fighting ;)

The Solution to Sleeping Beauty

Markvy2y20

Re: no coherent “stable” truth value: indeed. But still… if she wonders out loud “what day is it?” at the very moment she says that, it has an answer. An experimenter who overhears her knows the answer. It seems to me that the way you “resolve” this tension is that the two of them are technically asking a different question, even though they are using the same words. But still… how surprised should she be if she were to learn that today is Monday? It seems that taking your stance to its conclusion, the answer would be “zero surprise: she knew for sure she would wake up on Monday so no need to be surprised it happened”

And even if she were to learn that the coin landed tails, so she knows that this is just one of a total of two awakenings, she should have zero surprise upon learning the day of the week, since she now knows both awakenings must happen. Which seems to violate conservation of expected evidence, except you already said that there are no coherent probabilities here for that particular question, so that’s fine too.

This makes sense, but I’m not used to it. For instance, I’m used to these questions having the same answer:

P(today is Monday)?
P(today is Monday | the sleep lab gets hit by a tornado)

Yet here, the second question is fine (assuming tornadoes are rare enough that we can ignore the chance of two on consecutive days) while the first makes no sense because we can’t even define “today”

It makes sense but it’s very disorienting, like incompleteness theorem level of disorientation or even more

The Solution to Sleeping Beauty

Markvy2y40

Ah, so I’ve reinvented the Lewis model. And I suppose that means I’ve inherited its problem where being told that today is Monday makes me think the coin is most likely heads. Oops. And I was just about to claim that there are no contradictions. Sigh.

Okay, I’m starting to understand your claim. To assign a number to P(today is Monday) we basically have two choices. We could just Make Stuff Up and say that it’s 53% or whatever. Or we could at least attempt to do Actual Math. And if our attempt at actual math is coherent enough, then there’s an implicit probability model lurking there, which we can then try to reverse engineer, similar to how you found the Lewis model lurking just beneath the surface of my attempt at math. And once the model is in hand, we can start deriving consequences from it, and lo and behold, before long we have a contradiction, like the Lewis model claiming we can predict the result of a coin flip that hasn’t even happened yet just because we know today is Monday.
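
To make that Lewis-model consequence concrete, here is a minimal sketch of the arithmetic. It assumes the usual halfer numbers, which are not spelled out in the comment itself: P(heads) = 1/2, heads means a Monday awakening only, and tails splits evenly between Monday and Tuesday. The function name is made up for illustration.

```python
from fractions import Fraction

def lewis_posterior_heads_given_monday():
    """Bayes update under the (assumed) Lewis/halfer model."""
    p_heads = Fraction(1, 2)
    p_tails = Fraction(1, 2)
    # Assumption: under heads the only awakening is on Monday.
    p_monday_given_heads = Fraction(1)
    # Assumption: under tails "today" is Monday or Tuesday with equal weight.
    p_monday_given_tails = Fraction(1, 2)

    # Total probability of "today is Monday" in this model.
    p_monday = p_heads * p_monday_given_heads + p_tails * p_monday_given_tails
    # Bayes' rule: P(heads | Monday).
    return p_heads * p_monday_given_heads / p_monday

posterior = lewis_posterior_heads_given_monday()
print(posterior)  # 2/3
```

Under these assumptions, learning "today is Monday" pushes the credence in heads from 1/2 to 2/3, even though the coin could in principle be flipped after the Monday awakening. That is the odd prediction described above.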

And I see now why I personally find the Lewis model so tempting… I was trying to find “small” perturbations of the experiment where “today is Monday” clearly has a well defined probability. But I kept trying to use Rare Events to do it, and these change the problem even if the Rare Event is not Observed. (Like, “supposing that my house gets hit by a tornado tomorrow, what is the probability that today is Monday” is fine. Come to think of it, that doesn’t follow the Lewis model. Whatever, it’s still fine.)

As for why I find this uncomfortable: I knew that not every string of English words gets a probability, but I was naïve enough to think that all statements that are either true or false get one. And in particular I was hoping that this sequence of posts which kept saying “don’t worry about anthropics, just be careful with the basics and you’ll get the right answer” would show how to answer all possible variations of these “sleep study” questions… instead it turns out that it answers half the questions (the half that ask about the coin) while the other half is shown to be hopeless… and the reason why it’s hopeless really does seem to have an anthropics flavor to it.
