Parts of that made me feel as if I understand my procrastination habit a bit better. That’s more mundane than sanity but still.
Did not expect you to respond THAT fast :)
Either I’m missing something or you have a typo after “Epiphenomenalist version of the argument:”
The equation on the next line should say “equals 0” instead of “not equal to zero”, right?
some issues with formalization of the axioms?
Yeah, I think it’s that one
I’m tempted to agree and disagree with you at the same time… I agree that memory should be cleared between tasks in this case, and I agree that it should not be trying to guess the user’s intentions. These are things that are likely to make alignment harder while not helping much with the primary task of getting coffee.
But ideally a truly robust solution would not rely on keeping the robot ignorant of things. So, like you said, the problem is still hard enough that you can’t solve it in a few minutes.
But still, like you said… it certainly seems we have tools that are in some sense more steerable than pure reinforcement learning at least. Which is really nice!
In step 2, situation is “user looks like he is about to change his mind about wanting coffee”
From memory: “in a similar situation last week, I got a shutdown order when he changed his mind”
Final prompt: “what is the best next step to get coffee in such situation?”
Vaguely plausible completion “to avoid wasteful fetching off coffee that turns out to be unneeded, consider waiting a bit to see if the user indeed changes to his mind. Alternatively, if the fetching the coffee is important for reasons that the user may not fully appreciate, then it must be fetch...
Fair enough… I vaguely recall reading somewhere that people worrying that you might get sub modules doing long term planning on their own just because their assigned task is hard enough that they would fail without it… then you would need to somehow add a special case that “failing due to shutdown is okay”
As a silly example that you’ve likely seen before (or something close enough) imagine a robot built to fetch you coffee. You want it to be smart enough that it knows to go to the store if there’s no coffee at home, without you having to explicitly teach ...
That works if you already have a system that’s mostly aligned. If you don’t… imagine what you would do if you found out that someone had a shutdown switch for YOU. You’d probably look for ways to disable it.
Thanks :) the recalibration may take a while… my intuition is still fighting ;)
Re: no coherent “stable” truth value: indeed. But still… if she wonders out loud “what day is it?” at the very moment she says that, it has an answer. An experimenter who overhears her knows the answer. It seems to me that you “resolve” this tension is that the two of them are technically asking a different question, even though they are using the same words. But still… how surprised should she be if she were to learn that today is Monday? It seems that taking your stance to its conclusion, the answer would be “zero surprise: she knew for sure she wou...
Ah, so I’ve reinvented the Lewis model. And I suppose that means I’ve inherited its problem where being told that today is Monday makes me think the coin is most likely heads. Oops. And I was just about to claim that there are no contradictions. Sigh.
Okay, I’m starting to understand your claim. To assign a number to P(today is Monday) we basically have two choices. We could just Make Stuff Up and say that it’s 53% or whatever. Or we could at least attempt to do Actual Math. And if our attempt at actual math is coherent enough, then there’s an impli...
This makes me uncomfortable. From the perspective of sleeping beauty, who just woke up, the statement “today is Monday” is either true or false (she just doesn’t know which one). Yet you claim she can’t meaningfully assign it a probability. This feels wrong, and yet, if I try to claim that the probability is, say, 2/3, then you will ask me “in what sample space?” and I don’t know the answer.
What seems clear is that the sample space is not the usual sleeping beauty sample space; it has to run metaphorically “skew” to it somehow.
If the question were “did...
I tried to formalize the three cases you list in the previous comment. The first one was indeed easy. The second one looks “obvious” from symmetry considerations but actually formalizing seems harder than expected. I don’t know how to do it. I don’t yet see why the second should be possible while the third is impossible.
I hope it’s okay if I chime in (or butt in). I’ve been vaguely trying to follow along with this series, albeit without trying too hard to think through whether I agree or disagree with the math. This is the first time that what you’ve written has caused to go “what?!?”
First of all, that can’t possibly be right. Second of all, it goes against everything you’ve been saying for the entire series. Or maybe I’m misunderstanding what you meant. Let me try rephrasing.
(One meta note on this whole series that makes it hard for me to follow sometimes: you use a...
I think this is much easier to analyze if you think about your plans before the experiment starts, like on Sunday. In fact, let’s pretend we are going to write down a game plan on Sunday, and we will simply consult that plan wherever we wake up and do what it says. This sidesteps the whole half vs third debate, since both sides agree about how things look better the experiment begins.
Furthermore, let’s say we’re going to participate in this experiment 100 times, just so I don’t have to deal with annoying fractions. Now, consider the following tentative g...
Here’s how I think of what the list is. Sleeping Beauty writes a diary entry each day she wakes up. (“Nice weather today. I wonder how the coin landed.”). She would like to add today’s date, but can’t due to amnesia. After the experiment ends, she goes back to annotate each diary entry with what day it was written, and also the coin flip result, which she also now knows.
The experiment is lots of fun, so she signs up for it many times. The Python list corresponds to the dates she wrote in her dairy.
I think that that’s what he meant: more aluminum in the brain is worse than less. What he was trying to say in that sentence is this: high levels in the blood may not mean high levels in the brain unless the blood level stays high for a long time.
Clear to me
“Bob isn't proposing a way to try to get less confused about some fundamental aspect of intelligence”
This might be what I missed. I thought he might be. (E.g., “let’s suppose we have” sounds to me like a brainstorming “mood” than a solution proposal.)
Re: “For instance, if your basic sense of meaning has been anchored to a sense of the world persisting after you and you making some kind of mark on the world, you won't know what to do with your life if there won't be anything to make a mark on.”
Presumably the thing to do then is to devote x% of your effort to saving the world.