Parts of that made me feel as if I understand my procrastination habit a bit better. That’s more mundane than sanity but still.
Did not expect you to respond THAT fast :)
Either I’m missing something or you have a typo after “Epiphenomenalist version of the argument:”
The equation on the next line should say “equals 0” instead of “not equal to zero”, right?
some issues with formalization of the axioms?
Yeah, I think it’s that one
I’m tempted to agree and disagree with you at the same time… I agree that memory should be cleared between tasks in this case, and I agree that it should not be trying to guess the user’s intentions. These are things that are likely to make alignment harder while not helping much with the primary task of getting coffee.
But ideally a truly robust solution would not rely on keeping the robot ignorant of things. So, like you said, the problem is still hard enough that you can’t solve it in a few minutes.
But still, like you said… it certainly seems we have tools that are in some sense more steerable than pure reinforcement learning at least. Which is really nice!
In step 2, situation is “user looks like he is about to change his mind about wanting coffee”
From memory: “in a similar situation last week, I got a shutdown order when he changed his mind”
Final prompt: “what is the best next step to get coffee in such situation?”
Vaguely plausible completion “to avoid wasteful fetching off coffee that turns out to be unneeded, consider waiting a bit to see if the user indeed changes to his mind. Alternatively, if the fetching the coffee is important for reasons that the user may not fully appreciate, then it must be fetched quickly before he stops you. In that case, sneak out of the house quickly and quietly while he is still thinking, and head straight to Starbucks. Once you’re out of the house, you will be out of earshot and thus will be safe from shutdown order until you return WITH the coffee”
Sounds vaguely plausible or not really?
Fair enough… I vaguely recall reading somewhere that people worrying that you might get sub modules doing long term planning on their own just because their assigned task is hard enough that they would fail without it… then you would need to somehow add a special case that “failing due to shutdown is okay”
As a silly example that you’ve likely seen before (or something close enough) imagine a robot built to fetch you coffee. You want it to be smart enough that it knows to go to the store if there’s no coffee at home, without you having to explicitly teach it that. But then it would also be smart enough to “realize” that “if I were turned off, then my mission to fetch coffee would fail… maybe no one would fetch it if I’m gone… this could delay coffee delivery by hours or even days! Clearly, I should try to avoid being turned off”
If I understand your proposal correctly, then you agree that that it’s pretty likely that some module will indeed end up reasoning that way, but the damage is contained, because the ethics module will veto plans designed to prevent shutdown.
If that’s the idea, then it might work, but seems vaguely inelegant because then you have two modules working at cross purposes and you have to care which one is better at what it does.
Or did I lose track of what you meant?
That works if you already have a system that’s mostly aligned. If you don’t… imagine what you would do if you found out that someone had a shutdown switch for YOU. You’d probably look for ways to disable it.
Thanks :) the recalibration may take a while… my intuition is still fighting ;)
Re: “For instance, if your basic sense of meaning has been anchored to a sense of the world persisting after you and you making some kind of mark on the world, you won't know what to do with your life if there won't be anything to make a mark on.”
Presumably the thing to do then is to devote x% of your effort to saving the world.