Explain a non-VNM-rational architecture which is very intelligent, but has goals that are toggleable with a button in a way that is immune to the failures discussed in the article (as well as the related failures).
But we learned something from the exercise. We learned not just about the problem itself, but also about how hard it was to get outside grantmakers or journal editors to be able to understand what the problem was.
True, and unfortunately this extends beyond just grantmakers and journal editors.
This part was interesting:
“Ah,” says the computer scientist. “Well, in that case, how about if [some other clever idea]?”
Well, you see, that clever idea is isomorphic to the AI believing that it’s impossible for the button to ever be pressed, which incentivizes it to terrify the user whenever it gets a setback, so as to correlate setbacks with button-presses, which (relative to its injured belief system) causes it to think the setbacks can’t happen.
I would love to see Yudkowsky's or Soares's thoughts on Wentworth's Shutdown Problem Proposal, which seems to avoid the problems discussed here. At first glance it appears to fall under the above failure mode. But since it uses do-operations instead of conditional probability, Wentworth argues it doesn't have this problem.
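To make the conditioning-versus-intervening distinction concrete, here is a toy sketch. It is not Wentworth's proposal, just a generic three-variable model I made up for illustration: a hidden cause `z` influences `x` and fully determines `y`, while `x` has no causal effect on `y`. The probabilities and the names `joint`, `prob_y1_given_x1`, and `prob_y1_do_x1` are all assumptions of the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model (assumed for illustration only):
#   z ~ Bernoulli(1/2)             hidden common cause
#   x | z ~ Bernoulli(9/10 or 1/10) x is correlated with z
#   y = z                          y depends on z only, never on x
P_Z1 = Fraction(1, 2)
P_X1_GIVEN_Z = {0: Fraction(1, 10), 1: Fraction(9, 10)}


def bern(p, value):
    """Probability that a Bernoulli(p) variable equals `value` (0 or 1)."""
    return p if value == 1 else 1 - p


def joint(z, x, y, do_x=None):
    """Joint probability of (z, x, y); if do_x is set, x is forced to do_x."""
    pz = bern(P_Z1, z)
    if do_x is None:
        px = bern(P_X1_GIVEN_Z[z], x)   # observational: x listens to z
    else:
        px = Fraction(int(x == do_x))   # interventional: the z -> x link is cut
    py = Fraction(int(y == z))          # y is a deterministic copy of z
    return pz * px * py


def prob_y1_given_x1():
    """P(y=1 | x=1): ordinary conditioning on the observation x=1."""
    num = sum(joint(z, 1, 1) for z in (0, 1))
    den = sum(joint(z, 1, y) for z, y in product((0, 1), repeat=2))
    return num / den


def prob_y1_do_x1():
    """P(y=1 | do(x=1)): x is set by intervention, so it carries no news about z."""
    return sum(joint(z, x, 1, do_x=1) for z, x in product((0, 1), repeat=2))


print(prob_y1_given_x1())  # 9/10
print(prob_y1_do_x1())     # 1/2
```

In this toy model, observing `x = 1` shifts beliefs about `y` because it is evidence about `z`, while forcing `x = 1` leaves the distribution of `y` untouched. That gap is the kind of thing the do-operator argument relies on; whether it actually rescues the shutdown proposal is a separate question.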
This is a good post. I remember being so confused in a real analysis class when my professor started talking about how important it is that we restrict our attention to continuous linear functions (what on Earth was a discontinuous linear function supposed to be?). This post explains what's going on better than my professor or textbook did.
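For anyone else who had the same confusion, here is one standard example of a discontinuous linear function, sketched under my own choice of spaces (it may or may not be the example the post uses): differentiation on continuously differentiable functions on $[0,1]$, with the uniform norm on both sides.

```latex
\[
D : \bigl(C^1[0,1], \|\cdot\|_\infty\bigr) \to \bigl(C[0,1], \|\cdot\|_\infty\bigr),
\qquad D f = f' .
\]
Take $f_n(x) = \dfrac{x^n}{n}$ for $n \ge 1$. Then
\[
\|f_n\|_\infty = \frac{1}{n} \longrightarrow 0,
\qquad
\|D f_n\|_\infty = \sup_{x \in [0,1]} x^{\,n-1} = 1 .
\]
So $f_n \to 0$ but $D f_n \not\to 0 = D(0)$: the map $D$ is linear, yet it is not continuous at $0$.
```

The point is that linearity alone gives no control over how large $Df$ can be relative to $f$ once the space is infinite-dimensional.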
I agree with one of the other commenters that this part is not technically phrased accurately:
One way to think about continuity is that a function not having any "jumps" implies that it can't have an "infinitely steep" slope
Because eg the derivative of at is despite the fact that it's continuous there.
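A standard function exhibiting this (my choice, assumed here for illustration) is the cube root at the origin:

```latex
\[
f(x) = x^{1/3}, \qquad
\lim_{h \to 0} \frac{f(h) - f(0)}{h}
= \lim_{h \to 0} \frac{h^{1/3}}{h}
= \lim_{h \to 0} h^{-2/3}
= +\infty ,
\]
while $\lim_{x \to 0} f(x) = 0 = f(0)$, so $f$ is continuous at $0$ despite having an infinitely steep slope there.
```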
For your example to work you need to restrict the domain of your functions to some compact set, because the uniform norm requires the functions to be bounded.
Hmm, is this really a substantive problem? Call it an "extended norm" instead of a norm and everything in the post works, right? My reasoning: An extended norm yields an extended metric space, which still generates a topology — it's just that points which are infinitely far apart are in different connected components. Since you get a perfectly valid topology, it makes perfect sense for the post to talk about continuity. Or at least I think so; am I missing something?
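Spelling out the reasoning above as a sketch (the notation $d$, $B(f,r)$, and $\sim$ is mine, not the post's):

```latex
Let $\|f\|_\infty = \sup_x |f(x)| \in [0, \infty]$ and define
\[
d(f, g) = \|f - g\|_\infty \in [0, \infty].
\]
With the convention $a + \infty = \infty$, $d$ satisfies
\[
d(f,g) = 0 \iff f = g, \qquad
d(f,g) = d(g,f), \qquad
d(f,h) \le d(f,g) + d(g,h),
\]
so it is an extended metric. The balls
\[
B(f, r) = \{\, g : d(f,g) < r \,\}, \qquad 0 < r < \infty,
\]
form a base for a topology. Writing $f \sim g$ when $d(f,g) < \infty$ gives an equivalence relation, and each class
\[
[f] = \bigcup_{r > 0} B(f, r)
\]
is open; its complement is a union of other classes, hence also open, so $[f]$ is both open and closed.
For example, on $\mathbb{R}$ the functions $f(x) = x$ and $g(x) = 0$ have $d(f,g) = \infty$ and lie in different classes.
```

Continuity of a map is then defined with respect to this topology exactly as in the ordinary metric case.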
This proposed solution is addressed directly in the article: