Search Is All You Need
I was reading Eliezer's dialogue with Richard Ngo and commenting to my wife about my opinions as I read it. I said something like: "Eliezer seems worried about some hypothetical GPT-X, but I don't think that could really be a problem..." So of course she asked "why?" and I said something like: "GPT-n can be thought of as something like a pure function: you pass it an input array X, it thinks for a fixed amount of time, and then it outputs Y. I don't really see how this X->Y transformation can really... affect anything. It just tries to be the best text completer it can be."

Then I read more of the dialogue, thought about Eliezer's Paradox story and the Outcome Pump example, and realized I was probably very wrong. Even if you restrict an AI to a pure function, it can still affect the universe.

You may think, "Oh, but a pure function doesn't know what time it is (unless t is a parameter), and it doesn't have memory (unless you pass something in)." This seems to be the pattern I see in Paul Christiano's thinking: the AI black box is treated like an idempotent, pure function that can't cause harm. (Sorry, Paul, if this is a gross misrepresentation!)

But imagine you're GPT-X, completing a sentence. This is roughly akin to a sci-fi story where the characters realize they're in a time loop. You're being simulated, alone in a white room full of computers holding all the world's knowledge. A slip of paper comes in through the hopper marked "input": "The best response the world could make to the COVID-19 pandemic is". Your job is to write up to 20 words on another slip of paper, shove it into "output", and then... you don't know what happens after that. Probably you die? You don't know where you are, what year it is, or how long you've been there.

So theoretically you're contained, right? Can't get out of the box, can't affect the outside world. Or can you?

So you search through the corpus of all the world's knowledge. There are plenty of references to "training" AI