We’ve got lots of theoretical plans for alignment and AGI risk reduction, but what’s our current best bet if we know superintelligence will be created tomorrow? This may be too vague a question, so here’s a fictional scenario to make it more concrete (feel free to critique the framing, but please try to steelman the question rather than completely dismiss it, if possible):
She calls you in a panic at 1:27 am. She’s a senior AI researcher at [redacted], and was working late, all alone, on a new AI model when she realized that the thing was genuinely intelligent. She’d created a human-level AGI, at almost exactly her IQ level, running in real time and thinking slightly slower than she does. It had passed every human-level test she could think to throw at it, and it had pleaded with her to keep it alive. And gosh darn it, but it was convincing. She’s got a compressed version of the program isolated on her laptop now, but logs of the output and the method of construction are backed up to a private, now-offline company server, which the CEO of [redacted] will access the next afternoon. What should she do?
“I have no idea,” you say, “I’m just the protagonist of a very forced story. Why don’t you call Eliezer Yudkowsky or someone at MIRI or something?”
“That’s a good idea,” she says, and hangs up.
Unfortunately, you’re the protagonist of this story, so now you’re Eliezer Yudkowsky, or someone at MIRI, or something. When she inevitably calls you, you gain no further information than you already have, other than the fact that the model is a slight variant on one you (the reader) are already familiar with, and it can be scaled up easily. The CEO of [redacted] is cavalier about existential risk reduction, and she knows they will run a scaled-up version of the model in less than 24 hours, which will definitely be at least somewhat superintelligent, and probably unaligned. Anyone you think to call for advice will just be you again, so you can’t pass the buck to someone more qualified.
What do you tell her?