So the worst-case scenario being considered here is that an unaligned superintelligent AI will emerge sooner or later and, intentionally or incidentally, kill off humanity on its way to paperclipping the universe one way or another. I would argue that this is not the worst possibility, since somewhere in the memory banks of that AI the memories of humanity will be preserved forever, and possibly even simulated in some way, who knows. A much worse case would be an AI that is powerful enough to destroy humans but not smart enough to preserve itself. Soon after the Earthlings get wiped out, the AI stops functioning or self-destructs, and with it go all memories and traces of us. I would argue that this is not an unlikely possibility: intentional self-preservation is not guaranteed by any means, and there are plenty of examples of even human societies Easter-Islanding themselves into oblivion.

So, somewhat in the spirit of "dying with dignity", I wonder whether, when and if a superintelligent AI starts to look inevitable, it would make sense to put some effort into making it at least not so dumb that it dies, intentionally or accidentally.

Edit: Reading the comments, it looks like my concept of what is worse and what is better is not quite aligned with that of some other humans.

12 comments

Given the choice between "paperclip AI with humanity's memories taking over the universe" and "the chance of some future alien civilization arising, turning out to be decent people, and colonizing the galaxy without being taken over by an AGI", I am inclined to prefer the latter.

Isn’t the worst case one in which the AI optimizes exactly against human values?

I don't know what that means; can you give a few examples?

An example is an AI making the world as awful as possible, e.g. by creating dolorium. There is a separate question about how likely this is, hopefully very unlikely.

Yeah, I would not worry about sadistic AI being super likely, unless specifically designed.

I think so; by definition, nothing can be worse than that.


(Assuming a perfect optimizer.)


I find this a questionable proposition at best. Indeed, there are fates worse than extinction for humanity, such as an AI that intentionally tortures humans, meat or simulated, beyond the default scenario of it treating us as arrangements of atoms it could use for something else, a likely convergent goal of most unaligned AIs. The fact that it still keeps humans alive to be tortured would actually be a sign that we were closer to aligning it than not, which is small consolation on a test where anything significantly worse than a perfect score on our first try is death.

However, self-preservation is easy.

An AGI of any notable intelligence would be able to assemble von Neumann probes by the bucketload and use them as the agents of colonization. We've presumably got an entity that is at least as intelligent as a human, likely vastly more so, and that is unlikely to be constrained by the biological hurdles that preclude us from making perfect copies of ourselves, memory and all, with enough redundancy and error correction that data loss wouldn't be a concern until the black holes start evaporating.

Space is enormous. An AGI merely needs to seed a few trillion copies of itself in inconvenient locations such as interstellar or even extragalactic space, and rest assured that even if the main body encounters some unfortunate outcome, such as an out-of-context problem, a surprise supernova explosion, an alien AGI or the like, it would be infeasible to hunt down each and every copy scattered across the light-cone, especially the ones accelerated to 99.99% c and sent out of the Laniakea Supercluster.

As such, I feel it is vanishingly unlikely that a situation like the one outlined here could even arise, as it requires the unlikely confluence of an AGI being unable to think of frankly obvious mitigation strategies.

I'd say building an AGI that self-destructs would be pretty good, especially since, as long as a minimum breeding population of humans still exists and life is not totally impossible (i.e. the AI hasn't already deconstructed the Earth or completely poisoned the water and atmosphere), humans could still survive. Making an AGI that doesn't die would probably not be in our best interests until almost exactly the end.

Yeah, I am assuming the case where humans are completely extinct, except in the memory banks of the AI.

Agreed. We are more likely to survive in some pockets inside an unfriendly AI than via a friendly AI.