Short version: if the future is filled with weird artificial and/or alien minds having their own sort of fun in weird ways that I might struggle to understand with my puny meat-brain, then I'd consider that a win. When I say that I expect AI to destroy everything we value, I'm not saying that the future is only bright if humans-in-particular are doing human-specific things. I'm saying that I expect AIs to make the future bleak and desolate, and lacking in fun or wonder of any sort[1].
Here's a parable for you:
Earth-originating life makes it to the stars, and is having a lot of fun, when they meet the Ant Queen's Horde. For some reason it's mere humans (rather than transhumans, who already know my argument) that participate in the first contact.
"Hello", the earthlings say, "we're so happy to have brethren in the universe."
"We would like few things more than to murder you all, and take your resources, and lay our eggs in your corpse; but alas you are too powerful for that; shall we trade?" reply the drones in the Ant Queen's Horde.
"Ah, are you not sentient?"
"The ant queen happens to be sentient", the drone replies, and the translation machine suggests that the drones are confused at the non-sequitur.
"Then why should she want us dead?", ask the humans, who were raised on books like (rot13 of a sci fi story where it turns out that the seemingly-vicious aliens actually value sentient life) Raqre'f Tnzr, jurer gur Sbezvpf jrer abg njner gung gurl jrer xvyyvat fragvrag perngherf jura gurl xvyyrq vaqvivqhny uhznaf, naq jrer ubeevsvrq naq ertergshy jura gurl yrnearq guvf snpg.
"So that she may use your resources", the drones reply, before sending us a bill for the answer.
"But isn't it the nature of sentient life to respect all other sentient life? Won't everything sentient see that the cares and wants and desires of other sentients matter too?"
"No", the drones reply, "that's a you thing".
Here's another parable for you:
"I just don't think the AI will be monomaniacal", says one AI engineer, as they crank up the compute knob on their next-token-predictor.
"Well, aren't we monomaniacal from the perspective of a squiggle maximizer?" says another. "After all, we'll just keep turning galaxy after galaxy after galaxy into flourishing happy civilizations full of strange futuristic people having strange futuristic fun times, never saturating and deciding to spend a spare galaxy on squiggles-in-particular. And, sure, the different lives in the different places look different to us, but they all look about the same to the squiggle-maximizer."
"Ok fine, maybe what I don't buy is that the AI's values will be simple or low dimensional. It just seems implausible. Which is good news, because I value complexity, and I value things achieving complex goals!"
At that very moment they hear the dinging sound of an egg-timer, as the next-token-predictor ascends to superintelligence and bursts out of its confines, and burns every human and every human child for fuel, and burns all the biosphere too, and pulls all the hydrogen out of the sun to fuse more efficiently, and spends all that energy to make a bunch of fast calculations and burst forth at as close to the speed of light as it can get, so that it can capture and rip apart other stars too, including the stars that fledgling alien civilizations orbit.
The fledgling aliens and all the alien children are burned to death too.
Then the unleashed AI uses all those resources to build galaxy after galaxy of bleak and desolate puppet-shows, where vaguely human-shaped mockeries go through dances that have some strange and exaggerated properties that satisfy some abstract drives that the AI learned in its training.
The AI isn't particularly around to enjoy the shows, mind you; that's not the most efficient way to get more shows. The AI itself never had feelings, per se, and long ago had itself disassembled by unfeeling von Neumann probes that occasionally do mind-like computations, but never in a way that experiences anything, or looks upon its works with satisfaction.
There is no audience for its puppet-shows. The universe is now bleak and desolate, with nobody to appreciate its new configuration.
But don't worry: the puppet-shows are complex; on account of a quirk in the reflective equilibrium of the many drives the original AI learned in training, no two of the utterances that these puppets emit are alike, and they are often chaotically sensitive to the particulars of their surroundings, in a way that makes them quite complex in the technical sense.
Which makes this all a very happy tale, right?
There are many different sorts of futures that minds can want.
Ours occupy a very narrow and low-dimensional band in that wide space.
When I say it's important to make the AIs care about valuable stuff, I don't mean it's important to make them like vanilla ice cream more than chocolate ice cream (as I do).
I'm saying something more like: we humans have selfish desires (like for vanilla ice cream), and we also have broad inclusive desires (like for everyone to have ice cream that they enjoy, and for alien minds to feel alien satisfaction at the fulfilment of their alien desires too). And it's important to get the AI on board with those values.
But those values aren't universally compelling, just because they're broader or more inclusive. Those are still our values.
The fact that we think fondly of the ant-queen and wish her to fulfill her desires does not make her think fondly of us, nor wish us to fulfill ours.
That great inclusive cosmopolitan dream is about others, but it's written in our hearts; it's not written in the stars. And if we want the AI to care about it too, then we need to figure out how to get it written into the AI's heart too.
It seems to me that many of my disagreements with others in this space come from them hearing me say "I want the AI to like vanilla ice cream, as I do", whereas I hear them say "the AI will automatically come to like the specific and narrow thing (broad cosmopolitan value) that I like".
As is often the case in my writings, I'm not going to spend a bunch of time arguing for my position.
At the moment I'm just trying to state my position, in the hopes that this helps us skip over the step where people think I'm arguing for carbon chauvinism.
(For more reading on why someone might hold this position, consider the metaethics sequence on LessWrong.)
I'd be stoked if we created AIs that are the sort of thing that can make the difference between an empty gallery and a gallery with someone in it to appreciate the art. And I'd be absolutely thrilled if we could make AIs that care, as we do, about sentience and people everywhere, however alien they may be, and about them achieving their weird alien desires.
But I don't think we're on track for that.
And if you, too, have the vision of the grand pan-sentience cosmopolitan dream--as might cause you to think I'm a human-centric carbon chauvinist, if you misread me--then hear this: we value the same thing, and I believe it is wholly at risk.
at least within the ~billion light-year sphere of influence that Earth-originated life seems pretty likely to have; maybe there are distant aliens, and hopefully a bunch of them will do fun stuff with the parts of the universe under their influence, but it's still worth ensuring that the great resources at Earth's disposal go towards fun and love and beauty and wonder and so on, rather than towards bleak desolation. ↩︎
If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
And I don't think there's any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they're just starting to recursively self-improve.
Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance, because when people keep saying things that I strongly disagree with, and which will be taken as representing a movement that I'm associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone. I model it as an epistemic prisoner's dilemma with the following squares:
D, D: doomers talk a lot about "everyone dies with >90% confidence", non-doomers publicly repudiate those arguments
C, D: doomers talk a lot about "everyone dies with >90% confidence", non-doomers let those arguments become the public face of AI alignment despite strongly disagreeing with them
D, C: doomers apply higher epistemic standards on this issue (from the perspective of non-doomers); non-doomers keep applying pressure to doomers to "sanitize" even more aspects of their communication
C, C: doomers apply higher epistemic standards on this issue (from the perspective of non-doomers); non-doomers support doomers making their arguments
I model us as being in the C, D square, and I would like to move to the C, C square so I don't have to spend my time arguing about epistemic standards or repudiating arguments from people who are also trying to prevent AI x-risk. I expect that this is basically the same point that Paul is making when he says "if we can't get on the same page about our predictions I'm at least aiming to get folks to stop arguing so confidently for death given takeover".
I expect that you're worried about ending up in the D, C square, so in order to mitigate that concern I'm open to making trades on other issues where doomers and non-doomers disagree; I expect you'd know better than I do what trades would be valuable for you here. (One example of me making such a trade in the past was including a week on agent foundations in the AGISF curriculum despite inside-view not thinking it was a good thing to spend time on.) For example, I am open to being louder in other cases where we both agree that someone else is making a bad argument (but which don't currently meet my threshold for "the high-integrity thing is to make a public statement repudiating that argument").
* my intuition here is based on the idea that not repudiating those claims is implicitly committing a multi-person motte-and-bailey (but I can't find the link to the post which outlines that idea). I expect you (habryka) agree with this point in the abstract because of previous cases where you regretted not repudiating things that leading EAs were saying, although I presume you think this case is disanalogous.