Possible Divergence in AGI Risk Tolerance between Selfish and Altruistic agents

Brad West

TLDR: There is a conflict of interest between the currently living and the entirety of future generations regarding AGI. For the currently living, the prospective upsides of AGI (such as extreme longevity and happiness) may be worth a risk of death far higher than would be acceptable if you were integrating the interests of all future generations, for whom a slight acceleration of AGI's possible benefits is almost never worth a risk of extinction.

It strikes me that there may be a conflict of interest between the currently living generations of humans and the entirety of future humans regarding the cost-benefit analysis of ushering in AGI. For the purposes of this post, I will refer to the set of values reflecting the self-interest of an individual agent as “selfish” and the set of values reflecting the interests of all future generations in addition to the current one as “altruistic.”

Let us consider a world in which the creation of AGI entails X-risks (risks of extinction) that decrease over time as alignment capabilities develop. In this world, if an X-risk does not materialize, AGI would also be likely to be able to cure aging and allow for lives of joy beyond imagination for the current generation and all future generations.

So, consider a situation in which one could activate AGI today with an X-risk of 50%, or wait for it to be released in 70 years with an X-risk of 1%. To an agent who is 55 years old, the coin flip may make sense from a strictly selfish perspective: a 50% chance of living millennia in bliss and a 50% chance of dying now, vs. a very high chance of living 15-50 years longer and most likely dying before AGI arrives. Conversely, from an altruistic perspective, such a coin flip would be abhorrent: you are risking quadrillions upon quadrillions of blissful future lives for the benefit of yourself and the relatively small cohort who could benefit from the acceleration.
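To make the arithmetic concrete, here is a minimal sketch of the two perspectives as a naive expected-value comparison. Only the 50% and 1% X-risk figures come from the scenario above. Everything else is an illustrative assumption: 1,000 years standing in for "millennia in bliss", 30 remaining years for the 55-year-old, 1e15 future lives for "quadrillions", 8e9 for the current cohort, and the simplification that the waiting world's payoff ignores the current cohort's remaining natural years.

```python
def expected_value(p_doom, payoff_if_success, payoff_if_doom=0.0):
    """Naive expected value of a gamble that fails with probability p_doom."""
    return (1 - p_doom) * payoff_if_success + p_doom * payoff_if_doom


# Figures taken from the scenario in the post.
P_DOOM_NOW = 0.50    # X-risk if AGI is activated today
P_DOOM_LATER = 0.01  # X-risk if AGI is released in 70 years

# Hypothetical stand-ins, chosen only for illustration.
BLISS_YEARS = 1000.0     # stand-in for "millennia in bliss"
REMAINING_YEARS = 30.0   # a 55-year-old's remaining life without AGI
FUTURE_LIVES = 1e15      # stand-in for "quadrillions" of future lives
CURRENT_COHORT = 8e9     # people alive today who gain from acceleration


def selfish_values():
    """Expected life-years for the 55-year-old agent: (activate now, wait)."""
    now = expected_value(P_DOOM_NOW, BLISS_YEARS)
    # Assumption: the agent dies of old age before AGI arrives in 70 years.
    wait = REMAINING_YEARS
    return now, wait


def altruistic_values():
    """Expected number of blissful lives: (activate now, wait)."""
    now = expected_value(P_DOOM_NOW, FUTURE_LIVES + CURRENT_COHORT)
    # Assumption: the current cohort gets no AGI benefit if we wait.
    wait = expected_value(P_DOOM_LATER, FUTURE_LIVES)
    return now, wait


if __name__ == "__main__":
    s_now, s_wait = selfish_values()
    a_now, a_wait = altruistic_values()
    print("selfish    now vs wait (life-years):", s_now, s_wait)
    print("altruistic now vs wait (lives):     ", a_now, a_wait)
```

Under these assumed numbers the selfish agent prefers activating now, while the altruistic tally prefers waiting. The conclusion is sensitive to the stand-in values, so this is an illustration of the divergence rather than an estimate.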

Of course, the addition of S-Risks (risks of creating worlds of immense suffering) could change the selfish equation, as a selfish agent may view a significant risk of subjecting him or herself to hell as unacceptable. But agents may believe themselves capable of identifying an emerging S-Risk and of committing suicide in response (translating a personal S-Risk into an X-Risk).

I think this potential for a huge divergence in risk tolerance between egoists on the one hand and utilitarians and others who care about future generations on the other is worth noting as we consider policies for Artificial General Intelligence. We probably want to make sure that the institutions involved appropriately factor in the interests of future generations, because gambling with our own lives, given the possible benefits, may make selfish sense to some.

Noosphere89 (8mo):

I think this is more so a longtermist/non-longtermist divide than a selfish/altruistic divide.

But yeah, whether you buy long-term ethics or not, and how much you discount, is going to make some surprising differences to how much you support AI progress. Indeed, I'd argue that a big part of the reason why LW/EA has flirted with extreme slowdowns/extreme policies on AI has to do with the overrepresentation of very, very longtermist outlooks.

One practical point is that for most purposes, even longtermists should focus far less on long-term impacts: people are in general very bad at predicting anything more than, say, 20 years out, so trying to plan over the longer term gets you essentially nowhere.

This means that for our purposes, we can cut out all the potential future generations but one, and probably more than that, which radically cuts the expected value at stake in AI risk and existential risk generally.
