What is the counterfactual to an aligned AI?
How does the extinction risk of misaligned AI compare to the suffering risk of an aligned AI?
I think of the counterfactual to an aligned AI as a misaligned AI. Arguably, a significant amount of research has gone into highlighting the benefits of aligned AI and the risks of misaligned AI, but how much has gone into the counterfactual? How much research has there been on the benefits of a misaligned AI and the risks of an aligned AI?
In particular, how does the extinction risk of misaligned AI compare to the suffering risk of an aligned AI?
Misaligned AI is often cited as a global catastrophic risk in the sense of extinction, and there are many resources on that subject. Here, however, I want to delve into how significant the suffering risk (i.e., an unrecoverable dystopia) posed by an aligned AI is.
My hypothesis, prior to researching the subject, is that the suffering risk from aligned AI is significantly more probable (and likely more neglected, albeit perhaps for good reasons) than the extinction risk from misaligned AI. At the center of this hypothesis is the question of how "safe" the entity is whose commands the AI is aligned to.
More than likely, the entity controlling the AI will be the resource-rich board and executive team of whichever for-profit Silicon Valley corporation builds the AI first. Imagine what it would look like for this corporation, or a handful of individuals within it, to control a practically omnipotent, omniscient tool like transformative AI.
What are this group's motivations today? By the nature of a for-profit corporation, they are to maximize profits. How would those motivations evolve if the group were, in essence, to gain control of the whole world? Would they change, and if so, why? What would certainly change is that the group could no longer be checked by the rest of the world (something that arguably doesn't happen enough even today).
Hypothetically, how confident are you in any single individual's judgement if they hold absolute power over the world? Is there such an individual or group you would trust with this responsibility today? Lord Acton's old warning applies: "Power tends to corrupt, and absolute power corrupts absolutely." A transformative AI would provide exactly that: absolute power. Is there an individual or small group you trust completely with absolute power today? If so, can you be certain you will be able to trust the future generations to whom this power is passed on, or the process by which it is transferred?
Today we barely have systems in place that can check some of the biggest corporations in the world; instead, those corporations end up controlling even the most powerful governments. There would no longer be any feasible check on whoever controls the AI.
Taking all this into account, the hypothesis is that creating an aligned transformative AI will very likely lead to a suffering risk: an unrecoverable dystopia.
So the endgame question might become: does humanity have better odds with aligned AI or with misaligned AI? Which do you trust more: a random all-powerful individual or group not to abuse absolute power, or a transformative AI not to destroy everything?
Please share some thoughts to the contrary; I would love to see aligned AI in a more positive light, not based on its positives, but based on its risks.