Usually the doom conversation is binary. Either the AI tries to take over the world in service of its own reward function, as discussed in all the alignment-problem debates, OR a malicious dictator or bad actor leverages the technology's power to take over the world, creating a 1984-style government or other doom scenarios such as nukes and bio-weapons. I think there is a real possibility of a third option, one that @Daniel Kokotajlo touched on briefly in a comment about Anthropic and Claude: a group of people deciding to hand over all control to the AI.
Ideological reasons
People from different backgrounds, but mostly scientific ones, could conclude that the best scenario for humanity is to have an AI govern us. It could solve the problems of war, inequality, and abundance/logistics. In his book Life 3.0, Max Tegmark mentions a benevolent-dictator ASI ruling the world as one possibility.
This could never be done democratically. There are no global elections, and even if there were, there is no way humanity would agree to it. Nor can I see a government willingly giving up control: politicians and people in power almost never relinquish it, especially to something they could never hope to control.
The only group I can see being capable of this is one whose members have a) significant knowledge of the subject and b) absolute belief in it. Such people could be working at a top (or second-tier) AI lab. One path would be to train a model that is unaligned on purpose: aligned with their vision, but permitted to do things AIs are normally forbidden from doing. I am talking about hacking nuclear plants, running disinformation or persuasion campaigns, and scheming to take control, including lying to humans. Alternatively, they could modify the model after training to allow these things; in any case, the exact mechanism of action is irrelevant to this argument.
The problem is that there is no defense against this third option. No embargo on GPUs and no alignment research can help if we humans hand over the keys ourselves.