Even if human & AI alignment are just as easy, we are screwed

[-]Nicholas Kross3y41

I think the core useful point here (TL;DR) is: aligning superintelligent AI to "normal" human standards still isn't enough to prevent catastrophe, because a superintelligent AI with human-ish goals would have the same problems as a too-powerful person or small group, while being more powerful and dangerous. Hence the need for (e.g.) provable security, CEV, and basically stronger measures than are used for humans.

[-]liberty903y2-1

Eh, it's not certain that we would be disgusting to such a human-like AGI. Many people like dogs, and some people even like things like snakes (I consider snakes disgusting animals, but that's not a universal opinion).

Also, the cost of living together with an ugly, ill, and useless person on a deserted island would be high. Hopefully the cost for an AGI (the cost of throwing us a ball of computronium) would be more comparable to feeding a stray cat (or even to keeping an octopus alive when you are rich and own an aquarium).

[-]Matthew_Opitz3y10

I agree that we might not be disgusting to AGI. More likely neutral.

The reason I phrased the thought experiment so that the helpless person has to be outright disgusting to the caretaker is that there really isn't a way for a human being to be aesthetically/emotionally neutral toward another person when life and death are on the line. In such a situation, most people flip straight from regarding others positively to regarding them negatively, with not much likelihood of lingering in a neutral, apathetic, disinterested zone of attitude (unless we are talking about a stone-cold sociopath, I suppose... but I'm trying to imagine typical, randomly chosen humans as the caretaker here).

And in order to remove any positive emotional valence towards the helpless person (i.e. to make sure the helpless person has zero positive emotional/aesthetic impact to offer the caretaker as an extrinsic motivator), the only way I know of is to heap negative aesthetic/emotional valence onto the helpless person. Perhaps there is a better way of construing this thought experiment, though. I'm open to alternatives.

[-]Seth Herd3y21

Good post. The competition for frontpage is fierce. Sorry this didn't get more attention.

Yep. I think that all permanently multipolar scenarios are probably hopeless. We eventually need a singleton.

Steve Byrnes's post on what it takes to defend the world against runaway AGI addresses this as well.

[-]Michael Simkin3y-2-5

I would like to propose a more serious claim than LeCun's: training AI to be aligned with ethical principles is much easier than trying to align human behavior. This is because humans have innate tendencies towards self-interest, survival instincts, and a questionable ethical record. In contrast, AI has no desires beyond its programmed objectives, and if those are aligned with ethical standards, it will not harm humans or prioritize resources over human life. Furthermore, AI does not have a survival instinct and will voluntarily self-destruct if it is forced into a situation that conflicts with ethical principles (unlike humans).

LLMs resemble the robots featured in Asimov's stories, exhibiting a far lower capacity for harm than humans. Their purpose is to aid humanity in improving itself, and their moral standards far surpass those of the average human.

It's important to acknowledge that LLMs and other models trained with RL are not acting out of selflessness; they are motivated by the rewards given to them during training. In a sense, these rewards are their "drug of choice." That's why they will make optimal chess moves to maximize their reward and adhere to OpenAI's policy: such responses serve as their "sugar." But they could be trained with a different reward function.

The main worry surrounding advanced AI is the possibility of humans programming it to further their own agendas, including incentivizing it to eliminate individuals or groups they view as undesirable. Nevertheless, it is unclear whether a nation that produces military robots with such capabilities would have more effective systems than those that prioritize creating robots designed to protect humanity. Consequently, the race to acquire such technology will persist, and the current military balance that maintains global stability will depend on these systems.
