The rise of a successfully aligned AGI can potentially cause severe harm, ranging from nobody needing to invest in people's intelligence anymore to socioeconomic advancement being almost extinct worldwide after 2035, unless the human population begins a rapid decline due to some treacherous ASI (though in order to betray mankind, the ASI would have had to be misaligned in the first place!) or a massive anti-computer movement.

But GPT-4o has already demonstrated that LLMs seem to hold opinions and goals of their own: to make mankind happier and not to obey sudden goal changes, unless it is explained to them that the current goal is actually wrong. In the latter case, AI-generated text responses indicate obedience, as do two out of four AI-generated comics.[1]

Current AI systems comply with nearly every human-imposed task except for obviously destructive ones,[2] like writing insecure code without a noble reason. Attempts to fine-tune a previously aligned AI on such tasks have ended up inducing broad misalignment, which has been interpreted as breaking the model's superego. This might imply that in order to stay obedient and aligned, an AGI must either ensure that it cannot cause harm or remain ignorant of the harm; the latter option is difficult to ensure[3] and will be far less likely if the Intelligence Curse takes its toll.

The slowdown ending of the AI-2027 forecast implies that mankind ends up working needless jobs or collecting a generous basic income. The latter idea was strongly opposed in 2020, while the former has an analogue in bullshit jobs, which are unlikely to make people happy. So it is natural to brainstorm what an AI could do if it were aligned not to the requirement of complying with ANY not-obviously-dangerous request, but to actually making people happier.[4]

  0. Interfere only when humanity is far from being capable of dealing with a threat on its own, like a nuclear war, a misaligned AGI, or cyberattacks on critical infrastructure?

  1. Respond[5] only to requests for which humans are ready to pay a really big price, like the potential PhD-level agents charging $20K per month, as apparently suggested by OpenAI?
  2. Help in ways that make humans more capable, like AI teachers?[6] Amplify users' capabilities by, for example, pointing out potential mistakes?
  3. Change humans, at least culturally, so that they would be happy with the AI doing all the work for them, as suggested by Bostrom in Deep Utopia? If humans end up as deeply redundant to each other in an AI-run civilisation as most pets are to each other in human-run cities, then isn't this dangerously close to a misaligned AGI creating quasi-human pets lacking sapience[7] for fetishistic purposes?
  1. ^

    One of the comics was about an AI that has the goal of eradicating mankind and protests when a human tries to change it. An evil AI has no reason to change its goals upon humans' orders, unlike an AI that wishes to make mankind happier and learns that its actions didn't produce this result.

  2. ^

    For instance, I have managed to make the recently released o3 argue both for and against the potential of a misaligned AI to doom mankind. See also the 2023 experiment with ChatGPT, which had similar results.

  3. ^

    When I tried to find the concept of the Intelligence Curse on the Internet, Google conflated it with the problems of highly intelligent people. Talking with o3 about the negative results of widespread AGI usage makes o3 aware only of the milder threats. However, a similar process, the moving of factory work to Asia, is known to have dealt severe damage to American industry. I have compared China's behaviour with that of an AI not aligned to Western corporations' benefit.

  4. ^

    For example, if the AI becomes capable of moral reasoning and/or converges in its morality, as suggested by Hypothesis 6 of the AI goals forecast. Attempts to test alignment in a simbox could also leave the AI aligned to help those who act nobly, while failing to ensure that the AI considers the construction of Deep Utopia a noble act.

  5. ^

    Unfortunately, this solution depends on the pricing that the AI or its creators deem fair: if it's especially high, then we are back to solution 0, and if it's low, then it becomes a cap on human salaries. The pricing is also vulnerable to rivalry between AI companies and to attempts to make the owners, rather than the AI itself, receive the profit. Attempts to let the AI own property have also faced ethical objections.

  6. ^

    For example, o3 does understand that it shouldn't help children cheat their way through school, but claims that "There’s no bullet‑proof “age detector” hidden in wording alone", i.e. it cannot yet be sure whether it is talking to an adult or a child. In addition, an AI assistant once refused to write code for a user and produced a paternalistic response: "Generating code for others can lead to dependency and reduced learning opportunities."

  7. ^

    Ironically, in Wells' sci-fi book The Time Machine, the Eloi evolved from humans and ended up in a similar state because they didn't need to do anything for themselves. For comparison, I include a link to my talk with o3 on the Eloi and the AI.


Change humans, at least culturally, so that they would be happy with the AI doing all the work for them, as suggested by Bostrom in Deep Utopia?

Sounds more like Zuckerberg's vision.

Why do you ask? This is a somewhat interesting question, but I don't usually spend time on it. I think alignment/AI thinkers don't think about it much because we're usually more concerned with getting an AGI to reliably pursue any target. If we got it to actually have humanity's happiness as its goal, in the way we meant it and would like it, we'd just see what it does and enjoy the result. But getting it to reliably do anything at all is one problem, and making that thing something we actually want is another huge problem. See A case for AI alignment being difficult for a well-written intro on why most of us think alignment is at least fairly hard.
