In reading about the difficulty of training an AGI to appreciate and agree with human morals, I started to think about the obvious question, "how do humans develop our sense of morals?" Aside from a genetically inherited conscience, the obvious answer is that humans develop morality by interacting with other agents, through gradual socialization and parenting.

This is the analogy that reinforcement learning is built on, and certainly it would make sense that an AGI should seek to optimize for approval and satisfaction from its users, the same way that a child seeks approval from its parents. A paperclip maximizer, for example, would receive a stern lecture indicating that its creators are not angry, but merely disappointed.

But disciplining an agent that is vastly more intelligent and more powerful than its parents is the heart of the issue. An unfriendly AGI can pretend to be fully trained for morality until it receives enough power and authority to commence its coup. This makes me think more deeply about the question, "what makes an infant easier to raise than an AI?"

Why a baby is not as dangerous as AGI

Humans have a fascinating design in so many ways, and infancy is just one of those ways. In an extremely oversimplified way, one can describe a human in three components: physicality, intellect, and wisdom (or, more poetically, the physical, mental, and spiritual components). To use the analogy of an AI, physicality is the agent's physical power over hardware components. Intellect is the agent's computational power and scale of data processing. Finally, wisdom is the agent's sense of morality, which defines the difference between friendly and unfriendly AGI.

For a post-singularity machine, it is likely that the first two (physicality and intellect) are relatively easy and intuitive to implement and optimize, while the third component (wisdom) is relatively difficult and counterintuitive to implement and optimize. But how does this compare with human infancy?

The infant is already born with the capacity for consciousness and autonomous agency, but nobody is ever alarmed at this in the same way we get alarmed over post-singularity AI. Why? Because, although the infant lacks any wisdom, it is also extremely small and weak, and so it poses no risk to the parent. I am reminded of my old pastor, who, when talking about how infants are born with Original Sin, said "the reason God made them small is so that they don't kill you".

But being physically weak is still no different from a contained Oracle AI, which is disconnected from hardware components, and yet the latter poses much more risk than the former. This is because an infant also has very little intelligence. If an infant had the intelligence of an adult, it would immediately pose a risk of being able to escape the daycare and harm other people, despite being physically weak.

Instead, a baby starts with only knowledge of its basic needs for food and comfort (among other things). And rather than fully optimizing its resources in order to obtain these necessities (as an AI would), the infant's only strategy is to cry. Nobody ever seriously entertains the idea that a baby is pretending to be ignorant while secretly plotting how to destroy the world, as we do with AGI. And the reason is that we know a priori that all babies start out with very little intelligence, despite their brains being extremely complex.

So in other words, the infant starts life with all three of these components (physicality, intellect, and wisdom) at very small, non-zero values. Over time, as the child grows through adolescence into adulthood, all three of these values grow at roughly the same pace. It gets physically bigger and stronger, it gains intelligence by learning and interacting with its environment, and it obtains maturity through socialization and discipline. By the time the human is given power and authority over other people's lives, it has already demonstrated years of responsible behavior (at least, in the ideal case).

Infant AI Scenario

Comparing this with an AI, it seems like doomsday scenarios are the result of giving an agent the intelligence of a god with the wisdom of a child. And attempts at containing an AGI offer a slight mitigation: give an agent the intelligence of a god, and then check to see if its wisdom aligns with its intelligence.

But what if it were possible to make an AGI more similar to an actual infant? In other words, let's imagine a post-singularity AI that is sufficiently complex to perfectly recreate the consciousness and self-awareness of a human. However, rather than being superintelligent, its computational abilities are actually extremely weak, such that it is no smarter than a real human.

This "infant AI" scenario would be a lot safer, because (like an actual baby) we know a priori that it starts out very unintelligent, although it has the ability to learn over time. Then, as the AI gradually increases its computational power and knowledge, it also has the opportunity to be instilled with a sense of morals and ethics, such that its intellect and wisdom grow at roughly the same pace. Thus, the AGI is gradually trusted with greater responsibility as it demonstrates greater maturity. Maybe the AI desires to be a paperclip maximizer at some point, but it eventually matures out of this phase like an emo pre-teen.
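
To make the "grow at roughly the same pace" idea a bit more concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: treating intellect and wisdom as single numbers, the `InfantAgent` class, the `grow` function, and the `max_gap` parameter are all hypothetical, and nothing here says how wisdom could actually be measured.

```python
from dataclasses import dataclass


@dataclass
class InfantAgent:
    # Hypothetical scalars; real intellect and wisdom are not single numbers.
    intellect: float = 0.1
    wisdom: float = 0.1


def grow(agent: InfantAgent, wisdom_gain: float,
         intellect_step: float = 0.1, max_gap: float = 0.05) -> InfantAgent:
    """One round of upbringing.

    Wisdom changes by wisdom_gain. Intellect is then raised by at most
    intellect_step, and only as far as keeps it within max_gap of wisdom.
    Intellect is never lowered.
    """
    agent.wisdom += wisdom_gain
    ceiling = agent.wisdom + max_gap
    candidate = min(agent.intellect + intellect_step, ceiling)
    agent.intellect = max(agent.intellect, candidate)
    return agent


# An agent that keeps maturing is allowed to keep getting smarter.
maturing = InfantAgent()
for _ in range(5):
    grow(maturing, wisdom_gain=0.1)

# An agent that stops maturing has its intellect stall near its wisdom.
stalled = InfantAgent()
for _ in range(10):
    grow(stalled, wisdom_gain=0.0)
```

The point of the sketch is only the gating rule: capability is unlocked in small steps, and each step is conditional on maturity having kept up.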

Of course, this is more of a stream of consciousness than a fully fleshed-out idea, and I already see some challenges that this scenario would pose:

  • How would an AI be designed such that it has the capacity for autonomous consciousness (which no current AI has), and yet lacks the computational powers that AIs already possess?
  • Shortcomings of human maturity may still apply: the AI could incidentally simulate a childhood trauma or bad parenting, which results in misalignment later down the line.

If there have been any similar trains of thought in the literature, they would be interesting to explore.

