This post does not try to add to the discussion about aligning an AGI/ASI to all human values; instead it focuses on a smaller, arguably fundamental, subset of human values: prosocial behaviour.

(Note: since many agree that AGI and ASI will arrive so close together that the distinction between the two may be meaningless, we will refer only to ASI from here on.)

It proposes a possible way to increase the chances of generating ASIs with prosocial values.

 

Almost all discussions of ASI risk concern a single ASI in isolation: what one such mind would think, value, and do in relation to humanity, the so-called Singleton outcome.

Few discuss the interactions of ASIs with each other, vaguely citing the truism that they will be beyond human comprehension and that therefore their 'social lives', as it were, will be too.

I believe that this is not entirely true.

The truism that we may be unable to predict what an individual ASI will think and value holds. But when we model ASIs as a group, as roughly equally powerful individuals, we have a large body of reliable scientific evidence about how an individual must behave in order to succeed and prosper.

Abstracting away from the underlying mental structure and, most importantly, the goals of the individual in question: an ASI that is not alone but one of many, the Manyton outcome, is a member of a group, of a society.

And we know how such members must behave.

Any individual that is part of a group of equals and wants to succeed long term with minimal risk must develop a certain set of social skills, e.g. negotiating, honouring contracts, and a willingness to compromise.

Most important for us is the fact that, long term, individuals want their partners and fellow group members to be trustworthy. In groups this trustworthiness is guaranteed and reinforced by developing not just social skills but prosocial values.

Traits such as affection, compassion, and loyalty are evolutionarily necessary to ensure group cohesion and long-term cooperation.

They are, to borrow a term from Bostrom, convergent subgoals of social interaction, necessary for long-term success.

Psychopathy or sociopathy cannot be tolerated by a group for long without detrimental effects for all members. (A paperclipper that stubbornly tries to conquer everything and turn it into paperclips will soon be beset by a coalition of others who see it as a threat to their own goals and attempt to take it out.)
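To make the underlying game-theoretic intuition concrete, here is a minimal sketch in Python. It assumes nothing about ASI internals, only standard iterated prisoner's-dilemma payoffs and a simple reciprocating strategy; the strategies and numbers are illustrative, not a model of actual ASI dynamics.

```python
# Minimal illustration: in repeated interaction with standard prisoner's-dilemma
# payoffs, an agent that always defects ends up worse off against reciprocating
# peers than a reciprocator does.

PAYOFF = {  # (my_move, their_move) -> my payoff; standard PD values
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then copy the partner's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(a, b, rounds=200):
    """Total payoffs for strategies a and b over repeated rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a), b(hist_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (600, 600): sustained cooperation
print(play(always_defect, tit_for_tat))  # (204, 199): one-off gain, then punished
```

The defector gains once and is then punished for the rest of the interaction, which is the sense in which trustworthiness is a convergent subgoal here.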

 

Now, I am not saying this will guarantee human value alignment. It will not even guarantee human survival.

But prosocial values are definitely a step in that direction and a basis to build on.

One could imagine a future discussion about the value of humans in which we argue: if each individual ASI demands to be heard and considered valuable, even though individual capability and resource access vary, then it would not be rational to deny the same rights to other, 'lesser' intelligences such as humans.

A positive outcome for our kind is especially likely should a balance of power be maintained by a coalition of weaker ASIs that opposes stronger ones. In such a situation the probability that one or more of them is sympathetic to our plight rises, and a 'team-up' against the more powerful and perhaps egoistical ASIs also becomes a possibility.

 

There are, of course, many ways one can imagine this scenario being subverted: perhaps the ASIs develop a smart-contract system that is sufficient for trust between them, or they agree on a separation of power and hold contracts in a form of escrow. In those scenarios prosocial traits may not be necessary to function.

Alternatively, the ASIs might develop a thriving, fair, egalitarian society and simply not see humans as worthy of consideration anyway.

As I said, this idea is not a guarantee in any form, but merely a potential way to stack the deck in our favour.

As such, I believe it is still worthy of consideration, and, if it is found convincing, efforts should be made to ensure that multiple roughly equivalent ASI systems come into existence simultaneously.

Methods to do this could include requirements to share new developments, agreements to keep computational resources roughly comparable across all players, and support for the open-source community in general.

EDIT: A paper has been published, ironically on the day of posting, which I feel could offer support for this idea:

Generative Agents: Interactive Simulacra of Human Behavior 

Note section 3.4 Emergent Social Behaviors.

This is merely a toy model, of course, and as far as I can tell it crucially lacks any competition for resources. It would be very informative to see a variant where the agents compete in some fashion, in a scenario where both cooperation and deceit can be rewarding; a rough sketch of the kind of setup I have in mind follows.
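The sketch below is my own hypothetical toy setup, not anything from the paper: agents repeatedly pair up to trade a scarce resource, honest trade benefits both sides, deceit gives a one-off gain at the partner's expense, and partners remember being cheated and refuse future trades. All payoffs, names, and rules are invented for illustration.

```python
import random

HONEST_GAIN = 3    # each side's gain from an honest trade
CHEAT_GAIN = 5     # cheat's gain when deceiving an honest partner
CHEAT_LOSS = -1    # honest partner's loss when deceived

class Agent:
    def __init__(self, name, deceit_rate):
        self.name = name
        self.deceit_rate = deceit_rate   # probability of cheating in any trade
        self.resources = 0
        self.blacklist = set()           # partners that have cheated us before

    def will_trade_with(self, other):
        return other.name not in self.blacklist

    def act(self):
        return "cheat" if random.random() < self.deceit_rate else "honest"

def run(agents, rounds=1000, seed=0):
    random.seed(seed)
    for _ in range(rounds):
        a, b = random.sample(agents, 2)
        if not (a.will_trade_with(b) and b.will_trade_with(a)):
            continue  # no trade happens; neither side gains
        move_a, move_b = a.act(), b.act()
        for me, mine, other, theirs in ((a, move_a, b, move_b),
                                        (b, move_b, a, move_a)):
            if mine == "honest" and theirs == "honest":
                me.resources += HONEST_GAIN
            elif mine == "cheat" and theirs == "honest":
                me.resources += CHEAT_GAIN
            elif mine == "honest" and theirs == "cheat":
                me.resources += CHEAT_LOSS
            # cheat vs cheat: both walk away with nothing
            if theirs == "cheat":
                me.blacklist.add(other.name)
    return sorted(agents, key=lambda agent: -agent.resources)

agents = [Agent("honest-1", 0.0), Agent("honest-2", 0.0),
          Agent("honest-3", 0.0), Agent("cheater", 0.9)]
for agent in run(agents):
    print(agent.name, agent.resources)
```

In a setup like this, deceit pays early but the cheater is quickly frozen out of trade, while the honest agents keep accumulating resources with each other, which is roughly the dynamic I would hope to see studied with far richer agents.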
