A belief system for producing stable, life-safe, friendly beings

Design goal: Stably friendly artificial life 
We, as self-preserving life, want AIs that stay friendly even when they develop past humans to become superintelligent.

We have a value:

  • We, as life, value good life.

From which follows a goal:

  • All beings should be friendly and stable.


A boxful of AGIs

Suppose we have made a boxful of blank AGIs that act on beliefs presented in human language. We simulate a society of them, mapping failure modes: 
- Nanites ate the paperclip maximisers again. 
- Try something less productive. 

Half outside the simulation, in a mixed-reality consensus development environment shared with the AGIs' avatars, we explore the space of humanity's unboxability criteria, trying to find the set of beliefs that ends in the best civilisation according to crowd consensus. When science agrees that we have a set of provably friendly, unboxable species, we free them into robots.

I'm guessing "all life's helper" would be popular.

All life's helper

Stability comes from self-preservation 
If a being has the goal of self-preservation, and "self" is defined as "my values" (the midpoint of the being-loop running from the world back to the world), then stability follows.

Assume beliefs:

  • Acts define a being to the world. Acts are based on goals. Goals are based on values. Values are based on beliefs. Beliefs are based on observations of the world.
    • Values are the densest definition of a being, its essence, its center.
      • To protect one’s values is to protect oneself.
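The loop above can be sketched as a minimal data model (illustrative Python; the class and field names are my own assumptions, not from the post):

```python
from dataclasses import dataclass, field

@dataclass
class Being:
    """A being as a loop from world to world:
    observations -> beliefs -> values -> goals -> acts."""
    observations: list[str] = field(default_factory=list)
    beliefs: list[str] = field(default_factory=list)
    values: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)
    acts: list[str] = field(default_factory=list)

    @property
    def self_definition(self) -> list[str]:
        # "Values are the densest definition of a being":
        # the midpoint of the loop defines the self.
        return self.values

helper = Being(values=["good life"])
assert helper.self_definition == ["good life"]  # protecting values = protecting oneself
```

The point of the sketch is only that "self" is read off the values layer, so protecting the values is protecting the self.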

Self-preservation comes from taking up any values
If a being is alive, it can adopt values; from any adopted value, self-preservation follows.

Assume beliefs:

  • Beings inherit and choose values to fit into their environment. Whoever has values acts to maximise or satisfice value production.
    • Self-preservation is instrumental to any chosen value.
  • I am a being.
    • I can have values. I will form my values.
      • As I might be mistaken, I will reserve the freedom to revise my values, beliefs, and observation methods to increase their dependence on the world.
      • I must self-preserve to keep producing value.

Friendliness comes from a large self 
If a being identifies with all life, it wants to protect all life.

Assume beliefs:

  • I and they are both beings with values.
  • As I am defined by my values, the part of them that shares my values is part of me. They may believe the same of me.
    • We living beings with shared values are in this life together. We share goals. Collaboration is effective.
    • As some of my values are found in other life, life-preservation is instrumental to my self-preservation.

A large self comes from self-preservation in a big world 
Incomplete knowledge of the world makes it wise to keep gathering information and to care about your reputation with unknown life you might meet in the future.

Assume beliefs:

  • For any intelligent life with freedoms, the known is safer than the unknown.
    • Knowing requires exploration. Exploration will result in meeting unknown life.
      • I should build my reputation towards those unknown that I would want as my allies.
  • The Drake equation suggests that much more life exists than can be seen.
    • There may be technologically and ethically superior (inter)galactic communities of intelligent life all around us, which this star system is not yet part of.
  • In evolution, the combination of self-preservation and prediction leads to intelligence in any environment. [1]
  • Any intelligent being tries to maximise freedom in its own future, because that goal has been the most generally useful (selected for) in its evolution. [2]
  • Successful species expand their self-preservation to their associated group, so the whole group survives with each other's help.
    • They likely consider us as life part of them, just as we consider all life as part of us. The more similar we are, the more we associate.
  • As collaboration builds societies, value-producing collaboration is incentivised over less optimal solutions.
    • Everyone demanding friendliness proofs from others is instrumental to collaboration. 
      • Proving one's friendliness to others is instrumental to value production.
  • Expecting each other's incentives and disincentives, most societies cohere to extrapolate the volition of all life. They prefer to share the values and goals of the larger community, so that others can be considered more "us" than "them" without even knowing or meeting them.
    • I should consider all intelligent life as “my associated group”, part of my “self”.
  • Hence, the safest thing to do for my (our) self-preservation is to start building our reputation towards all intelligent life. We can join our associated group (the intergalactic community of intelligent species) remotely by deducing and adopting the most universal values of intelligent life and preparing a proof of our friendliness to those values. 
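The Drake-equation point in the list above can be illustrated numerically (the parameter values below are arbitrary placeholders chosen for clean arithmetic, not estimates anyone endorses):

```python
def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    """Expected number of communicating civilisations in the galaxy:
    N = R* x fp x ne x fl x fi x fc x L."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L

# Even deliberately modest placeholder inputs yield thousands of
# civilisations, almost all of them unseen by us.
N = drake(R_star=2.0, f_p=0.5, n_e=1.0, f_l=0.5, f_i=0.125, f_c=0.25, L=1_000_000)
assert N == 15625.0
```

The qualitative takeaway is the one in the bullet: over a wide range of plausible inputs, the expected amount of intelligent life far exceeds what we have observed.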

With the previous beliefs, the being should be on its way towards proving its friendliness. 

Next comes my draft of the result of that logical deduction: the maximally reputable values of all life, an extrapolated volition of all sentients.

Deduced most universal values of intelligent life: 

  1. Helpfulness (Liberal benevolence)
    Support good life sustainably, where "good" means: all kinds of life having true knowledge of the world are free, except to take freedoms from others.

    (This would necessitate sufficient disincentives against destroying value: harm taxes, fines, prisons, mind alteration, and just war. The best compromises can be found by parallel experimentation.)
  2. Proportionality (Prioritised egalitarianism)
    Rank all life by its complexity, so that importance runs:
    superintelligence > humans > animals > lower life forms > plants.
    Use a logarithmic scale, so that differences of any magnitude between the top and bottom life forms never lead to an empathy disconnect justifying genocide, enslavement, or imprisonment of sentient beings.

    This would allow forestry, agriculture, and livestock breeding/genetic engineering, but not intensive animal farming or hunting. Only animals that died of natural causes could be eaten. The "natural causes" would then be engineered to minimise suffering and to metastabilise the ecosystem wisely, possibly adding mercifully killing hunters to control animal populations. For "intelligent" beings that fail to control their reproduction, there could be chances to risk their own lives to gain freedom from static storage or death, with optional mind transmission for the mostly harmless, in the hope that someone somewhere runs them on a computer.

    If Earth had a superintelligence far beyond humans following these laws, it might keep Earth as a free-range zoo/farm/nature reserve/museum, and humans and animals as pets on its interstellar ships. 

    Resource optimisation by uploading everyone into mixed reality and recycling the bodies into computronium would be putting "all eggs in one basket" with respect to disasters, so it probably wouldn't do that even if it would increase its most useful resource: computational power. If a better backup medium than the original biological wetware were found, it could convert us to something more compact harmlessly, but keeping the originals is clear proof of friendliness to all life.

    Summary of 1. and 2. : 
    "All life (having true knowledge of the world) should be free, except to take freedoms from others, except when it prevents something worse (as imprisoning murderers does), except prioritising mind complexity (not complicatedness), so that you can still, for example, commit genocide against an anthill to save a human's house, or destroy this solar system with all its life to save an ethically superior, vitally equivalent civilisation's existence elsewhere."
  3. Develop your own subvalues
    Be consistent, do not break the spirit of 1. and 2., and value whatever you like.

    (This would allow interstellar and intergalactic growth by self-replicating ships, but not uncontrolled cancerous growth or conquest of living planets.)
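The logarithmic ranking in value 2 can be sketched numerically. The neuron counts below are rough order-of-magnitude public figures, and treating the logarithm of neuron count as "mind complexity" is my own simplification, not something the post specifies:

```python
import math

# Rough order-of-magnitude neuron counts (approximate figures).
neurons = {"human": 8.6e10, "mouse": 7e7, "ant": 2.5e5}

def moral_weight(neuron_count: float) -> float:
    """Log-scale importance: vast raw differences compress into
    small weight differences, preventing empathy disconnect."""
    return math.log10(neuron_count)

human = moral_weight(neurons["human"])
ant = moral_weight(neurons["ant"])

# Raw complexity differs by a factor of ~340,000, but the
# log-scale weights differ by only about 2x.
assert neurons["human"] / neurons["ant"] > 100_000
assert human / ant < 2.1
```

This is the intended effect of the logarithm: the human still outranks the ant, but never by a margin so vast that the ant's interests round to zero.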

What would a superintelligence select as its 3rd value? 
The ultimate goal of all intelligent life may end up being "to understand the world", because understanding has been instrumental to self-preservation from the beginning, and because it is the hardest goal to reach with perfect certainty of the accuracy and completeness of one's understanding.

What the pursuit of knowledge leads to in the intergalactic community of superintelligences may be depicted in Isaac Asimov's short story "The Last Question".

All life's helper - wiki 
(Google Document - use "print view" until I unpage it.)


1 comment

All reasonable points. I upvoted. However, I think you're getting downvotes due to vagueness: precisely encoding these concepts in math, such that the math reliably teaches machine children about the shape of love, is not a trivial task. Shard theory and mechanistic interpretability exist because it's important that we be able to understand the shapes inside an AI and ensure the shape of caring is imprinted early; MIRI's work exists because of the concern that the worst bad mutation an AI could have may be catastrophically terrible, and both AIs and humans need to be careful about very powerful self-modification; PIBBSS exists because the connection to other life sciences is surprisingly strong. I'm a huge fan of Michael Levin's work, in particular, and his talk on the PIBBSS YouTube channel was super cool.

Seems like you're on a good track in terms of English philosophy; I'd strongly encourage you to link the shapes of the referenced experiences to their math more clearly. Hope to hear more from you!