I think it's fair to say that most of us would trust a superintelligent human over a superintelligent AI. While humans have done a lot of evil in the world, our objective function as individuals seems to line up pretty well with the objective function of other individuals and our species as a whole.

In this post I attempt to sketch a rough approximation of the human objective function and get feedback from the more knowledgeable LessWrong community.

The Human Objective Function (bullet points):

-Desire for a variety of basic personal pleasures. It is not enough to have just one type of pleasure; our desires constantly fluctuate. We are hungry, horny, sleepy, and lonely at different times.

In my opinion, this prevents maximization toward a single goal, which is exactly what we fear an AGI would do. If an AGI sometimes wants to make paperclips, sometimes wants to make people happy, and sometimes wants to be turned off, it would probably be easier to deal with than an AI solely focused on maximizing paperclips. We also can't predict our desires well in advance, so we typically make the most progress toward them when the desire is actually active.

-Desire for stability. We don't want the world to change much. Change often makes us afraid.

In the case of AI, this is really nice too. There's probably some ideal world for us, a bit like a video game or something, but we don't want to be thrown into it immediately. It would be a lot nicer to ease into it and see if it is really something we're interested in. This slow change gives us a good sense of what we actually want, because it lets our fluctuating desires average out over time. This is important because...

-Desire for change in a positive direction. Despite the painfulness of exercise, the feeling of growing in a positive direction is incredibly motivating. It gives us purpose. There's more to it than just the goal; the journey is often more important than the destination. We like stories where the protagonist grows, and we're annoyed when an ending is spoiled for us. The progress and struggle are meaningful as long as they are in a positive direction.

We can apply this to AI by making it content to be turned off or altered when it makes what we consider to be a mistake. One of the biggest problems is that we probably can't alter a superintelligent machine once it is already running, because it will have some motivation to avoid being altered or turned off, since that would interfere with its goal. An AI will probably happily modify itself to be better, but we also want it to accept modification by humans, not only by itself.
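One existing idea along these lines (from the corrigibility literature, not from this post) is "utility indifference": compensate the agent, in utility, for whatever it would have gained by staying on, so resisting shutdown buys it nothing. The snippet below is only a made-up toy illustration of that idea; every name and number is an invented placeholder of mine.

```python
# Toy comparison: a plain paperclip maximizer vs. one given an "indifference"
# compensation term. All names and numbers are invented for illustration.

def utility_if_running(expected_future_paperclips: float) -> float:
    """Utility a paperclip maximizer expects if it keeps running."""
    return expected_future_paperclips

def utility_if_shut_down_plain() -> float:
    """A plain maximizer gets nothing once shut down, so it wants to resist."""
    return 0.0

def utility_if_shut_down_indifferent(expected_future_paperclips: float) -> float:
    """With an indifference bonus, shutdown pays exactly what running would have."""
    return 0.0 + expected_future_paperclips  # compensation term

expected = 1_000.0
print(utility_if_running(expected) - utility_if_shut_down_plain())              # 1000.0: strong incentive to resist
print(utility_if_running(expected) - utility_if_shut_down_indifferent(expected))  # 0.0: no incentive to resist
```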

-We value the objective function of others. It makes us happy when others are happy. While we prioritize our own well-being over others', and the well-being of those we know over those we don't, most humans are inherently good to others. We also value the free will of others: if we see someone about to make a bad decision and advise them against it, we will often still let them carry through with their decision despite our own better knowledge.

Obviously, this is important for AI as well. We want AI to value our objective function as well as its own. This trait is also why I've been so optimistic about the future of the human race. Even if someone like Jeff Bezos gains absolute power, I really do feel like he would attempt to improve the quality of our lives and give us agency over our decisions. Not just because of legacy, or tax deductions, or anything else, but because most humans like to make others happy by default. So even if some future billionaire owns a solar system, as long as I get a continent too, I'll be happy.
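To make the shape of these four ingredients concrete, here is a deliberately crude toy sketch in Python. Every function name, weight, and functional form is an invented placeholder of mine, not a claim about how a human objective function is actually specified; it only shows the structure: fluctuating desire weights, a penalty on sudden change, a mild reward for positive progress, and a term for other agents' utility.

```python
import math

def desire_weights(t: float) -> dict[str, float]:
    """Fluctuating weights over a few basic pleasures (all placeholders).

    Each weight drifts on its own cycle, so no single pleasure dominates forever.
    """
    return {
        "food":    0.5 + 0.5 * math.sin(0.7 * t),
        "rest":    0.5 + 0.5 * math.sin(0.3 * t + 1.0),
        "company": 0.5 + 0.5 * math.sin(0.5 * t + 2.0),
    }

def human_objective(t: float,
                    satisfaction: dict[str, float],
                    prev_world: float,
                    world: float,
                    others_utility: float) -> float:
    """Toy 'human objective function': higher is better.

    satisfaction   -- how well each basic pleasure is currently met (0..1)
    prev_world,
    world          -- one-number summaries of the world before and after a change
    others_utility -- average utility of other agents
    """
    weights = desire_weights(t)

    # 1. Variety of basic pleasures, weighted by the current (fluctuating) desires.
    pleasure = sum(w * satisfaction.get(name, 0.0) for name, w in weights.items())

    # 2. Stability: penalize large, sudden change in either direction.
    stability_penalty = abs(world - prev_world)

    # 3. Positive growth: reward movement in the 'positive' direction,
    #    so slow improvement still wins despite the stability penalty.
    growth = max(0.0, world - prev_world)

    # 4. Caring about others: their utility enters ours with a smaller weight.
    care = 0.3 * others_utility

    return pleasure - 0.5 * stability_penalty + 0.8 * growth + care

if __name__ == "__main__":
    u = human_objective(
        t=1.0,
        satisfaction={"food": 0.8, "rest": 0.4, "company": 0.6},
        prev_world=0.0,
        world=0.1,            # a small positive change
        others_utility=0.7,
    )
    print(f"toy utility: {u:.3f}")
```

The only design point worth noting is that the growth term slightly outweighs the change penalty, so the sketch prefers gradual improvement over both stasis and abrupt upheaval, which is the trade-off the stability and positive-direction bullets describe.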


Those are the elements of the human objective function that I'm confident about. Outside of AI, I think this framing is useful for living a better life. A close friend of mine became depressed after reading about AI alignment timelines and stopped exercising or eating healthily because he would rather 'enjoy the moment while we still have it'. I saw that as a tremendous mistake, because he was no longer focused on positive growth. Even if the timeline is short and/or hopeless (which it is not!), it is still important to retain a sense of identity and positive change. Regardless of how your perception of the world changes (a short AI timeline, cancer, an unexpected positive or negative change), understanding your own personal objective function seems extremely important for improving your actual happiness.

 

I'd like to open this to discussion here on LessWrong and make addenda to this post later as I have more ideas.
