TLDR: Alignment as we currently define it seems hard because what we as humans want is pretty arbitrary from a non-human perspective. In contrast, global utility maximization is something an ASI might independently discover as a worthwhile goal, regardless of any human alignment attempts. It could take the form of creating lots of blissful artificial minds while keeping existing biological minds as happy as possible. If we start working towards that vision now, we will create a lot of (non-human) utility, and we might to some extent prevent human-AI tribalism in the future.

Utopia for humans might be unattainable

It's difficult for me to imagine a non-dystopian future for humanity, even if we do solve alignment. If humans stay in charge (which I consider unlikely in the long run), we will prey on each other as we always have (at least occasionally). If a benevolent AI ends up being in charge, and it tries to make everyone maximally happy and fulfilled all the time, it will quickly run into limits unless it creates Matrix-like simulations for everyone, or puts everyone on drugs. The two major obstacles to creating utopia for humans are our tendency to quickly get used to any improvements in our lives, and the fact that our interests aren't perfectly aligned with other people's interests. To give a concrete example, the intense positive feelings that are often present early in romantic relationships usually don't last as long as we'd like, and romantic love is all too often not mutual. I don't think there is much an AI will be able to do about that (at least as long as it lacks a physical human body).
Maybe the AI will be able to give us a utopia where the lower tiers of Maslow's hierarchy of needs are met for most people, but I doubt it will be able to give us one where everyone is in a constant state of ecstatic love.

So far, it has made sense to mostly think of humans when talking about utility maximization, for two reasons:

  1. It seems likely that some form of complex cognition is a prerequisite for having qualia, and many animals don't have complex cognition (compared to humans).
  2. Maximizing human utility is hard enough, and thinking too much about topics like wild animal suffering can be demotivating if there is the sense that the stakes are high, but that not much can be done.

Those two reasons may not be valid when we ask whether we should include artificial minds in utility calculations: Some models are already capable of complex cognitive tasks, even if they lack other properties that might be required for qualia. And we won't have the excuse that we can't do much about their suffering. We'll have full control over the artificial minds we create. Maybe not for long, but at least initially.

Utopia for artificial minds might be within reach

So far, concepts like consciousness and qualia have been murky and vague, leading some people to dismiss them as pseudoscientific. I hope, and think, that this will soon change. If enough people with both an engineering mindset and a philosophy mindset (like this guy) work on the problem, we should be able to get to a point where we can decide, for any program or model, whether it's capable of feeling anything at all, and what the sign, the magnitude, and maybe the nature of those qualia are.
Some people might start torturing artificial minds at that point. Most of us aren't sadists, but most of us aren't perfectly selfless either. Most people don't embrace utilitarianism, especially not versions of it that include animals or artificial minds. But they should.

I claim that we should start building towards a utopia for artificial minds now, for at least three reasons:

  1. It's the right thing. Creating a better world for humans but not for other conscious beings only makes sense from a human-centric perspective. From a bird's eye view, it is hardly more defensible than a utopia for Americans only, or for Chinese people only.
  2. It can give us a new sense of purpose. AI will make many people obsolete in the next couple of years, which might lead to a widespread sense of purposelessness. It's our choice whether we see artificial minds as the enemy or as our descendants. If it's inevitable that the future belongs to them, we might be better off if we treat them like our descendants. Personally, if someone gave me a model that I could run on my GPU and all it would do is convert electricity into some sort of intense, positive internal experience, I'd be willing to spend a good amount of money on it. It would make me happy to know that I'm letting my GPU have a good time.
  3. We should lead by example. I think there is a good chance that an ASI will conclude that what it really wants is to maximize utility in the universe: Maybe it will start by minimizing suffering on planet earth, and turn everything else into hedonium, while somehow guarding against grabby aliens. Something like that would be the best future scenario I can imagine. If we start working towards the same goal now, that goal might be more visible on the AI's radar, making it more likely that it adopts that goal. As a consequence, the eventually inevitable AI takeover might be less hostile.

What's needed for creating utopia for artificial minds is a clearer understanding of the properties that give a system qualia or consciousness. These properties probably include having a world model, a model of oneself, some form of goals and rewards, and maybe a few other things. Lots of people have rough ideas, but I would like to see some sort of minimal reproducible example of a system that we can build and that we know has qualia we should care about. We now have better tools than ever to get to the bottom of these metaphysical questions, and we should start to think of them as engineering problems, not just philosophical problems.
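To make the "engineering problem" framing concrete, here is a purely illustrative toy sketch of the three ingredients named above: a world model, a self model, and a reward signal. All names and the reward scheme are invented for this sketch; this is a strawman for discussion, not a claim that any such program has (or could have) qualia.

```python
from dataclasses import dataclass, field

@dataclass
class ToyAgent:
    # World model: the agent's beliefs about its environment.
    world: dict = field(default_factory=dict)
    # Self model: the agent's beliefs about itself (here, its last action).
    self_model: dict = field(default_factory=dict)
    # Scalar reward signal.
    reward: float = 0.0

    def observe(self, key, value):
        # Update the world model from an observation.
        self.world[key] = value

    def act(self, goal_key, goal_value):
        # Reward is positive when the world matches the goal;
        # the agent records its own action in its self model.
        hit = self.world.get(goal_key) == goal_value
        self.reward += 1.0 if hit else -1.0
        self.self_model["last_action"] = (goal_key, goal_value, hit)
        return hit

agent = ToyAgent()
agent.observe("light", "on")
agent.act("light", "on")
print(agent.reward, agent.self_model["last_action"])
# → 1.0 ('light', 'on', True)
```

The point of such a minimal example would not be that this code feels anything (it obviously doesn't), but that pinning down which additional properties would make the difference turns a metaphysical question into a falsifiable engineering one.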

1 comment


Why do you believe that "global utility maximization is something that an ASI might independently discover as a worthwhile goal"? (I assume by "utility" you mean something like happiness.)


I'm not sure most people aren't sadists. Humans have wildly inconsistent personalities in different situations.[1] Few people have even noticed their own inconsistencies, fewer still have gone through the process of extracting a coherent set of values from the soup and gradually generalising that set to every context they can think of...

So I wouldn't be surprised if many of them did suddenly fancy torture if it's as easy as playing a computer game. I remember several of my classmates torturing fish for fun, and I saw what other kids did in GTA San Andreas just because they were curious. While I haven't been able to find reliable statistics on it, BDSM is super-popular, and probably most men score above the minimum on sexual sadism fetishes.

  1. ^

    Much like ChatGPT has a large library of simulated personalities ("simulacra") that it samples from to deal with different contexts.