TLDR: Alignment as we currently define it seems hard because we recognize that what we as humans want is pretty arbitrary from a non-human perspective. In contrast, global utility maximization is something that an ASI might independently discover as a worthwhile goal, regardless of any human alignment attempts. Global utility maximization could take the form of creating lots of blissful artificial minds, while keeping existing biological minds as happy as possible. If we start to work towards that vision now, we will create a lot of (non-human) utility, and we might to some extent prevent human-AI-tribalism in the future.
It's difficult for me to imagine a non-dystopian future for humanity, even if we do solve alignment. If humans stay in charge (which I consider unlikely in the long run), we will prey on each other as we always have (at least occasionally). If a benevolent AI ends up being in charge, and it tries to make everyone maximally happy and fulfilled all the time, it will quickly run into limits unless it creates Matrix-like simulations for everyone, or puts everyone on drugs. The two major obstacles to creating utopia for humans are our tendency to quickly get used to any improvements in our lives, and the fact that our interests aren't perfectly aligned with other people's interests. To give a concrete example, the intense positive feelings that are often present early in romantic relationships usually don't last as long as we'd like, and romantic love is all too often not mutual. I don't think there is much an AI will be able to do about that (at least as long as it lacks a physical human body). Maybe the AI will be able to give us a utopia where the lower tiers of the Maslow pyramid of needs are met for most people, but I doubt it will be able to give us a utopia where everyone is in a constant state of ecstatic love.
So far, it made sense to mostly think of humans when talking about utility maximization, for two reasons:
Those two reasons may not be valid when we ask whether we should include artificial minds in utility calculations: Some models are already capable of complex cognitive tasks, even if they lack other properties that might be required for qualia. And we won't have the excuse that we can't do much about their suffering. We'll have full control over the artificial minds we create. Maybe not for long, but at least initially.
So far, concepts like consciousness and qualia have been murky and vague, leading some people to dismiss them as pseudoscientific. I hope, and I think, that this will soon change. If enough people with both an engineering mindset and a philosophy mindset (like this guy) work on the problem, we should be able to get to a point where we can decide for any program, or for any model, whether it's capable of feeling something at all, and what the sign, the magnitude, and maybe the nature of these qualia are.
Some people might start torturing artificial minds at that point. Most of us aren't sadists, but most of us aren't perfectly selfless either. Most people don't embrace utilitarianism, especially not versions of it that include animals or artificial minds. But they should.
I claim that we should start building towards a utopia for artificial minds now, for at least three reasons:
What's needed for creating utopia for artificial minds is a clearer understanding of the properties that give a system qualia or consciousness. These properties probably include having a world model, a model of oneself, some form of goals and rewards, and maybe a few other things. Lots of people have rough ideas, but I would like to see some sort of minimal reproducible example of a system that we can build, and that we know has qualia we should care about. We now have better tools than ever to get to the bottom of these metaphysical questions, and we should start to think of them as engineering problems, not just philosophical problems.
Why do you believe that "global utility maximization is something that an ASI might independently discover as a worthwhile goal"? (I assume by "utility" you mean something like happiness.)
I'm not sure most people aren't sadists. Humans have wildly inconsistent personalities in different situations. Few people have even noticed their own inconsistencies, fewer still have gone through the process of extracting a coherent set of values from the soup and gradually generalising that set to every context they can think of...
So I wouldn't be surprised if most of them suddenly fancied torture if it were as easy as playing a computer game. I remember several of my classmates torturing fish for fun, and saw what other kids did in GTA San Andreas just because they were curious. While I haven't been able to find reliable statistics on it, BDSM is super-popular and probably most men score above the minimum on sexual sadism fetishes.
Much like ChatGPT has a large library of simulated personalities ("simulacra") that it samples from to deal with different contexts.