Hedonium is AI Alignment

by Tahmatem, Coil
31st Aug 2025

1. Introduction

I will argue that the outcome we should aim to achieve from aligned AI is Hedonium: the conversion of the universe into the densest possible packing of positive valence. Valence represents the preferability of a given moment: imagine we asked you to compare which moments of your life you would like to live again, then numbered those moments such that you predictably preferred to live the higher-numbered ones. Said another way, valence is a measure of happiness in its most exalted form, setting aside the reductive connotations of the word “happy.” Hedonium can be analogized to Computronium, but with computation replaced by positive valence. Thus, one image of what Hedonium might look like is a universe tiled with devices that simulate the experience of an enlightened monk at peak bliss. While Hedonium can seem unattractive at first, I argue that the more one comes to understand it, the more one realizes it is aligned with one’s preferences, and the more one desires that it come to pass.
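To make this ordinal definition concrete, here is a minimal sketch; the particular moments and their ordering are arbitrary illustrations, not claims about anyone’s actual preferences:

```python
# A minimal sketch of valence as an ordinal ranking over moments.
# The moments and their ordering are arbitrary illustrations.

moments_by_preference = [
    "stubbing a toe",
    "an ordinary commute",
    "laughing with friends",
    "peak bliss",
]

# Number the moments so that you predictably prefer to relive
# the higher-numbered ones.
valence = {m: i for i, m in enumerate(moments_by_preference)}

def preferred(a: str, b: str) -> str:
    """Return whichever moment you would rather live again."""
    return a if valence[a] > valence[b] else b

print(preferred("an ordinary commute", "laughing with friends"))
# -> laughing with friends
```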

2. Preferences

This article will take a moral anti-realist perspective, meaning that when I argue that you should aim for Hedonium, I am arguing it is aligned with what you want. This angle is a bit slippery because it’s easy for such an argument to be dismissed along the following lines: “I know what I want better than anybody, and when I imagine Hedonium, I don’t want it; therefore, there is no argument that could convince me I want it.” As a response to this point, consider the following scenario: Jack is choosing between two piles of money. One is a large pile of $1 bills, and the other is a smaller pile of $100 bills; however, Jack is too far from the piles to make out the denominations of the bills. The $100 pile is worth more, but Jack chooses the larger pile because he assumes size tracks value. I will call the “error” he makes here value conflation (VC). He conflates the size of the money piles with their value and chooses based on size even though his intention is to maximize value. In this hypothetical, it’s a suboptimal but reasonable VC to make, because Jack is bottlenecked on information that would help him evaluate the status of his terminal goal (maximizing total value) and so must rely on this proxy. However, VC can be more subtle. Imagine Jack lives in a society that only has $1 bills, such that the value of any money pile is always proportional to its size. He’s gotten so used to this conflation that if you try to convince him that, in this case, the smaller pile has more value, he’ll have trouble even recognizing the distinction you are making. This parallels the case I want to make for Hedonium: it does satisfy your terminal goals, but in ways so unfamiliar that it runs against deeply ingrained VCs that are otherwise unproblematic and, therefore, so subtle that you do not recognize they are there. To illustrate what I mean, I will discuss a couple of examples of VCs that often arise in evaluating Hedonium and then go over a bottom-up approach to evaluating Hedonium that I believe avoids VCs.
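The arithmetic behind Jack’s mistake can be made explicit. A toy sketch, with made-up pile sizes:

```python
# Made-up numbers for illustration. Jack's terminal goal is value,
# but from a distance he can only observe size.
ones_pile = {"denomination": 1, "count": 500}       # large pile of $1 bills
hundreds_pile = {"denomination": 100, "count": 50}  # smaller pile of $100s

def size(pile):
    # The proxy Jack can actually observe from far away.
    return pile["count"]

def value(pile):
    # The quantity Jack actually wants to maximize.
    return pile["denomination"] * pile["count"]

# Value conflation: optimizing the proxy instead of the target.
jacks_choice = max([ones_pile, hundreds_pile], key=size)
best_choice = max([ones_pile, hundreds_pile], key=value)

print(value(jacks_choice), value(best_choice))  # 500 5000
```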

2.1. Speciesism VC

The first is the Speciesism VC, which manifests as a preference for humans. The idea is that Hedonium may be less desirable because the recipients of the infinite bliss may not be humans, but some kind of machine or other artificial consciousness. I believe that humanity, here, is being conflated with consciousness (or, equivalently, the capacity to have qualia such as pleasure and pain). As an example, imagine that humanity lasts another million years, and over this time enough genetic drift occurs that humanity eventually becomes a new species, Homo futuris. Homo futuris are similar to Homo sapiens, but they are smarter, happier, and so on. Presumably, Homo futuris would still be moral patients, meaning most people would care about their well-being. The example demonstrates that species-level distinctions are actually quite uninformative in the context of preference.

2.2. Variety VC

The second VC is the Variety VC, the conflation of variety of experience with quality of experience. In general, if our lives are too unvaried, they become dull, and so many people associate variety with other positive qualities such as excitement, happiness, and fulfillment; however, these qualities are ultimately separate.

Analyzing the following two scenarios helps to illustrate the Variety VC:

A) In the first scenario, you are at a beach for 5 seconds, then teleported to a waterfall for 5 seconds. Throughout both of the 5-second intervals your experiences have equal valence.

B) In the second scenario, you are at the beach for 10 seconds, where you have an experience equivalent in valence to the first scenario’s 5-second beach interval, but prolonged to 10 seconds.

Initially, the first scenario may appear preferable due to the increased variety. I would argue that the mechanism for this preferability is imagining the small amount of extra relief or excitement that would come with a change of scenery. However, simulating this extra boost of excitement on top of otherwise equally preferable scenes would make the second interval higher valence overall, which would be an inaccurate simulation of the proposed scenario. A more accurate simulation would be to assume that the waterfall is slightly less interesting than the beach, but that the change of scenery brings it up to the stipulated equal valence. Alternatively, you can imagine that the two scenes are equally enjoyable, but that each second you are overcome with amnesia, negating any novelty effects. These reframes are more accurate because they avoid the inclination to double-count novelty’s contribution to valence. Once one adopts them, it becomes more intuitive that the scenarios are, in fact, equally preferable, because the sums of their moments have equal valence.
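A toy calculation makes the equal-sum point explicit; the flat per-second valence of 1.0 is an arbitrary stand-in:

```python
# Toy per-second valences; the flat value 1.0 is an arbitrary assumption.
beach_then_waterfall = [1.0] * 5 + [1.0] * 5  # Scenario A: 5 s beach + 5 s waterfall
beach_only = [1.0] * 10                       # Scenario B: 10 s beach

# Without a double-counted novelty bonus, preferability is just the
# sum of per-moment valences, so the two scenarios tie.
assert sum(beach_then_waterfall) == sum(beach_only)
```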

A more counterintuitive example puts this concept into practice. Consider the following two scenarios:

A) A normal middle-class US citizen life, relatively happy and stable, with quite a lot of experiences.

B) A life equal in length to the first, where the only experience is sitting in a chair staring at a blank wall, with the maximum possible valence at every instant.

Ignoring the impact or usefulness of a life outside of its experience, I posit that the second scenario becomes preferable once the Variety VC is disentangled. While the first scenario has many of the markers of fulfillment, the second scenario has fulfillment itself, nearly by definition.

2.3. Disentangling Multiple VCs

If we replace the person living in complete bliss while sitting in a chair and staring at a blank wall with a member of Homo futuris in the same scenario, we can contend with both of these VCs at the same time. Disentangling multiple VCs at once helps us build toward the intuition for Hedonium. We can further extend the example as follows:

A) A normal middle-class US citizen life, relatively happy and stable, with quite a lot of experiences.

B) A life equal in length to the first, where the only experience is the utmost possible bliss, with the maximum possible valence at every instant. This experience is simulated by a conscious unit in some kind of universe-tiling.

It’s important to note that even someone who agrees with the intuitions in the VCs we walked through might find this scenario initially unappealing. There may be a few unspecified VCs lingering as one considers these scenarios, since VCs depend on one’s psychology; however, I would posit that after identifying and disentangling them, you will find Scenario B preferable.

3. Bottom-up

While I could continue attempting to carve out examples of VCs, I believe it will also be useful to build a model of what I take to be terminal values from the ground up. I will first posit a weak form of consequentialism: people prefer one scenario over another based on properties that manifest in those scenarios. This is useful because it tells us that if someone gives reason X for preferring Scenario 1 over Scenario 2, then reason X must correspond to some discrepancy between the scenarios that we can point at and describe. If this can’t be done, then X must be involved in a VC with some unnamed discrepancy. We can build on consequentialism with non-duality. From a non-dual perspective, it is equally valid and complete to look at reality either as a physical phenomenon or as a conscious phenomenon, in the same way it is equally valid and complete to look at a cell as either a biological phenomenon or a chemical phenomenon. Critically, this means that all of the information needed to describe a scenario can be obtained from its corresponding conscious states.

We can combine consequentialism and non-duality to get the anti-speciesism argument for free. By consequentialism, in order to bias against Hedonium on the grounds of humans’ unique moral patienthood, this property must become manifest: there must be a specific thing in reality you can point to that acts as a delineator. By non-duality, the delineator must be a conscious property rather than an isolated biological technicality. Thus, species-level distinctions only become morally relevant to the extent that they encode some information about consciousness. Justifying a “preference for humans,” i.e. speciesism, would therefore require that humans have the capacity for some unique and preferable qualia, and that this capacity depend on exactly the same factors that define our humanity. This claim seems unlikely.

From consequentialism and non-duality, we can also build toward the claim that the preferability of a scenario is linear with respect to the valence of its constituent experiences, which is to say that you don’t have to worry about combined effects of experiences when analyzing the preferability of a scenario (as a preference for variety would imply). To derive this claim, we start with non-duality, which indicates that the constituent experiences offer a complete description of a scenario. Applying consequentialism, we see that any preference-relevant variables must be among those denoted in the description of the experiences. Thus, there can be no preferability-relevant inter-experience interaction, because such an interaction would imply the existence of some further variable outside the experiences themselves, breaking the rule that consciousness completely describes reality.
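Stated as a formula (the notation here is mine): if a scenario $S$ consists of experiences $e_1, \dots, e_n$ with valences $v(e_i)$, then the claim is

$$\mathrm{Pref}(S) = \sum_{i=1}^{n} v(e_i),$$

with no interaction terms $g(e_i, e_j)$: any such term would have to track a variable outside the experiences themselves, which non-duality rules out.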

Thus, we see that once scenarios are accurately analyzed, the most desirable outcome is the one with the maximum total valence. Since the fate of the future will most likely be determined by the outcome of AI, aligning AI to this end is likely the most important goal for getting what we desire.