Artificial Utility Monsters as Effective Altruism

Dear effective altruist,

have you considered artificial utility monsters as a high-leverage form of altruism?

In the traditional sense, a utility monster is a hypothetical being which gains so much subjective wellbeing (SWB) from marginal input of resources that any other form of resource allocation is inferior on a utilitarian calculus (as illustrated on SMBC).

This has been used to show that utilitarianism is not as egalitarian as it intuitively may appear, since it prioritizes some beings over others rather strictly - including humans.

The traditional utility monster is implausible even in principle - it is hard to imagine a mind that is constructed such that it will not succumb to diminishing marginal utility from additional resource allocation. There is probably some natural limit on how much SWB a mind can implement, or at least how much this can be improved by spending more on the mind. This would probably even be true for an algorithmic mind that can be sped up with faster computers, and there are probably limits to how much a digital mind can benefit in subjective speed from the parallelization of its internal subcomputations.

However, we may broaden the traditional definition somewhat and call any technology utility-monstrous if it implements high SWB with exceptionally good cost-effectiveness and in a scalable form - even if this scalability stems from a larger set of minds running in parallel, rather than one mind feeling much better or living much longer per additional joule/dollar.
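The contrast between one mind with diminishing returns and many minds running in parallel can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the logarithmic SWB curve, the cost per mind, and the budget are hypothetical placeholders, not estimates of anything real.

```python
import math


def single_mind_swb(resources: float) -> float:
    """Hypothetical concave SWB curve for one mind: ln(1 + resources).

    The logarithm is an assumption chosen only because it shows
    diminishing marginal returns; no real mind is being modeled.
    """
    return math.log1p(resources)


def parallel_minds_swb(resources: float, cost_per_mind: float) -> float:
    """Total SWB when the budget funds as many identical minds as it can.

    Each mind receives exactly cost_per_mind resources, so the total
    grows linearly with the number of minds that can be afforded.
    """
    n_minds = int(resources // cost_per_mind)
    return n_minds * single_mind_swb(cost_per_mind)


budget = 1000.0  # hypothetical budget in arbitrary resource units
one_big_mind = single_mind_swb(budget)
many_small_minds = parallel_minds_swb(budget, cost_per_mind=1.0)

print(f"one mind with the whole budget: {one_big_mind:.2f}")
print(f"many minds at 1 unit each:      {many_small_minds:.2f}")
```

Under these assumptions the single mind's total flattens out as the budget grows, while the parallel total scales with the budget. That scaling is the sense of "utility-monstrous" used here.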

Under this definition, it may be very possible to create and sustain many artificial minds reliably and cheaply, while they all have a very high SWB level at or near subsistence. An important point here is that possible peak intensities of artificially implemented pleasures could be far higher than those commonly found in evolved minds: Our worst pains seem more intense than our best pleasures for evolutionary reasons - but the same does not have to be true for artificial sentience, whose best pleasures could be even more intense than our worst agony, without any need for suffering anywhere near this strong.

If such technologies can be invented - which seems highly plausible in principle, if not yet in practice - then the original conclusion for the utilitarian calculus is retained: It would be highly desirable for utilitarians to facilitate the invention and implementation of such utility-monstrous systems and allocate marginal resources to subsidize their existence. This makes it a potential high-value target for effective altruism.

Many tastes, many utility monsters

Human motivation is barely stimulated by abstract intellectual concepts, and "utilitronium" sounds more like "aluminium" than something to desire or empathize with. Consequently, the idea is as sexy as a brick. "Wireheading" evokes associations of having a piece of metal rammed into one's head, which is understandably unattractive to any evolved primate (unless it's attached to an iPod, which apparently makes it okay).

Technically, "utility monsters" suffer from a similar association problem, which is that the idea is dangerous or ethically monstrous. But since the term is so specific and established in ethical philosophy, and since "monster" can at least be given an emotive and amicable - almost endearing - tone, it seems realistic to use it positively. (Suggestions for a better name are welcome, of course.)

So a central issue for the actual implementation and funding is human attraction. It is more important to motivate humans to embrace the existence of utility monsters than it is for them to be optimally resource-efficient - after all, a technology that is never implemented or funded properly gains next to nothing from being efficient.

A compromise between raw efficiency of SWB per joule/dollar and better forms to attract humans might be best. There is probably a sweet spot - perhaps various different ones for different target groups - between resource-efficiency and attractiveness. Only die-hard utilitarians will actually want to fund something like hedonium, but the rest of the world may still respond to "The Sims - now with real pleasures!", likeable VR characters, or a new generation of reward-based Tamagotchis.

Once we step away somewhat from maximum efficiency, the possibilities expand drastically. Implementation forms may be:

decorative like gimmicks or screensavers,
fashionable like sentient wearables,
sophisticated and localized like works of art,
cute like pets or children,
personalized like computer game avatars retiring into paradise,
erotic like virtual lovers who continue to have sex without the user,
nostalgic like digital spirits of dead loved ones in artificial serenity,
crazy like hyperorgasmic flowers,
semi-functional like joyful household robots and software assistants,
and of course generally a wide range of human-like and non-human-like simulated characters embedded in all kinds of virtual narratives.

Possible risks and mitigation strategies

Open-source utility monsters could be published as templates, adding a further check that the implementation of sentience is correct and positive, and making better variations easy to explore. However, this would come with the downside of potential malicious abuse and reckless harm. Risks of suffering could come from artificial unhappiness desired by users, e.g. for narratives that contain sadism, dramatic violence or punishment of evil characters for quasi-moral gratification. Another such risk could come simply from bad local modifications that implement suffering by accident.

Despite these risks, one may hope that most humans who care enough to run artificial sentience are more benevolent and careful than malevolent and careless in a way that causes more positive SWB than suffering. After all, most people love their pets and do not torture them, and other people look down on those who do (compare this discussion of Norn abuse, which resulted in extremely hostile responses). And there may be laws against causing artificial suffering. Still, this is an important point of concern.

Closed-source utility monsters may further mitigate some of this risk by not making the sentient phenotypes directly available to the public, but encapsulating their internal implementation within a well-defined interface - like a physical toy or closed-source software that can be used and run by private users, but not internally manipulated beyond a well-tested state-space without hacking.

An extremely cautionary approach would be to run the utility monsters by externally controlled dedicated institutions and only give the public - such as voters or donors - some limited control over them through communication with the institution. For instance, dedicated charities could offer "virtual paradises" to donors so they can "adopt" utility monsters living there in certain ways without allowing those donors to actually lay hands on their implementation. On the other hand, this would require a high level of trustworthiness of the institutions or charities and their controllers.

Not for the sake of utility monsters alone

Human values are complex, and it has been argued on LessWrong that the resources of any good future should not be spent for the sake of pleasure or happiness alone. As evolved primates, we all have more than one intuitive value we hold dear, even among self-identified intellectual utilitarians, who compose only a tiny fraction of the population.

However, some discussions in the rationalist community touching related technologies like pleasure wireheading, utilitronium, and so on, have suffered from implausible or orthogonal assumptions and associations. Since the utilitarian calculus favors SWB maximization above all else, it has been feared, we run the risk of losing a more complex future because

a) utilitarianism knows no compromise and

b) the future will be decided by one winning singleton who takes it all and

c) we have only one world with only one future to get it right

In addition, low status has been ascribed to wireheads, with the association of fake utility or cheating life as a form of low-status behavior. People have been competing for status by associating themselves with the miserable Socrates instead of the happy pig, without actually giving up real option value in their own lives.

On Scott Alexander's blog, there's a good example of a mostly pessimistic view both in the OP and in the comments. And in this comment on an effective altruism critique, Carl Shulman names hedonistic utilitarianism turning into a bad political ideology similar to communist states as a plausible failure mode of effective altruism.

So, will we all be killed by a singleton who turns us into utilitronium?

Be not afraid! These fears are plausibly unwarranted because:

a) Utilitarianism is consequentialism, and consequentialists are opportunistic compromisers - even within the conflicting impulses of their own evolved minds. The number of utilitarians who would accept existential risk for the sake of pleasure maximization is small, and practically all of them subscribe to the philosophy of cooperative compromise with orthogonal, non-exclusive values in the political marketplace. Those who don't are incompetent almost by definition and will never gain much political traction.

b) The future may very well not be decided by one singleton but by a marketplace of competing agents. Building a singleton is hard and requires the strict subjugation or absorption of all competition. Even if it were to succeed, the singleton will probably not implement only one human value, since it will be created by many humans with complex values, or at least it will have to make credible concessions to a critical mass of humans with diverse values who can stop it before it reaches singleton status. And if these mitigating assumptions are all false and a fooming singleton is possible and easy, then too much pleasure should be the least of humanity's worries - after all, in this case the Taliban, the Chinese government, the US military or some modern King Joffrey are just as likely to get the singleton as the utilitarians.

c) There are plausibly many Everett branches and many Hubble volumes like ours, implementing more than one future-earth outcome, as summed up by Max Tegmark here. Even if infinitarian multiverse theories should all end up false against current odds, a very large finite universe would still be far more realistic than a small one, given our physical observations. This makes a pre-existing value diversity highly probable if not inevitable. For instance, if you value pristine nature in addition to SWB, you should accept the high probability of many parallel earth-like planets with pristine nature regardless of what you do, and consider that we may be in an exceptional minority position to improve the measure of other values that do not naturally evolve easily, such as a very high positive-SWB-over-suffering surplus.

From the present, into the future

If we accept the conclusion that utility-monstrous technology is a high-value vector for effective altruism (among others), then what could current EAs do as we transition into the future? To the best of my knowledge, we don't have the capacity yet to create artificial utility monsters.

However, foundational research in neuroscience and artificial intelligence/sentience theory is already ongoing today and certainly a necessity if we ever want to implement utility-monstrous systems. In addition, outreach and public discussion of the fundamental concepts is also possible and plausibly high-value (hence this post). Generally, the following steps all seem useful and could use the attention of EAs, as we progress into the future:

spread the idea, refine the concepts, apply constructive criticism to all its weak spots until it becomes either solid or revealed as irredeemably undesirable
identify possible misunderstandings, fears, biases etc. that may reduce human acceptance and find compromises and attraction factors to mitigate them
fund and do the scientific research that, if successful, could lead to utility-monstrous technologies
fund the implementation of the first actual utility monsters and test them thoroughly, then improve on the design, then test again, etc.
either make the templates public (open-source approach) or make them available for specialized altruistic institutions, such as private charities
perform outreach and fundraising to give existence donations to as many utility monsters as possible

All of this can be done without much self-sacrifice on the part of any individual. And all of this can be done within existing political systems, existing markets, and without violating anyone's rights.

The problem that I've always had with the "utility monster" idea is that it's a misuse of what information utility functions actually encode.

In game theory or economics, a utility function is a rank ordering of preferred states over less preferred states for a single agent (who presumably has some input he can adjust to solve for his preferred states). That's it. There are no "global" utility functions or "collective" utility measures that don't run into problems when individual goals conflict.

Given that an agent's utility function only encodes preferences, turning up the gain on it really really high (meaning agent A really reaaaally cares about all of his preferences) doesn't mean that agents B,C,D, etc should take A's preferences any more or less seriously. Multiplying it by a large number is like multiplying a probability distribution or an eigenvector by a really large number - the relative frequencies, pointing direction are exactly the same.
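The scaling point can be checked with a small sketch. The outcome names and numbers below are invented for illustration, and the sketch assumes the purely ordinal reading of utility described above: a single agent's utility function only ranks that agent's outcomes.

```python
def ranking(utility, outcomes):
    """Return the outcomes ordered from most to least preferred."""
    return sorted(outcomes, key=utility, reverse=True)


# Hypothetical outcomes and utility values for a single agent A.
outcomes = ["tea", "coffee", "water"]
base_utility = {"tea": 2.0, "coffee": 3.0, "water": 1.0}


def u(outcome):
    """Agent A's utility function."""
    return base_utility[outcome]


def u_loud(outcome):
    """The same preferences with the 'gain' turned up by a factor of 1e9."""
    return 1e9 * base_utility[outcome]


print(ranking(u, outcomes))       # ['coffee', 'tea', 'water']
print(ranking(u_loud, outcomes))  # same ordering as above
```

Multiplying by any positive constant leaves the ordering untouched, so on this reading the scaled function carries no extra information that other agents would have to weigh.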

Before some large number of people should sacrifice their previous interests on the altar of Carethulu, there should be some new reason why these others (not Carethulu) should want to do so (implying a different utility function for them).

Before some large number of people should sacrifice their previous interests on the altar of Carethulu, there should be some new reason why these others (not Carethulu) should want to do so (implying a different utility function for them).

I think the misunderstanding here is that some of you interpret the post as a call to change your values. However, it is merely a suggestion for the implementation of values that already exist, such as utilitarian preferences.

The idea is clearly never going to be attractive to people who care exactly zero about the SWB of others. But those are not a target group of effective altruism or any charity really.

Consider the Tibetan prayer wheel, a mechanical device that has a mantra or sutra passage encoded on it. Reciting the mantra orally is meritorious. Spinning a wheel with that mantra encoded on it carries the same merit but may be more efficient or accessible — for instance, an illiterate person who cannot read a sutra can still spin a wheel encoded with it.

I guess you could read this as a satire of global utilitarianism or some types of effective altruism. In truth we tend to act to optimize our own utility, which includes varying weights usually less than 1 (much less in most cases) for the utility of others. When we're talking about policy that goes beyond our personal actions, it comes down to a political/game-theoretic negotiation where hypothetical utility monsters are largely irrelevant.

In truth we tend to act to optimize our own utility, which includes varying weights usually less than 1 (much less in most cases) for the utility of others.

I fully agree. But the point about the utility monster concept is that their marginal utility derived from resource allocation has a far higher multiplier than our own marginal utility of the same resources, which is additionally diminishing as we get individually richer. Less than 1 != zero.
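A toy comparison shows how a small weight and a large multiplier interact. It is a sketch only: the log-utility assumption for the donor and every number below are hypothetical.

```python
def own_marginal_utility(wealth: float) -> float:
    """Donor's marginal utility of one more unit under log utility.

    Assumes u(w) = ln(w), so du/dw = 1 / w: richer donors gain less
    from each extra unit.
    """
    return 1.0 / wealth


def prefers_to_give(wealth: float, weight_on_other: float,
                    other_marginal_utility: float) -> bool:
    """True if the weighted gain to the other mind beats the donor's own gain."""
    return weight_on_other * other_marginal_utility > own_marginal_utility(wealth)


# Hypothetical numbers: a donor with wealth 100 who weights the other
# mind's utility at 0.001, and a mind that gains 50 units per resource unit.
print(prefers_to_give(wealth=100.0, weight_on_other=0.001,
                      other_marginal_utility=50.0))   # True
# With a weight of exactly zero, no multiplier is large enough.
print(prefers_to_give(wealth=100.0, weight_on_other=0.0,
                      other_marginal_utility=1e12))   # False
```

In this model a weight far below 1 can still favor giving once the multiplier is large enough and the donor is rich enough, while a weight of exactly zero never does.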

When we're talking about policy that goes beyond our personal actions, it comes down to a political/game-theoretic negotiation where hypothetical utility monsters are largely irrelevant.

I agree that this is true mostly, but not completely. Consider, for instance, that there are some restrictions on animal abuse, even though nonhuman animals are also largely irrelevant in the political/game-theoretic negotiation game. The reason is that some players who are relevant do have preferences concerning the wellbeing of nonhuman animals, even if those are typically low-cost/low-priority preferences.

In principle, the same could be true of utility monsters once they are no longer hypothetical, and exist in forms which are emotionally attractive to humans.

I fully agree. But the point about the utility monster concept is that their marginal utility derived from resource allocation has a far higher multiplier than our own marginal utility of the same resources, which is additionally diminishing as we get individually richer. Less than 1 != zero.

Well, even if the utility monster is like e^e^e^x, the weight could be like log(log(log(x))), or just 0. At any rate I don't find myself inclined to make one (aside from curiosity's sake) and mathematical utility functions seem like they should be descriptive rather than prescriptive.

In principle, the same could be true of utility monsters once they are no longer hypothetical, and exist in forms which are emotionally attractive to humans.

We might make cute virtual pets or even virtual friends, but I'm still not going to give them a bunch of money (etc.) just because they would enjoy it much more than me.

edit: In fact I'm leaning towards the idea that in general, a utility function that values others' utility directly is not safe and probably not a good model of something that evolved. (And also seemingly has loop problems when others do the same).

I can share the fake happiness of cartoon characters without them actually feeling anything.

So I have no reason to believe the happiness I feel when I see (or hear about) an actual human (or utility monster) being happy has anything to do with their "actual" (unknowable) state of mind.

I believe all that's happening is that my mind models their happiness in the same limbic system that also runs my own happiness, and the limbic system is less good than the neocortex at keeping representations separate.

And that limits my enjoyment of someone else's well-being to the amount of well-being I can model. If I was severely depressed, my power to imagine happiness would plummet and I'd gain nothing from giving resources to a utility monster because the well-being it'd convert the resources to couldn't flow back to me.

I can relate to the intuition that our actual motivation to cause SWB for other minds is strongly modulated by our empathy. (That said, there are also intellectual, philosophical forms of reasoning, though I think they are weaker in practice at motivating actual actions.)

If I was severely depressed, my power to imagine happiness would plummet and I'd gain nothing from giving resources to a utility monster because the well-being it'd convert the resources to couldn't flow back to me.

Ironically, it's the opposite for me: My depression has increased my desire to see a world that is generally more good than bad, and my idea of good vs. bad reduces itself to hedonistic states mostly, because other values seem more symbolic than "real" to my intuitions (most of the time).

However, the good news is that none of us need to gain strong enjoyment from the modeling of utility monsters. If the path I outlined is realistic, then no step needs much self-sacrifice. A very small fraction of income donations of millions of mildly motivated altruists over several hundred years could cause far more SWB-over-suffering than has ever existed before in nature or human history!

It seems to me that the project of transhumanism in general is actually the project of creating artificial utility monsters. If we consider a utility monster a creature that can transmute resources into results more efficiently, that's essentially what a transhuman is.

In a world where all humans have severe cognitive and physical disabilities and die at the age of 30, a baseline human would be a utility monster. They would be able to achieve far more of their life goals and desires than all other humans would. Similarly, a transhuman with superhuman cognitive abilities, physical abilities, and indefinite lifespan would be a utility monster from the point of view of modern people.

So to answer the opening question about whether or not effective altruists have ever considered building artificial utility monsters: Any effective altruist who has donated any money to the SIAI, FHI, or a similar organization has already started doing this. We've been working towards creating artificial utility monsters for a decade now.

Now, you might have been meaning something slightly different than that. Maybe you meant to create some creature with an inhuman psychology, like orgasmium. To answer that question I'd have to delve deeper and more personally into my understanding of ethics.

Long story short, I think that would be a terrible idea. My population ethics only considers the creation of entities with complex values that somewhat resemble human ones to be positive. For all other types of creatures I am a negative preference utilitarian: I consider their addition to be a bad thing and think that we should make sacrifices to prevent it. And that's even assuming that it is possible to compare their utility functions with ours. I don't think interpersonal utility comparison between two human-like creatures is hard at all. But comparison with a creature that has a totally alien set of values is likely impossible.

So a priori I'd have expected to dislike this post, because I believe (1) the utility monster concept is iffy and confuses more than it clarifies, and (2) my intuitions skew risk averse and/or negative utilitarian, in the sense that I'd rather not create new sapient beings just to use them as utility pumps. But I quite like it for some reason and I can't put my finger on why.

Maybe because it takes a dubious premise (the utility monster concept) and derives a conclusion (make utility monsters to feed them) that seems less incoherent to me than the usual conclusion derived from the premise (utility monsters are awful, for some reason, even though by assumption they generate huge amounts of utility, oh dear!)?

(utility monsters are awful, for some reason, even though by assumption they generate huge amounts of utility, oh dear!)

Utility monsters are awful, possibly for no reason whatsoever. That's OK. Value is complex. Some things are just bad, not because they entail any bad thing but just because they themselves are bad.

You're allowed to define utility monsters as terminally awful; it's just not going to convince me.

(compare this discussion of Norn abuse, which resulted in extremely hostile responses)

http://www.vgcats.com/comics/?strip_id=122

