Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.


             I did it... my way!

      Frank Sinatra, Paul Anka, and Elvis Presley, moral philosophers.

I like winning games. I particularly like winning when I can come up with a new and ingenious way of using the rules to get a bold new strategy. I don't particularly want to win by exploiting loopholes[1] in the rules, but some people - munchkins, hackers - seem to really enjoy doing things that way.

If a group I'm in comes up with a great solution to a problem, I'd prefer that that solution be mine, or a small tweak to my own solution. I like understanding mathematical concepts myself, even if I fully trust that they're correct.

I still sometimes avoid walking on sidewalk cracks, and enjoy the challenge of trying to navigate oddly shaped tiles while never stepping on their edges. In some circumstances (say in new restaurants, or old ones) I remind myself that I should want to try something new. But I also have small rituals I enjoy with some foods that I eat regularly.


All the examples above share the same feature: they are not about the outcome of the process (my win, the problem is solved, the mathematical concept is proved true, I get to a destination, food is eaten), but about the process.

As my changing use of terms like "like" and "prefer" shows, these examples could be formally grouped under either preference utilitarianism or hedonic utilitarianism. They clearly can be preferences: I'd like to do things these ways rather than other ways. But they're also ways I get enjoyment from everyday processes.

But, even though they could be either preferences or hedonism, I think it's useful to put them in their own category, which I'm calling gratification until someone comes up with a better name[2].

The central example of a preference in preference utilitarianism is a preference for a state of the world or the outcome of some process; gratifications are about details of how the process is implemented. The central example of hedonism is happiness or joy; gratifications mix some enjoyment (which can be quite low) with the specifics of what caused that enjoyment.

So, though the categories overlap, I'd argue that it's useful to consider gratifications as a separate category. To give it some formalisation, I'll preliminarily denote it as a "preference over the details of a process (rather than the outcome) that an agent derives some value from". An informal definition is that it's any part of your behaviour about which, if questioned, you're tempted to answer "I did it my way!".
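As a rough illustration of that preliminary definition, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Trajectory`, `outcome_utility`, `gratification` and `total_value`, the numeric values, and the choice to simply add the two terms together. It only shows the shape of the idea: two processes can reach the same outcome and still be valued differently because of how they got there.

```python
from typing import Sequence

# Hypothetical representation: an ordered list of steps,
# where the last step is treated as the outcome.
Trajectory = Sequence[str]


def outcome_utility(trajectory: Trajectory) -> float:
    """Value that depends only on the final outcome."""
    return 1.0 if trajectory[-1] == "win" else 0.0


def gratification(trajectory: Trajectory) -> float:
    """Value that depends only on how the outcome was reached."""
    return 0.5 if "own_strategy" in trajectory[:-1] else 0.0


def total_value(trajectory: Trajectory) -> float:
    """Assumed additive combination; other combinations are possible."""
    return outcome_utility(trajectory) + gratification(trajectory)


my_way = ["own_strategy", "win"]
borrowed = ["borrowed_strategy", "win"]

# Same outcome, different process, different total value.
print(outcome_utility(my_way) == outcome_utility(borrowed))
print(total_value(my_way) > total_value(borrowed))
```

Under these assumed numbers, an agent that only scored outcomes would be indifferent between `my_way` and `borrowed`, while an agent that also scored the process would not be.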

What this clarifies

First of all, note that gratifications are almost all identity preferences. They don't fit perfectly into that category; for example, I might value a ritual I take part in that also includes the actions of others.

Nevertheless, most aspects of gratifications are very much identity preferences: we generally value them because we take part in the process, not because the process happens. This may allow us to get some extra analysis about what "identity preferences" actually consist of.

There is a strong connection between gratification and slack. Slack allows you the space to be yourself; gratification is one of the elements of being yourself. As pressure on you piles up, you have less slack; as outcomes become more important, you sacrifice your gratification to achieve them.

This is one reason to distinguish outcome-preferences and hedonism from gratification: we are willing to not be perfectly efficient in reaching outcomes, or perfectly hedonistic, in order to have some gratification. Note that "not perfectly efficient" means we are willing to sacrifice outcomes and hedonism for gratification[3], at least to some extent.

Some aspects of my research agenda can better be understood as modelling gratifications.

For example, the human refusal to accept the outcome of the synthesis process (section 4.5). When we do so, it's not that we have a particular outcome in mind that we would think would be better. It's that, intrinsically, we value not having values imposed upon us ("MYYYY WAYYYYY!!!"). Indeed, if the process were slightly different, we would be more willing to accept the outcome: maybe a process that asked for our explicit feedback, run by a humble and appreciative AI, that allowed us to tweak the final synthesis, and that was presented in a way that was more gratifying to our self-image. This suggests that this preference is much more about the process than the outcome.

The same might be true for (some versions of) our preference for continued moral growth. "Every day, in every way, I'm getting better": for many people, it's the process of improvement that's a key component, not becoming immediately perfectly moral. Again there is the connection with slack and sacrificing outcomes: allowing your values to drift and change is inefficient, but we value that. So an ideal process would be some compromise between some idealised outcome of our values, and some allowance for the process of moral growth.

  1. What's the difference between exploiting a loophole and coming up with a new ingenious way of using the rules? Well, in part it's within my own preferences - "up to here, it's ingenious, after that, it's exploitation". But many humans also feel there is a difference in the concepts, and draw a line between them, even if the line is drawn in different places for different people. ↩︎

  2. Or, more likely, points me to some philosophical text where the concept is better-defined - if anyone knows of such a text, please link it to me! ↩︎

  3. The concept of gratification came to me when I was thinking about what sacrifices perfect efficiency would impose on us, and I realised this would cause us to lose something we valued. Since that was a loss, that means that a preference was not being expressed, so I investigated what that preference was. ↩︎



Comments

In which ways is the concept "gratification" different from the well-established "intrinsic motivation", where you like doing an activity for its intrinsic enjoyment rather than for its usefulness as an instrument towards a goal or its leading to an external reward?

intrinsic motivation

That might be the concept I'm looking for. I'll think whether it covers exactly what I'm trying to say...

I've heard this referred to as "experience preference", or sometimes "experiential utility", in that they are things you want to experience, distinct from states you want the universe to be in. (skipping the rabbit hole of whether all experience is memory or whether this is a preference for having a memory vs having an experience).

It occurs a lot in the negative as well - things you don't want to experience (or don't want ANYONE to experience), regardless of the state of the universe afterward. Many torture-tradeoff discussions hinge on this point - to a lot of people, suffering is bad not because of consequences or because it reduces a hedonic sum, but because it is a dimension of bad in itself.

Cool, thanks. I see the torture example as being closer to hedonism (or rather, anti-hedonism), though.

True. There are two distinctions (I think) you're making from base utilitarianism (preferences over state of the universe in terms of agent-experienced utility):

1) This is about path, not state. You have an opinion about something to do/experience that's independent of any difference in expected value of a future state. It's also (I think) explicitly indexical - you care that it's you having this experience, not that it's experienced by more people.

2) This is about ... something ... which isn't on the pain/pleasure axis. I'm less sure of this one, as I tend to experience identity-affirming things as somewhat pleasurable and I'm not sure that's any less comparable on this dimension than any other pleasure or personal disappointment.

The torture example is similar on the first point, but misses the second. Is that roughly correct?

Basically yes. My take on 2) is that identity-affirming things can be somewhat pleasurable - but they're unlikely to be the most pleasurable thing the human could do at that moment. So they can be valued for something other than pure pleasure.

And you can get other examples where someone, say, is truthful, even if that causes them more pain than a simple lie would.

I do things my way because I want to display my independence (not doing what others tell me) and intelligence (ability to come up with novel solutions), and because I would feel bored otherwise (this is a feature of how my brain works, I can't help it).

"I feel independent and intelligent", "other people see me as independent and intelligent", "I feel bored" are all perfectly regular outcomes. They can be either terminal or instrumental goals. Either way, I disagree that these cases somehow don't fit in the usual preference model. You're only having this problem because you're interpreting "outcome" in a very narrow way.
