Gratification: a useful concept, maybe new

by Stuart_Armstrong, 25th Aug 2019


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

 

             I did it... my way!

      Frank Sinatra, Paul Anka, and Elvis Presley, moral philosophers.

I like winning games. I particularly like winning when I can come up with a new and ingenious way of using the rules to get a bold new strategy. I don't particularly want to win by exploiting loopholes[1] in the rules, but some people - munchkins, hackers - seem to really enjoy doing things that way.

If a group I'm in comes up with a great solution to a problem, I'd prefer that that solution be mine, or a small tweak to my own solution. I like understanding mathematical concepts myself, even if I fully trust that they're correct.

I still sometimes avoid walking on sidewalk cracks, and enjoy the challenge of trying to navigate oddly shaped tiles while never stepping on their edges. In some circumstances (say in new restaurants, or old ones) I remind myself that I should want to try something new. But I also have small rituals I enjoy with some foods that I eat regularly.

Gratification

All the examples above share the same feature: they are not about the outcome of the process (my win, the problem is solved, the mathematical concept is proved true, I get to a destination, food is eaten), but about the process.

As my changing use of terms like "like" and "prefer" shows, these examples could be formally grouped under either preference utilitarianism or hedonic utilitarianism. They clearly can be preferences: I'd like to do things these ways rather than other ways. But they're also ways I get enjoyment from everyday processes.

But, even though they could be either preferences or hedonism, I think it's useful to put them in their own category, which I'm calling gratification until someone comes up with a better name[2].

The central example of a preference in preference utilitarianism is a preference for a state of the world or the outcome of some process; gratifications are about details of how the process is implemented. The central example of hedonism is happiness or joy; gratifications mix some enjoyment (which can be quite low) with the specifics of what caused that enjoyment.

So, though the categories overlap, I'd argue that it's useful to consider gratifications as a separate category. To give it some formalisation, I'll preliminarily denote it as a "preference over the details of a process (rather than the outcome) that an agent derives some value from". An informal definition is that it's any part of your behaviour which, if questioned, you're tempted to answer "I did it my way!".
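One way to make the preliminary definition concrete is the toy sketch below. It is only an illustration under assumptions that are not in the post: a process is modelled as a list of states, the outcome is the final state, and the names `outcome_value`, `gratification_value` and the numbers used are all hypothetical. It shows two walks that reach the same destination, where only the gratification term can tell them apart.

```python
from typing import Sequence

# Assumption for illustration: a process is the sequence of states the agent
# passes through, and its outcome is simply the final state.
Trajectory = Sequence[str]


def outcome(trajectory: Trajectory) -> str:
    """The outcome of a process is its final state."""
    return trajectory[-1]


def outcome_value(trajectory: Trajectory) -> float:
    """Hypothetical outcome preference: depends only on the final state."""
    return 1.0 if outcome(trajectory) == "arrived" else 0.0


def gratification_value(trajectory: Trajectory) -> float:
    """Hypothetical gratification: depends on how the process went.

    Here, a small bonus for never stepping on a sidewalk crack.
    """
    return 0.0 if "crack" in trajectory else 0.2


def total_value(trajectory: Trajectory) -> float:
    """Simple additive combination (a modelling choice, not a claim)."""
    return outcome_value(trajectory) + gratification_value(trajectory)


# Two walks with the same outcome but different process details.
direct_walk = ["start", "crack", "arrived"]
careful_walk = ["start", "tile", "tile", "arrived"]

print(outcome_value(direct_walk) == outcome_value(careful_walk))
print(total_value(careful_walk) > total_value(direct_walk))
```

The additive form is the crudest possible choice; nothing in the definition requires gratifications and outcome preferences to combine this way.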

What this clarifies

First of all, note that gratifications are almost all identity preferences. They don't fit perfectly into that category; for example, I might value a ritual I take part in, which also includes the actions of others.

Nevertheless, most aspects of gratifications are very much identity preferences: we generally value them because we take part in the process, not because the process happens. This may allow us to get some extra analysis about what "identity preferences" actually consist of.

There is a strong connection between gratification and slack. Slack allows you the space to be yourself; gratification is one of the elements of being yourself. As pressure on you piles up, you have less slack; as outcomes become more important, you sacrifice your gratification to achieve them.

This is one reason to distinguish outcome-preferences and hedonism from gratification: we are willing to not be perfectly efficient in reaching outcomes, or perfectly hedonistic, in order to have some gratification. Note that "not perfectly efficient" means we are willing to sacrifice outcomes and hedonism for gratification[3], at least to some extent.

Some aspects of my research agenda can better be understood as modelling gratifications.

One example is the human refusal to accept the outcome of the synthesis process (section 4.5). When we refuse, it's not that we have a particular outcome in mind that we think would be better. It's that, intrinsically, we value not having values imposed upon us ("MYYYY WAYYYYY!!!"). Indeed, if the process were slightly different, we would be more willing to accept the outcome: maybe a process that asked for our explicit feedback, run by a humble and appreciative AI, that allowed us to tweak the final synthesis, and that was presented in a way that was more gratifying to our self-image. This suggests that this preference is much more about the process than the outcome.

The same might be true for (some versions of) our preference for continued moral growth. "Every day, in every way, I'm getting better": for many people, it's the process of improvement that's a key component, not becoming immediately perfectly moral. Again, there is the connection with slack and sacrificing outcomes: allowing your values to drift and change is inefficient, but we value that. So an ideal process would be some compromise between some idealised outcome of our values, and some allowance for the process of moral growth.


  1. What's the difference between exploiting a loophole and coming up with a new ingenious way of using the rules? Well, in part it's within my own preferences - "up to here, it's ingenious, after that, it's exploitation". But many humans also feel there is a difference in the concepts, and draw a line between them, even if the line is drawn in different places for different people. ↩︎

  2. Or, more likely, points me to some philosophical text where the concept is better-defined - if anyone knows of such a text, please link it to me! ↩︎

  3. The concept of gratification came to me when I was thinking about what sacrifices perfect efficiency would impose on us, and I realised this would cause us to lose something we valued. Since that was a loss, that means that a preference was not being expressed, so I investigated what that preference was. ↩︎
