Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
  • "I ache for the day when I am in heaven singing the praise of God".

That's a preference that many religious believers have. Not being religious myself, I don't have it, and even find it factually wrong - but how can a preference be wrong?

Here's a preference I do have:

  • "I ache for a world where people spontaneously help each other overcome their problems".

That preference is also "wrong", in a similar sense. Let's see how.

Decomposing partially erroneous variables

For the purpose of this post, I am assuming atheism, which is the position I myself hold; for those of a religious disposition, it may help to assume that "God" refers here to a non-existent god from a different, wrong, religion.

Now, the "praising God" preference could be decomposed into other, simpler variables. It may feel like it's a single node, but it is likely composed of many different preferences linked together:

So, for example, those with that preference might think it good to be humble, feel there's something intrinsically valuable about holiness, enjoy the ecstatic feeling they get when praising God in a religious context, and feel that one should praise God out of a feeling of gratitude. We can consider that these form four of the foreground variables that the believer "cares about" in their partial model (what about the fifth node? We'll get back to that).

Note that the human didn't necessarily use a partial model with these four variables; they may instead have just a single variable entitled "praise god in heaven". However, we're decomposing the partially erroneous variable (since God doesn't exist), using the believer's web of connotation, into simpler variables.
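As a rough illustration of the decomposition described above, here is a minimal Python sketch. Everything about the representation is an assumption made for illustration: the `Variable` class, the `WEB_OF_CONNOTATION` mapping and the `decompose` function are hypothetical names, and the fifth node is labelled "same identity" only because that is how the post refers to it further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    role: str  # "foreground" (cared about) or "background" (taken for granted)


# Hypothetical web of connotation: top-level variable -> simpler variables.
# The node names paraphrase the post; the data structure is an assumption.
WEB_OF_CONNOTATION = {
    "praise God in heaven": [
        Variable("humbleness", "foreground"),
        Variable("holiness", "foreground"),
        Variable("ecstatic feeling", "foreground"),
        Variable("gratitude to God", "foreground"),
        Variable("same identity", "background"),  # the fifth node
    ],
}


def decompose(top_level: str) -> list[Variable]:
    """Return the simpler variables behind a top-level variable.

    A variable with no recorded web of connotation is returned as a
    single foreground node, i.e. it is left undecomposed.
    """
    return WEB_OF_CONNOTATION.get(top_level, [Variable(top_level, "foreground")])


components = decompose("praise God in heaven")
foreground = [v.name for v in components if v.role == "foreground"]
```

Here the believer's single node and the decomposed list describe the same preference; the decomposition just makes each component available for separate inspection.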

Erroneous foreground variables

There's nothing wrong with the humbleness and ecstatic preferences. The gratitude node is problematic, because there ain't any God. If the believer only feels gratitude to God, then we can just erase that node as an error. However, it's possible that they might feel general gratitude towards their superiors, or towards their parents/mentors/those who've made them who they are. In that case, the preference for "Gratitude to God" is simply a special case of a very valid preference applied to an erroneous situation.

Then we have the interesting preference for "holiness". Holiness is not a well-grounded variable, but has its own web of connotation, maybe consisting of religious rituals, feelings of transcendence, spiritual experiences, and so on. Further splitting the node could allow us to home in on more precise preferences.

Affective spirals and subjective experience

There is a problem with decomposing complex variables into components: affective death spirals / halo effects. "Holiness" probably has a huge amount of positive connotations, as compared to the variables it could be decomposed into.

I'd tend to favour using the decomposed variables as the main indicator of preference. We can still use the halo-ed top variable for some components of human experience; for example, we might assume that the believer here really enjoys the subjective experience of holiness, even if that preference over the outside world is not as strong.

Erroneous values for background variables

What of the last node? Well, as Mark Twain pointed out,

Singing hymns and waving palm branches through all eternity is pretty when you hear about it in the pulpit, but it's as poor a way to put in valuable time as a body could contrive.

This brings us to the last node: no human alive today is capable of spending their life praising, nor would they enjoy the experience much after the first few minutes or hours. Believers presumably don't think of themselves in heaven as mindless zombies; I'm going to assume that this hypothetical believer might imagine that the mere presence of God will transform their desires in such a way that makes praising a desirable activity for them.

Even if that is the case, it still means putting the believer in a situation where most aspects of their identity are irrelevant or overwritten. Their sexual preferences, favourite movies and computer games, favourite sporting events, sense of humour, quirks of personality, etc. all become irrelevant. Therefore, despite the believer's beliefs, they are contemplating a partial model in which the background variables are very different, even though the believer does not realise this difference (see here for another example of preference where one of the variables is wrong).

This is the main way in which the "praise God" preference is erroneous: it involves huge changes to their identity and preferences, but they are not aware of this in their partial models.

Spontaneous help decomposition

My "spontaneous help" preference can probably be decomposed in a similar fashion:

This has a "same identity" problem, just as the "praise God" preference has, but it's milder: people cooperate far more often than they sing praises for days on end. The bigger problem is the implicit assumption that when people do things for other people, this automatically has the same efficiency as markets or other more selfish coordination mechanisms. People helping each other is not supposed to leave people worse off.

If I were aware of that flaw AND still had that preference (which I mildly do, in practice), then I would have a non-wrong preference. That doesn't mean that you have to agree with me, for you have your own preferences, but it does mean that my preference is no longer "wrong" in the sense of this post: "Same efficiency" becomes the foreground variable "Efficiency", one that is actually negatively activated in this model, but not enough to overcome the positive activation of the other variables:
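A toy numerical sketch of that last point, under loud assumptions: the activation numbers are invented, the names "other variable A" and "other variable B" are placeholders for the components that are not spelled out here, and adding activations together is just one simple way of combining them. Only the "efficiency" variable and its negative sign come from the text.

```python
# Made-up activations for illustration. Only "efficiency" and its negative
# sign come from the post; the other names and all numbers are placeholders.
activations = {
    "other variable A": 0.6,
    "other variable B": 0.5,
    "efficiency": -0.4,
}


def overall_preference(acts: dict[str, float]) -> float:
    """Combine activations by simple addition (an assumed aggregation rule)."""
    return sum(acts.values())


total = overall_preference(activations)
still_preferred = total > 0  # True here: the negative term does not dominate
```

With these numbers the preference survives being made aware of the efficiency cost; with a more negative efficiency activation it would not.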

