Automatically crossposted

Creating really good outcomes for humanity seems hard. We get bored. If we don’t get bored, we still don’t like the idea of joy without variety. And joyful experiences only seem good if they are real and meaningful (in some sense we can’t easily pin down). And so on.

On the flip side, creating really bad outcomes seems much easier, running into none of the symmetric “problems.” So what gives?

I’ll argue that nature is basically out to get us, and it’s not a coincidence that making things good is so much harder than making them bad.

First: some other explanations

Two common answers (e.g. see here and comments):

  • The worst things that can quickly happen to an animal in nature are much worse than the best things that can quickly happen.
  • It’s easy to kill or maim an animal, but hard to make things go well, so “random” experiences are more likely to be bad than good.

I think both of these are real, but that the consideration in this post is at least as important.

Main argument: reward errors are asymmetric

Suppose that I’m building an RL agent who I want to achieve some goal in the world. I can imagine different kinds of errors:

  • Pessimism: the rewards are too low. Maybe the agent gets a really low reward even though nothing bad happened.
  • Optimism: the rewards are too high. Maybe the agent gets a really high reward even though nothing good happened, or gets no reward even though something bad happened.

Pessimistic errors are no big deal. The agent will randomly avoid behaviors that get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome) then that’s not too costly.

But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.
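A minimal sketch of this asymmetry, assuming a purely greedy agent and made-up numbers (the function names and `TRUE_VALUES` are hypothetical, not from any real RL library). One behavior's reward is corrupted, and we measure how much true value the agent loses as a result:

```python
def greedy_choice(rewards):
    """Return the index of the action with the highest observed reward."""
    return max(range(len(rewards)), key=lambda i: rewards[i])


def regret_under_error(true_values, action, error):
    """True value lost by a greedy agent when one action's reward is off by `error`."""
    observed = list(true_values)
    observed[action] += error
    chosen = greedy_choice(observed)
    return max(true_values) - true_values[chosen]


# Hypothetical true values of five behaviors (illustrative numbers only).
TRUE_VALUES = [1.0, 0.9, 0.8, -5.0, 0.7]

# Pessimistic error on the best behavior: the agent falls back to the runner-up.
pessimistic_regret = regret_under_error(TRUE_VALUES, action=0, error=-10.0)

# Optimistic error on the harmful behavior: the agent seeks it out.
optimistic_regret = regret_under_error(TRUE_VALUES, action=3, error=+10.0)

print(pessimistic_regret, optimistic_regret)
```

In this toy setup the pessimistic error costs only the gap between the best behavior and the next best one, while the optimistic error costs the full gap between the best behavior and the harmful one. The pessimistic case stays cheap only because a nearly-as-good alternative exists, which matches the caveat about the penalized behavior not being the only way to get a good outcome.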

When we try to create suffering by going off distribution, evolution doesn’t really care. It didn’t build the machinery to be robust.

But when we try to create incredibly good stable outcomes, we are fighting an adversarial game against evolution. Every animal forever has been playing that game using all the tricks it could learn, and evolution has patched every hole that they found.

In order to win this game, evolution can implement general strategies like boredom, or an aversion to meaningless pleasures. Each of these measures makes it harder for us to inadvertently find a loophole that gets us high reward.

Implications

Overall I think this is a relatively optimistic view: some of our asymmetrical intuitions about pleasure and pain may be miscalibrated for a world where we are able to outsmart evolution. I think evolution’s tricks just mean that creating good worlds is difficult rather than impossible, and that we will be able to create an incredibly good world as we become wiser.

It’s possible that evolution solved the overoptimism problem in a way that is actually universal—such that it is in fact impossible to create outcomes as good as the worst outcomes are bad. But I think that’s unlikely. Evolution’s solution only needed to be good enough to stop our ancestors from finding loopholes, and we are a much more challenging adversary.


Pessimistic errors are no big deal. The agent will randomly avoid behaviors that get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome) then that’s not too costly.

Also, if an outcome really is very bad, evolution has no reason to limit the amount of suffering experienced.

Getting burned is bad for you. Evolution makes it painful so you know to avoid it. But if strong and extreme pain result in the same amount of avoidance, evolution has no reason to choose "strong" over "extreme". In fact, it might prefer "extreme" to get a more robust outcome.

And that's how we get the ability to inflict the kind of torture which is 'worse than death', and to imagine a Hell (and simulations thereof) with infinite negative utility, even though evolution doesn't have a concept of a fate being "worse than death" - certainly not worse than the death of your extended family.

I'm unsure that "extreme" would necessarily get a more robust response, considering that there comes a point where the pain becomes disabling.

It seems as though there might be some sort of biological "limit" insofar as there are limited peripheral nerves, the grey matter can only process so much information, etc., and there'd be a point where the brain is 100% focused on avoiding the pain (meaning there'd be no evolutionary advantage to having the capacity to process additional pain). I'm not really sure where this limit would be, though. And I don't really know any biology so I'm plausibly completely wrong.

I'm unsure that "extreme" would necessarily get a more robust response

I meant robust in the sense of decreasing the number of edge cases where the pain is insufficiently strong to motivate the particular individual as strongly as possible. (Since pain tolerance is variable, etc.) Evolution "wants" pain to be a robust feedback/control mechanism that reliably causes the desired amount of avoidance - in this case, the greatest possible amount.

there comes a point where the pain becomes disabling.

That's an excellent point. Why would evolution allow (i.e. not select against) the existence of disabling pain (and fear, etc)? 

Presumably, in the space of genotypes available for selection - in the long term view, and for animals besides humans - there are no cheap solutions that would have an upper cut-off to pain stimuli (below the point of causing unresponsiveness) without degrading the avoidance response to lower levels of pain.

There is also the cutoff argument: a (non-human) animal can't normally survive e.g. the loss of a limb, so it doesn't matter how much pain exactly it feels in that scenario. Some cases of disabling pain fall in this category. 

Finally, evolution can't counteract human ingenuity in torture, because humans act on much smaller timescales. It is to be expected that humans who are actively trying to cause pain (or to imagine how to do so) will succeed in causing amounts of pain beyond most anything found in nature.

Evolution "wants" pain to be a robust feedback/control mechanism that reliably causes the desired amount of avoidance - in this case, the greatest possible amount.

I feel that there's going to be a level of pain for which a mind of nearly any level of pain tolerance would exert 100% of its energy to avoid. I don't think I know enough to comment on how much further than this level the brain can go, but it's unclear why the brain would develop the capacity to process pain drastically more intense than this; pain is just a tool to avoid certain things, and it ceases to be useful past a certain point.

There are no cheap solutions that would have an upper cut-off to pain stimuli (below the point of causing unresponsiveness) without degrading the avoidance response to lower levels of pain.

I'm imagining a level of pain above that which causes unresponsiveness, I think. Perhaps I'm imagining something more extreme than your "extreme"?

It is to be expected that humans who are actively trying to cause pain (or to imagine how to do so) will succeed in causing amounts of pain beyond most anything found in nature.

Yeah, agreed.

 it's unclear why the brain would develop the capacity to process pain drastically more intense than this

The brain doesn't have the capacity to correctly process extreme pain. That's why it becomes unresponsive or acts counterproductively. 

The brain has the capacity to perceive extreme pain. This might be because:

  • The brain has many interacting subsystems; the one(s) that react to pain stop working before the ones that perceive it
  • The range of perceivable pain (that is, the range in which we can distinguish stronger from weaker pain) is determined by implementation details of the neural system. If there was an evolutionary benefit to increasing the range, we would expect that to happen. But if the range is greater than necessary, that doesn't mean there's an evolutionary benefit to decreasing it; the simplest/most stable solution stays in place.

even though evolution doesn't have a concept of a fate being "worse than death" - certainly not worse than the death of your extended family.

But as long as you're alive, evolution can penalize you for losses of 'utility' (reproductive ability). For example:

Loss of a limb:

  • is excruciating
  • seems likely to decrease ability to reproduce and take care of offspring and relatives - especially in "the evolutionary environment".

The only weird part of this story is the existence of suicide.*

 

*From a basic evolutionary perspective, it makes more sense in the piece History is Written by the Losers, where it is revealed that the man who 'invented history**' in China chose to live rather than commit suicide after being sentenced to castration - an act which was inconceivable at the time. The contexts in which people would rather commit suicide than endure other circumstances, however, paint a picture of suicide as a social phenomenon. Arguably this might make sense evolutionarily, but it's a more complicated picture.

"because he was principled enough to contradict the emperor in the presence of his court, Sima Qian was sentenced to castration. This was a death sentence—any self-respecting man of his day would commit suicide before submitting to the procedure. Everyone expected Sima Qian to do so. But in the end Sima Qian decided to accept the punishment and live the rest of his life in shame, because if he did not he would never finish the history he had started."

**The idea of writing things down as they happened so future generations would know. 

"That a great thinker could profitably spend his time sorting through evidence, trying to tie together cause and effect, distinguishing truth from legend, then present what is found in a written historical narrative—it is an idea that seems to have never occurred to anyone on the entire subcontinent. Only in Greece and in China did this notion catch hold."

We get bored. If we don’t get bored, we still don’t like the idea of joy without variety.

For what it's worth I think this popular sentiment is misplaced. Joy without variety is just as good as joy with variety, but most people either never experience it or fail to realize they are experiencing it and so don't learn that joy without variety is equally good to joy with variety. Instead I think most people experience rare moments of peak joy and since they don't understand how they happened they come to believe that joy comes from variety itself rather than anywhere else.

Perhaps hobbies are areas where people understand this about themselves, albeit narrowly.

If we don’t get bored, we still don’t like the idea of joy without variety. And joyful experiences only seem good if they are real and meaningful

I don't understand what you mean about not liking the idea of joy without variety. Do you mean that people don't want to constantly feel joy, that they would rather feel a range of different emotions, including the unpleasant ones? This is not true for me personally.

Also, why do joyful experiences need to be real or meaningful? I think there is meaning in pleasure itself. Perhaps the most joyful experiences you have had were experiences that were "real and meaningful", so you have come to see joy as being inextricably connected to meaningfulness. Throughout evolution, this was probably true. But with modern technology, and the ability to biohack our neurochemistry, this association is no longer a given.

Hedonic adaptation (feeling reward/penalty for the relative change, much more than the absolute situation) may be a key strategy for this. It adjusts both upward and downward, to avoid either mistake for very long.
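A rough sketch of what that could look like, assuming the felt reward is the gap between the current outcome and a baseline that drifts toward recent outcomes (the function name and the `rate` parameter are hypothetical, chosen only for illustration):

```python
def adapted_rewards(outcomes, rate=0.5):
    """Felt reward = outcome minus a baseline that drifts toward recent outcomes."""
    baseline = outcomes[0]
    felt = []
    for x in outcomes:
        felt.append(x - baseline)
        # The baseline moves part of the way toward the latest outcome.
        baseline += rate * (x - baseline)
    return felt


# A situation that jumps from 0 to 10 and then stays there.
felt = adapted_rewards([0, 0, 10, 10, 10, 10])
print(felt)  # [0, 0.0, 10.0, 5.0, 2.5, 1.25]
```

The jump itself is felt in full, but holding the same absolute situation yields a shrinking reward, so a mistaken high (or low) signal cannot keep paying out for long.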

But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.

In other words, make sure your agent can't wirehead! And from evolution's point of view, not spending your time promoting your inclusive genetic fitness is wireheading.

We didn't have the technology to literally wirehead until maybe very recently, so we went about it the long way round. But we still spend a lot of time and resources on e.g. consuming art, even absent signalling benefits, like watching TV alone at home. And evolution doesn't seem to be likely to "fix" that given some more time.

We don't behave in a "Malthusian" way, investing all extra resources in increasing the number or relative proportion of our descendants in the next generation. Even though we definitely could, since population grows geometrically. It's hard to have more than 10 children, but if every descendant of yours has 10 children as well, you can spend even the world's biggest fortune. And yet such clannish behavior is not a common theme of any history I've read; people prefer to get (almost unboundedly) richer instead, and spend those riches on luxuries, not children.
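As a rough sketch of that arithmetic, assuming purely for illustration that every descendant has exactly 10 children and ignoring overlap between lineages, the number of descendants $N_n$ in generation $n$ is:

```latex
N_n = 10^n, \qquad N_{10} = 10^{10} = 10\ \text{billion}
```

So under these assumptions the tenth generation alone would already outnumber today's world population.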

And evolution doesn't seem to be likely to "fix" that given some more time.

Why would you suppose that?

We don't behave in a "Malthusian" way, investing all extra resources in increasing the number or relative proportion of our descendants in the next generation. Even though we definitely could, since population grows geometrically. It's hard to have more than 10 children, but if every descendant of yours has 10 children as well, you can spend even the world's biggest fortune. And yet such clannish behavior is not a common theme of any history I've read; people prefer to get (almost unboundedly) richer instead, and spend those riches on luxuries, not children.

Isn't that just due to the rapid advance of technology creating a world in disequilibrium? In the ancestral environment of pre-agricultural societies these behaviors you describe line up with maximizing inclusive genetic fitness pretty well; any recorded history you can read is too new and short to reflect what evolution intended to select for.

Why would you suppose that?

Exactly for the reason you give yourself - we now change our behavior and our environment on much shorter timescales than evolution operates on, due in large part to modern technology. We have a goal of circumventing evolution (see: this post) and we modify our goals to suit ourselves. Evolution is no longer fast enough to be the main determinant of prevailing behavior.

In the ancestral environment of pre-agricultural societies these behaviors you describe line up with maximizing inclusive genetic fitness pretty well

We know almost nothing about most relevant human behavior from before the invention of writing. Did they e.g. consume a lot of art (music, storytelling, theater, dancing)? How much did such consumption correlate with status or other fitness benefits, e.g. by conspicuous consumption or advertising wealth? We really don't know.

Ok I see, I was just confused by the wording "given some more time". I've become less optimistic over time about how long this disequilibrium will last given how quickly certain religious communities are growing with the explicit goal of outbreeding the rest of us.

Pessimistic errors are no big deal. The agent will randomly avoid behaviors that get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome) then that’s not too costly.

But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.

I'd love to see someone analyze this thoroughly (or I'll do it if there is interest). I don't think it's that simple, and it seems like this is the main analytical argument.

For example, if the world is symmetric in the appropriate sense in terms of what actions get you rewarded or penalized, and you maximize expected utility instead of satisficing in some way, then the argument is wrong. I'm sure there is good literature on how to model evolution as a player, and the modeling of the environment shouldn't be difficult.

I suspect it all comes down to modeling of outcome distributions. If there's a narrow path to success, then both biases are harmful. If there are a lot of ways to win, and a few disasters, then optimism bias is very harmful, as it makes the agent not loss-averse enough. If there are a lot of ways to win a little, and few ways to win a lot, then pessimism bias is likely to miss the big wins, as it's trying to avoid minor losses.

I'd really enjoy an analysis focused on your conditions (maximize vs satisfice, world symmetry) - especially what kinds of worlds and biased predictors lead satisficing to get better outcomes than optimizing.

For example, if the world is symmetric in the appropriate sense in terms of what actions get you rewarded or penalized, and you maximize expected utility instead of satisficing in some way, then the argument is wrong. I'm sure there is good literature on how to model evolution as a player, and the modeling of the environment shouldn't be difficult.

I would think it would hold even in that case, why is it clearly wrong?

I may be mistaken. I tried reversing your argument, and I bold the part that doesn't feel right.

Optimistic errors are no big deal. The agent will randomly seek behaviours that get rewarded, but as long as these behaviours are reasonably rare (and are not that bad) then that’s not too costly.
But pessimistic errors are catastrophic. The agent will systematically make sure not to fall into behaviors that avoid high punishment, and will use loopholes to avoid penalties even if that results in the loss of something really good. So even if these errors are extremely rare initially, they can totally mess up my agent.

So I think that maybe there is inherently an asymmetry between reward and punishment when dealing with maximizers.

But my intuition comes from somewhere else. If the difference between pessimism and optimism is just a shift by a constant, then it ought not to matter for a utility maximizer. But your definition is about errors conditional on the actual outcome, which should perhaps behave differently.
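To spell out the constant-shift point (the symbols are assumptions introduced here, not from the post: $R(a)$ is the true reward of action $a$, $c$ is a constant, and $\varepsilon(a)$ is an error that depends on the action or its outcome):

```latex
\arg\max_a \mathbb{E}[R(a) + c] = \arg\max_a \bigl(\mathbb{E}[R(a)] + c\bigr) = \arg\max_a \mathbb{E}[R(a)],
\qquad\text{but in general}\qquad
\arg\max_a \mathbb{E}[R(a) + \varepsilon(a)] \neq \arg\max_a \mathbb{E}[R(a)].
```

A uniform shift leaves the maximizer's choice unchanged, whereas an error that differs across actions can change which action comes out on top.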

I think this part of the reversed argument is wrong:

The agent will randomly seek behaviours that get rewarded, but as long as these behaviours are reasonably rare (and are not that bad) then that’s not too costly

Even if the behaviors are very rare, and have a "normal" reward, then the agent will seek them out and so miss out on actually good states.

But there are behaviors we always seek out. Trivially, eating, and sleeping.

I think that the intuition for this argument comes from something like a gradient ascent under an approximate utility function. The agent will spend most of its time near what it perceives to be a local(ish) maximum.

So I suspect the argument here is that Optimistic Errors have a better chance of locking into a single local maximum or strategy, which gets reinforced enough (or not punished enough), even though it is bad in total.
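Here is a toy sketch of that lock-in, assuming a discrete hill climber on a small grid rather than true gradient ascent. The grid, the goal cell, and the error sizes are all hypothetical choices for illustration:

```python
GOAL = (4, 4)
SIZE = 5


def true_utility(cell):
    """Negative Manhattan distance to the goal, so the goal is the only true maximum."""
    x, y = cell
    return -(abs(GOAL[0] - x) + abs(GOAL[1] - y))


def hill_climb(start, errors):
    """Greedy ascent on perceived utility = true utility + per-cell error."""
    def perceived(cell):
        return true_utility(cell) + errors.get(cell, 0)

    current = start
    while True:
        x, y = current
        neighbors = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE
        ]
        best = max(neighbors, key=perceived)
        # Stop when no neighbor looks strictly better than the current cell.
        if perceived(best) <= perceived(current):
            return current
        current = best


# Pessimistic error on one cell: the agent routes around it and reaches the goal.
end_pessimistic = hill_climb((0, 0), {(1, 0): -10})

# Optimistic error on the same cell: the agent steps in and stays there.
end_optimistic = hill_climb((0, 0), {(1, 0): +10})

print(end_pessimistic, end_optimistic)
```

The pessimistic error is harmless here only because there is another route to the goal. On a one-dimensional path the same error would act as a barrier, so how bad each kind of error is depends on how many ways there are to reach good states.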

Pessimistic Errors are ones in which the agent strategically avoids locking into maxima, perhaps by Hedonic Adaptation as Dagon suggested. This may miss big opportunities if there are actual, territorial, big maxima, but that may not be as bad (from a satisficer point of view at least).

If this is the case, this seems more like a difference in exploration/exploitation strategies.

We do have positively valenced heuristics for exploration - say curiosity and excitement.