Creating really good outcomes for humanity seems hard. We get bored. If we don’t get bored, we still don’t like the idea of joy without variety. And joyful experiences only seem good if they are real and meaningful (in some sense we can’t easily pin down). And so on.
On the flip side, creating really bad outcomes seems much easier, running into none of the symmetric “problems.” So what gives?
I’ll argue that nature is basically out to get us, and it’s not a coincidence that making things good is so much harder than making them bad.
First: some other explanations
Two common answers (e.g. see here and comments):
- The worst things that can quickly happen to an animal in nature are much worse than the best things that can quickly happen.
- It’s easy to kill or maim an animal, but hard to make things go well, so “random” experiences are more likely to be bad than good.
I think both of these are real, but that the consideration in this post is at least as important.
Main argument: reward errors are asymmetric
Suppose that I’m building an RL agent who I want to achieve some goal in the world. I can imagine different kinds of errors:
- Pessimism: the rewards are too low. Maybe the agent gets a really low reward even though nothing bad happened.
- Optimism: the rewards are too high. Maybe the agent gets a really high reward even though nothing good happened, or gets no reward even though something bad happened.
Pessimistic errors are no big deal. The agent will avoid whichever behaviors happen to get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome), that’s not too costly.
But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.
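To make the asymmetry concrete, here is a minimal sketch. It simplifies a trained RL agent to a one-shot greedy choice over behaviors, and the behavior names and reward numbers are hypothetical, chosen only for illustration. It places a single mis-scored reward on each behavior in turn and reports the worst true-reward loss (regret) for a pessimistic error versus an optimistic one.

```python
def greedy_choice(observed):
    """The agent picks whichever behavior its (possibly wrong) reward signal rates highest."""
    return max(observed, key=observed.get)


def regret(true_reward, observed):
    """True reward lost by trusting the observed signal instead of the true one."""
    best_true = max(true_reward.values())
    return best_true - true_reward[greedy_choice(observed)]


def with_error(true_reward, action, wrong_value):
    """Copy of the reward table in which exactly one behavior is mis-scored."""
    observed = dict(true_reward)
    observed[action] = wrong_value
    return observed


def worst_single_error(true_reward, wrong_value):
    """Largest regret over every possible placement of one error with the given value."""
    return max(
        regret(true_reward, with_error(true_reward, a, wrong_value))
        for a in true_reward
    )


# Hypothetical toy values for four behaviors (illustrative, not from the post).
true_reward = {"forage": 10.0, "rest": 8.0, "explore": 7.0, "wirehead": -100.0}

# Pessimistic error: one behavior scored far too low.
print(worst_single_error(true_reward, wrong_value=-1000.0))  # 2.0
# Optimistic error: one behavior scored far too high.
print(worst_single_error(true_reward, wrong_value=1000.0))   # 110.0
```

In this toy setting a pessimistic error costs at most the gap between the best and second-best behavior, because the agent simply falls back to the next option. An optimistic error costs as much as the worst behavior is bad, because the agent actively seeks out whatever is over-rewarded.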
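Evolution is in the position of the reward designer here. When we try to create suffering by going off distribution, evolution doesn’t really care: pessimistic errors were cheap for it, so it didn’t build the machinery to be robust against them.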
But when we try to create incredibly good stable outcomes, we are fighting an adversarial game against evolution. Every animal that has ever lived has been playing that game using all the tricks it could learn, and evolution has patched every hole that they found.
In order to win this game, evolution can implement general strategies like boredom, or an aversion to meaningless pleasures. Each of these measures makes it harder for us to inadvertently find a loophole that gets us high reward.
Overall I think this is a relatively optimistic view: some of our asymmetrical intuitions about pleasure and pain may be miscalibrated for a world where we are able to outsmart evolution. I think evolution’s tricks just mean that creating good worlds is difficult rather than impossible, and that we will be able to create an incredibly good world as we become wiser.
It’s possible that evolution solved the overoptimism problem in a way that is actually universal—such that it is in fact impossible to create outcomes as good as the worst outcomes are bad. But I think that’s unlikely. Evolution’s solution only needed to be good enough to stop our ancestors from finding loopholes, and we are a much more challenging adversary.