Dec 29, 2011
(Spawned by an exchange between Louie Helm and Holden Karnofsky.)
The field of formal rationality is relatively new and I believe that we would be well-advised to discount some of its logical implications that advocate extraordinary actions.
Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples of how an agent built according to our current understanding of rationality could fail.
Bayes’ Theorem, the expected utility formula, and Solomonoff induction are all reasonable heuristics. Yet those theories are not enough to build an agent that will be reliable in helping us to achieve our values, even if those values were thoroughly defined.
If we wouldn't trust a superhuman agent equipped with our current grasp of rationality to be reliable in extrapolating our volition, how can we trust ourselves to arrive at correct answers given what we know?
We should of course continue to use our best methods to decide what to do. But I believe that we should also draw a line somewhere when it comes to extraordinary implications.
It doesn't feel to me like 3^^^^3 lives are really at stake, even at very tiny probability. I'd sooner question my grasp of "rationality" than give five dollars to a Pascal's Mugger because I thought it was "rational". — Eliezer Yudkowsky
Holden Karnofsky is suggesting that in some cases we should follow the simple rule that "extraordinary claims require extraordinary evidence".
I think that we should sometimes demand particular proof P; and if proof P is not available, then we should discount seemingly absurd or undesirable consequences even if our theories disagree.
I am not referring to the weirdness of the conclusions but to the foreseeable scope of the consequences of being wrong about them. We should be careful in using the implied scope of certain conclusions to outweigh their low probability. I feel we should put more weight on the consequences of our conclusions being wrong than on the consequences of their being right.
As an example take the idea of quantum suicide and assume it would make sense under certain circumstances. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.
To be clear, extrapolations work and often are the best we can do. But since there are problems such as the above, that we perceive to be undesirable and that lead to absurd actions and their consequences, I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics.
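To make the worry about scope outweighing probability concrete, here is a minimal sketch of a Pascal's-mugging-style calculation. Every number in it is a made-up assumption chosen for illustration (the real 3^^^^3 is far too large to write down), and the function names and the `cap` parameter are hypothetical. Clipping utility at a bound is only one possible way to draw a line, not a method proposed by anyone quoted here.

```python
from fractions import Fraction


def expected_utility(probability, utility):
    """Plain expected utility of a single outcome: probability times payoff."""
    return probability * utility


def bounded_expected_utility(probability, utility, cap):
    """Same calculation, but the payoff is first clipped to [-cap, cap].

    The cap is an arbitrary line drawn by the agent (an assumption of this sketch).
    """
    clipped = max(-cap, min(cap, utility))
    return probability * clipped


# Hypothetical figures: a tiny probability attached to an enormous promised payoff.
mugger_probability = Fraction(1, 10**20)
mugger_payoff = 10**30  # stand-in for a number like 3^^^^3
cost_of_paying = 5      # the five dollars the mugger asks for

naive = expected_utility(mugger_probability, mugger_payoff)
bounded = bounded_expected_utility(mugger_probability, mugger_payoff, cap=10**9)

# The unbounded calculation says "pay"; the bounded one says "refuse".
pay_naive = naive > cost_of_paying
pay_bounded = bounded > cost_of_paying
```

With these assumed numbers the unbounded calculation recommends paying, because the promised payoff grows faster than the probability shrinks, while the bounded version refuses. The choice of cap is exactly the kind of arbitrary line discussed here: nothing in the formula itself says where it should go.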
We are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that person wants. We are not going to stop loving our girlfriend just because there are other people who do not approve of our relationship and who together would experience more happiness if we separated than the combined happiness of us and our girlfriend being in love. Therefore we have already informally established some upper and lower bounds.
I have read about people who became very disturbed and depressed from taking ideas too seriously. That way madness lies, and I am not willing to choose that path yet.
Maybe I am simply biased and have been unable to overcome it yet. But my best guess right now is that we simply have to draw a lot of arbitrary lines and arbitrarily refuse some steps.
Taking into account considerations of vast utility or low probability quickly leads to chaos-theoretic considerations like the butterfly effect. As a computationally bounded and psychologically unstable agent I am unable to cope with that. Consequently I see no other way than to neglect the moral impossibility of extreme uncertainty.
Until the problems are resolved, or rationality is sufficiently established, I will continue to put vastly more weight on empirical evidence and my intuition than on logical implications, if only because I still lack the necessary educational background to trust my comprehension and judgement of the various underlying concepts and methods used to arrive at those implications.
One problem with my current grasp of rationality that I perceive to be unacknowledged concerns the consequences of expected utility maximization with respect to human nature and our complex values.
I am still genuinely confused about what a person should do. I don't even know how much sense that concept makes. Does expected utility maximization have anything to do with being human?
Those people who take existential risks seriously and who are currently involved in their mitigation seem to be disregarding many other activities that humans usually deem valuable because the expected utility of saving the world outweighs the pursuit of other goals. I do not disagree with that assessment but find it troubling.
The problem is, will there ever be anything but a single goal, a goal that can either be more effectively realized and optimized to yield the most utility or whose associated expected utility simply outweighs all other values?
Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has effectively infinite expected utility.
The sensible decision would be to concentrate on those scenarios with the highest expected utility now, e.g. solving friendly AI, and worry about those problems later. But not only does the same argument always work, the question is also relevant to the nature of friendly AI and our ultimate goals. Is expected utility maximization even compatible with our nature? Does expected utility maximization lead to world states in which wireheading is favored, either directly or indirectly, by focusing solely on a single high-utility goal that outweighs all other goals?
It seems to me that our notion of rationality is not the last word on the topic and that we shouldn't act as if it were.