(Spawned by an exchange between Louie Helm and Holden Karnofsky.)


The field of formal rationality is relatively new and I believe that we would be well-advised to discount some of its logical implications that advocate extraordinary actions.

Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples of how an agent built according to our current understanding of rationality could fail.

Bayes’ Theorem, the expected utility formula, and Solomonoff induction are all reasonable heuristics. Yet those theories are not enough to build an agent that will reliably help us achieve our values, even if those values were thoroughly defined.

If we wouldn't trust a superhuman agent equipped with our current grasp of rationality to reliably extrapolate our volition, how can we trust ourselves to arrive at correct answers given what we know?

We should of course continue to use our best methods to decide what to do. But I believe that we should also draw a line somewhere when it comes to extraordinary implications.

Intuition, Rationality and Extraordinary Implications

It doesn't feel to me like 3^^^^3 lives are really at stake, even at very tiny probability.  I'd sooner question my grasp of "rationality" than give five dollars to a Pascal's Mugger because I thought it was "rational". — Eliezer Yudkowsky

Holden Karnofsky is suggesting that in some cases we should follow the simple rule that "extraordinary claims require extraordinary evidence".

I think that we should sometimes demand particular proof P; and if proof P is not available, then we should discount seemingly absurd or undesirable consequences even if our theories disagree.

I am not referring to the weirdness of the conclusions but the foreseeable scope of the consequences of being wrong about them. We should be careful in using the implied scope of certain conclusions to outweigh their low probability. I feel we should put more weight on the consequences of our conclusions being wrong than on the consequences of their being right.

As an example take the idea of quantum suicide and assume it would make sense under certain circumstances. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.

To be clear, extrapolations work and often are the best we can do. But since there are problems such as the above, that we perceive to be undesirable and that lead to absurd actions and their consequences, I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics.

We are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that person wants. We are not going to stop loving our girlfriend just because there are other people who do not approve of our relationship and who together would experience more happiness if we broke up than the combined happiness of us and our girlfriend being in love. Therefore we have already informally established some upper and lower bounds.

I have read about people who became very disturbed and depressed taking ideas too seriously. That way madness lies, and I am not willing to choose that path yet.

Maybe I am simply biased and have been unable to overcome it yet. But my best guess right now is that we simply have to draw a lot of arbitrary lines and arbitrarily refuse some steps.

Taking into account considerations of vast utility or low probability quickly leads to chaos-theoretic considerations like the butterfly effect. As a computationally bounded and psychologically unstable agent I am unable to cope with that. Consequently I see no other way than to neglect the moral impossibility of extreme uncertainty.

Until the problems are resolved, or rationality is sufficiently established, I will continue to put vastly more weight on empirical evidence and my intuition than on logical implications, if only because I still lack the necessary educational background to trust my comprehension and judgement of the various underlying concepts and methods used to arrive at those implications.

Expected Utility Maximization and Complex Values

One problem with my current grasp of rationality that I perceive to be unacknowledged is the set of consequences that expected utility maximization has for human nature and our complex values.

I am still genuinely confused about what a person should do. I don't even know how much sense that concept makes. Does expected utility maximization have anything to do with being human?

Those people who take existential risks seriously and who are currently involved in their mitigation seem to be disregarding many other activities that humans usually deem valuable because the expected utility of saving the world does outweigh the pursuit of other goals. I do not disagree with that assessment but find it troubling.

The problem is, will there ever be anything but a single goal, a goal that can either be more effectively realized and optimized to yield the most utility or whose associated expected utility simply outweighs all other values?

Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has effectively infinite expected utility.

The sensible decision would be to concentrate on those scenarios with the highest expected utility now, e.g. solving friendly AI, and worry about those problems later. But not only does the same argument always work but the question is also relevant to the nature of friendly AI and our ultimate goals. Is expected utility maximization even compatible with our nature? Does expected utility maximization lead to world states in which wireheading is favored, either directly or indirectly by focusing solely on a single high-utility goal that does outweigh all other goals?


  1. Being able to prove something mathematically doesn't prove its relation to reality.
  2. Relativity is less wrong than Newtonian mechanics but it still breaks down in describing singularities including the very beginning of the universe.

It seems to me that our notion of rationality is not the last word on the topic and that we shouldn't act as if it was.



One ideal I have never abandoned and never considered abandoning is that if you disagree with a final conclusion, you ought to be able to exhibit a particular premise or reasoning step that you disagree with. Michael Vassar views this as a fundamental divide that separates sanitykind from Muggles; with Tyler Cowen, for example, rejecting cryonics but not feeling obligated to reject any particular premise of Hanson's. Perhaps we should call ourselves the Modusponenstsukai.

It's usually much harder to find a specific flaw in an argument than it is to see that there is probably something wrong with the conclusion. For example, I probably won't be able to spot the specific flaw in most proposed designs for a perpetual motion machine, but I can still conclude that it won't work as advertised!

I read "ought to be able to" not as "you're not allowed to reject the conclusion without rejecting a premise" so much as "you ought to be able to, so when you find you're not able to, it should bother you; you have learned that there's a key failing in your understanding of that area."

I agree, and while reading Eliezer's comment I mentally added in something like "or if you can't then you explicitly model your confusion as being a limitation in your current understanding and so lower your confidence in the related suspect reasoning appropriately - ideally until your confusion can be resolved and your curiosity satisfied" as a footnote.

Just for fun: Classic Mathematical Fallacies - can you spot the step that's wrong?
The probability of us being wiped out by badly done AI is at least at 20%. I agree. The assumption of risks from AI is by itself reasonable. But I am skeptical of making complex predictions that are based on that assumption. I am skeptical of calculating the expected utility of mitigating risks from AI according to the utility associated with its logical implications. Take your following comment: I don't disagree that friendly AI research is currently a better option for charitable giving than charities concerned with environmental problems. Yet I have a hard time accepting that discounting the extinction of most species on the basis of the expected utility of colonizing the Hercules supercluster is sensible. If you want to convince people like Holden Karnofsky and John Baez then you have to show that risks from AI are more likely than they believe and that contributing to SI can make a difference. If you just argue in terms of logical implications then they will continue to frame SI in terms of Pascal's mugging. I can't. I can only voice my discomfort. And according to your posts on the Lifespan Dilemma and Pascal's mugging you share that discomfort, yet you are also unable to pinpoint a certain step that you disagree with.
If there is an argument that relies on many premises I can reject the conclusion, i.e., assign it a low probability while accepting, i.e., assigning high probability to, each individual premise.
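A rough numerical sketch of this point, under two assumptions that are purely illustrative and not from the comment: the premises are independent, and the argument only goes through if every one of them holds. The count of 20 premises and the 0.95 figure are hypothetical.

```python
import math

def conjunction_probability(premise_probs):
    """Probability that all premises hold at once.

    Assumes the premises are independent, which is an illustrative
    simplification rather than a claim about any real argument.
    """
    return math.prod(premise_probs)

# Hypothetical argument: 20 premises, each accepted with probability 0.95.
premises = [0.95] * 20
p_all = conjunction_probability(premises)
print(f"each premise: 0.95, all 20 jointly: {p_all:.3f}")
```

Under these assumptions the joint probability comes out at a bit over one third, so assigning the conclusion a low probability is compatible with assigning each premise a high one.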
One man's modus ponens is another man's modus tollens.

As an example take the idea of quantum suicide. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.

Red herring. The 'logical implications' of quantum suicide are that it's a terrible idea because you'll mostly die. Using quantum suicide as a reason to ignore logical implications is blatantly fallacious.

It's just an example. I'll now add to the post what I thought was already implicitly clear, if quantum suicide would make sense then I wouldn't do it. You know very well that mentioning the example that I really mean to talk about is forbidden around here.
It's just not an example of the phenomenon you are trying to illustrate at all! This should seem like an important consideration - because maintaining it as an example is undermining your post. It was clear, and clearly fallacious. "All X are False. For example, Y is false. If Y was an X then it would be false." The only thing being exemplified is the phenomenon of people who think logical conclusions are bad simply being wrong about what the logical conclusions are, when the logical conclusions are fine.
Quantum suicide might not increase your expected utility and is therefore the wrong choice for an expected utility maximizer. Yet quantum suicide is being taken seriously by some people and it is a logical implication of an interpretation of quantum mechanics. That I wouldn't suggest to follow through on it even given that it would maximize your expected utility seems to be a good example to highlight what I wanted to argue without talking about the forbidden topic: Discount implied utility of implications of logical implications of interpretations of data (but don't ignore it).
You are wrong about the logical implications. I actually agree that it is a representative example of your thesis as you have advocated it in your post. The aforementioned observations that it is obviously fallacious and relies on being confused about what logical implications are and mean stand.
wedrifid_level_obvious maybe but not universally obvious. Or do you accuse me of acting deliberately stupid? What I mean by logical implications are mechanical inferences made from premises that were previously established to be reasonable inferences given the available evidence. For example the established premises that gunshots to the head are often deadly and that those who die often leave behind mourning friends and family. You could continue to draw further inferences from those premises and establish the expected disutility of shooting someone in the head. But where do you draw the line here? The impact of any decision does propagate causal ripples that may or may not be amplified. Just like the influence of the starship you launched will continue even if you can no longer interact with it.
Your example sounds good for a second because it's "stickin' it to the man" - standing on your principles against the absurd forces of quantum suicide. But once you pause and work within the hypothetical (something humans are notoriously bad at), you're saying that you'd ignore a truth on principle just because it's weird. At that point we need to start breaking out the cautionary tales.
I am saying that my trust in one interpretation of quantum mechanics isn't enough. It might be reasonable to accept MWI, without additional empirical evidence, as the interpretation that makes the most sense given the evidence. But quantum suicide would be something that is based on that inference as a logical implication, which is justified by the rule that if P(Y|X) ≈ 1, then P(X∧Y) ≈ P(X). And what I am saying is that I believe that we should draw a line somewhere when it comes to such logical implications. Because, even if they don't have to pay rent in future anticipations, the actions that such implications demand might be solely justified by the expected utility of the logical implications while our confidence is only as strong as our trust in the last step that required actual evidence.
I'm all for drawing lines when we have evidence against things, like we do for the proposition that suicide is harmless. But if you want to make up some hypothetical world where it "makes sense," that suicide is harmless, then presumably we wouldn't have that balance of evidence, since one generally includes the evidence when judging what makes sense. So if you're not judging on the evidence in this hypothetical world, you're judging based on the aesthetic properties of the proposition, basically. It's just not a good way to do things, because a non-human spontaneously giving birth to the first human is quite un-aesthetic, don't you think? One reason why it makes sense that we should always reject the harmlessness of suicide may be that humans are bad at hypotheticals. Propositions that we associate with low probability will drag that association with them into the darndest of places.
I can keep a secret and I'm interested in the concept of and theory behind quantum suicide. Can you PM me with an explanation of what you mean? ETA: if this is about the Roko thing, I've read that but I don't see how it relates.

Many of these issues arise from some combination of allowing unbounded utilities, and assuming utility is linear. Where problems seem to arise, this is the place to fix them. It is much easier to incorporate fixes into our utility function (which is already extremely complicated, and poorly understood) than it is to incorporate them into the rules of reasoning or the rules of evidence, which are comparatively simple, and built upon math rather than on psychology.

Bounded utility solves Pascal's Mugging and Torture vs Dust Specks straightforwardly. You choose some numbers to represent "max goodness" and "max badness"; really good and bad things approach these bounds asymptotically; and when you meet Pascal's mugger, you take "max badness", multiply it by an extremely tiny value, and get a very tiny value.
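A minimal sketch of that calculation. The bound of 1.0, the exponential squashing function, the scale parameter, and the mugger probability are all hypothetical choices made for illustration; the comment does not specify any of them.

```python
import math

MAX_BADNESS = 1.0  # hypothetical upper bound on disutility

def bounded_disutility(raw_harm, scale=1e6):
    """Squash an unbounded raw harm into [0, MAX_BADNESS].

    The exponential form is just one convenient choice that approaches
    the bound asymptotically; it is an assumption, not the only option.
    """
    return MAX_BADNESS * (1.0 - math.exp(-raw_harm / scale))

def expected_loss(probability, raw_harm):
    """Expected disutility of a threat under the bounded utility function."""
    return probability * bounded_disutility(raw_harm)

mugger_probability = 1e-20  # hypothetical credence that the mugger is honest
for threatened_harm in (1e3, 1e9, 1e100):
    print(threatened_harm, expected_loss(mugger_probability, threatened_harm))
```

However large the threatened harm, the expected loss never exceeds the mugger probability times MAX_BADNESS, so escalating the threat stops helping the mugger once the bound is nearly reached.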

Quantum suicide is also a utility function issue, but not the same one. If your utility function only cares about average utility over the worlds in which you're still alive, then you should commit quantum suicide. But revealed preferences indicate that people care about all worlds, and philosophy seems to indicate that they should care about all worlds, so quantum suicide is wrong.

I actually have a different idea for a possible approach to this problem. I tried to get a discussion about it started in the decision theory mailing group, but it sort of died, and I'm not sure it's safe to post in public.
Don't be silly; it's completely safe. The original agent, with a utility function that sometimes kicks out infinities, is undefined; those infinities can't be compared, and they propagate into the expected utility of actions so that actions can't be compared either. The replacement agent is defined, but only has a three-valued utility function, which means it can't express many preferences.
Always, independently of what kind of agent and what kind of infinities you use?
You get the equivalence you want if: your utilities lie in a totally ordered field extension of R, infinity is a constant greater than all elements of R, the utilities of pure outcomes are restricted to be either in R or +- infinity, and the relationship between utility and decision is in a certain generic position (so that the probability of every outcome changes whenever you change your decision, and these changes are never arranged to exactly make the utilities cancel out).
I'm not sure I understood all of that, but the pieces I did sound likely to be true about CEV or a paperclipper to me. Am I missing something?
Even if you had solid evidence that your utility function was bounded, there would still be a small probability that your utility function was unbounded or bounded at a much higher level than you presumed. Pascal's mugger simply has to increase his threat to compensate for your confidence in low-level utility bounding.
You can expand out any logical uncertainty about your utility function to get another utility function, and that is what must be bounded. This requires that the weighted average of the candidate utility functions converges to some (possibly higher) bound. But this is not difficult to achieve; and if they diverge, then you never really had a bounded utility function in the first place.
Another approach is to calibrate my estimation of probabilities such that my prior probability for the claim that you have the power to create X units of disutility decreases the greater X is. That is, if I reliably conclude that P(X) <= P(Y)/2 when X is twice the threat of Y, then increasing the mugger's threat won't make me more likely to concede to it.
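A small sketch of such a prior, assuming the simplest form that satisfies the stated condition: a prior inversely proportional to the size of the threat. The function names and the base values are hypothetical.

```python
def threat_prior(threat_size, base_threat=1.0, base_prior=1e-6):
    """Hypothetical prior that a claimed threat of the given size is real.

    Inversely proportional to threat_size, so doubling the threat halves
    the prior. Intended for threat_size >= base_threat.
    """
    return base_prior * base_threat / threat_size

def expected_disutility(threat_size):
    """Prior-weighted disutility of the threat: probability times size."""
    return threat_prior(threat_size) * threat_size

for x in (1.0, 2.0, 1e6, 1e30):
    print(x, threat_prior(x), expected_disutility(x))
```

With this prior the expected disutility stays constant as the threat grows, so a larger threat gives no additional reason to concede. A prior that falls faster than 1/X would make larger threats strictly less compelling.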

Sure, agreed.

But the question arises, compared to what?

If I develop an algorithm A1 for solving certain problems that is more reliable than my own intuition, the fact that A1 is not perfectly reliable is a great reason to try and develop a superior algorithm A2. It's a poor reason to discard A1 and rely on my own intuition instead.

Exactly. What I am arguing is 1) that we should be risk averse about the actions suggested by algorithm A1, if they exceed a certain scope, and 2) that we should devote resources to 2.1) verifying the correctness of A1 by empirical evidence and/or 2.2) trying to improve A1 or develop a superior algorithm A2. What we shouldn't do is simply accept the actions that are recommended by A1 and follow through on them.
But you're evading the "compared to what?" question. I mean, OK, A1 suggests I do X. I perform a variety of empirical tests, on the basis of which I conclude that I should do Y (where Y implies NOT(X)). Fed the results of those empirical tests, A1 continues to suggest X. Granted that I should be working on developing A2 throughout, I'm still faced with a choice: do I do X or Y? Yes, I should be cognizant of the risks of trusting A1, so it's not clear I should do X. I should also be cognizant of the risks of trusting my brain, so it's not clear I should do Y.
If I could answer that then I would probably be the smartest person around here. I only "know" that if people decide that we should walk into death camps because some of our ethical and game theoretic insights suggest that to be the favorable option, then I'd rather go with my intuition and hope for someone like General Thud. I am sorry for giving such an unsatisfying answer. I simply have no clue. But I have this intuition that something is wrong here and that we should think about it.
That's an answer to my question, then: you consider your intuition more reliable than the predictions of ethical and game theory. Yes?
From a very early age I have suffered from various delusional ideas and feelings. I often have a strong feeling that food is poisoned or that I have to walk the same way twice because otherwise really bad things will happen. I could mention countless other examples of how my intuition is nothing more than a wreckage. So I am possibly better equipped to judge the shortcomings of human intuition than many other people. And yet there are situations in which I would rather trust my intuition.
OK. So a situation arises, and your intuition says you should do X, and the most reliable formal theory you've got says you should do Y, where Y implies NOT(X). For some situations you do X, for others you do Y, depending on how much you trust your intuition and how much you trust your formal theory. As far as I can tell, in this respect you are exactly like everyone else on this site. You see a difference, though, between yourself and the others on this site... a difference important enough that you continue to point out its implications. I can't quite tell what you think that difference is. Some possibilities: 1. You think they are trusting certain formal theories more than their own intuitions in situations where you would trust your intuition more. 2. You think they are trusting certain formal theories more than their own intuitions in situations where they ought to trust their intuitions more. 3. You think their intuitions are poor and they ought to intuit different things.
I can't speak for XiXiDu, but for myself it's a combination of all three. In particular, consciously held theories tend over time to shift one's intuitions towards those theories. Thus I worry that by the time they actually wind up in such a conflict between theory and intuition, their intuitions will no longer be up to the task.
This sounds like a general argument in favor of acting on my intuitions rather than implementing theory. For example, if I intuit that turning left at this intersection will get me where I want to go, it seems that this argument suggests that I should turn left at this intersection rather than looking at a map. Am I misunderstanding you?
Come to think of it, I don't actually see how that follows from what I said. I said that intuitions can change as a result of consciously held theories, not that this is necessarily bad, depending on the theory (although it would be nice to keep a copy of an old intuition on ROM and do periodic sanity checks).
Sure. But if you start with intuition I1 and theory T at time T1, and subsequently end up with intuition I2 at time T2, what you seem to be endorsing is following I1 at T1 and I2 at T2. At no time are you endorsing following T if T conflicts with I at that time. Which is what I meant by acting on my intuitions rather than implementing theory. I'm at a complete loss for what a "sanity check" might look like. That is, OK, I have I2 in my brain, and I1 backed up on ROM, and I can compare them, and they make different judgments. Now what?
If I1 finds the judgement returned by I2 completely absurd even after looking at the argument, recognize that I should be confused and act accordingly.
No because I intuitively find that conclusion absurd.
So... is it possible for me to understand what your stated argument actually suggests about X if I don't know what your intuitive judgments on X are?
I don't fully understand your question, so I'll clarify my previous comment in the hope that that helps. Like I said, I find the notion that I should always rely on my intuition at the expense of looking at a map intuitively absurd, and that intuition is "stronger than" (for lack of a better term) the intuition that I should turn left.
Yeah, I think that answers my question. If all you've got are intuitive judgments and a sense of their relative strength in various situations, then I need to know what your intuitive judgments about a situation are before I can apply any argument you make to that situation.
You should evaluate any argument I make on its merits, not on the basis of the intuitions I used to produce it.
Regardless of my evaluation of your argument, given what you've told me so far, I cannot apply it to real-world situations without knowing your intuitions. Or, at the very least, if I do apply it, there's no reason to expect that you will endorse the result, or that the result will be at all related to what you will do in that situation, since what you will in fact do (if I've understood your account correctly) is consult your intuitions in that situation and act accordingly, regardless of the conclusions of your argument.
Not true! The intuitions used constitute evidence! Evaluating only arguments provided and not the sampling used to provide them will (sometimes) lead you to wrong conclusions.
Accept Y but adjust its associated utility downwards according to your intuition. If after doing so it is still the action with the highest expected utility, then follow through on it and ignore your intuition.

Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has

...

I like this point from Terry Tao:

I think an epsilon of paranoia is useful to regularise these sorts of analyses. Namely, one supposes that there is an adversary out there who is actively trying to lower your expected utility through disinformation (in order to goad you into making poor decisions), but is only able to affect all your available information by an epsilon. One should then adjust one’s computations of expected utility accordingly. In particular, the contribution of any event that you expect to occur with probability less than epsilon should p

...
The trouble is that you can split a big event into P / epsilon chance-of-epsilon events, or average P / epsilon chance-of-epsilon events into a big event. In order to avoid inconsistency, you have to actually say what information you think should be treated as an average of nearby information.

Assume that humanity managed to create a friendly AI (FAI). Given the enormous amount of resources that each human is poised to consume until the dark era of the universe, wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI? Our resources could enable it to find a way to either travel back in time, leave the universe or hack the matrix. Anything that could avert the end of the universe and allow the FAI to support many more agents has

...
No, not absurd. I was worried that we'll never get to the point where we actually "enjoy life" as human beings. No, that's not what I wanted to argue. I wrote in the post that we should continue to use our best methods. We should try to solve friendly AI. I said that we should be careful and discount some of the implied utility. Take for example the use of Bayes' theorem. I am not saying that we shouldn't use it, that would be crazy. What I am saying is that we should be careful in the use of such methods. If for example you use probability theory to update on informal arguments or anecdotal evidence you are still using your intuition to assign weight to evidence. Using math and numeric probability estimates might make you unjustifiably confident of your results because you mistakenly believe that you don't rely on your intuition. I am not saying that we shouldn't use math to refine our intuition, what I am saying is that we can still be wrong by many orders of magnitude as long as we are using our heuristics in an informal setting rather than evaluating data supplied by experimentation. Take this example. Julia Galef wrote: But by how much should a proper Bayesian reasoner increase her credence in H? Bayes' rule only tells us by how much given the input. But the variables are often filled in by our intuition.

As an example take the idea of quantum suicide. I wouldn’t commit quantum suicide even given a high confidence in the many-worlds interpretation of quantum mechanics being true. Logical implications just don’t seem enough in some cases.

You shouldn't commit quantum suicide because it decreases your measure, which by observation we know is important in ways we don't theoretically understand, and, unless you are very careful, the worlds where you escape death are not likely to be pleasant. You don't need skepticism of rationality itself to reach this conclusion.

I think some people don't understand that if quantum suicide works it won't prevent the gun from being fired: that doesn't kill you; it won't prevent the bullet from hitting your head: that doesn't kill you; it won't prevent the bullet from inflicting massive brain damage as long as certain information survives. It won't prevent the rest of your body from dying, for that doesn't immediately kill you. At this point there are no further worlds of significant quantum measure. Who knows what happens subjectively?
If "quantum immortality" really were a thing, its effect would not be to save you from subjective death after deciding to commit suicide, but to prevent you from deciding to do so, or from landing on a path wherein you'd decide to do so, in the first place. That's where all the measure is.
All the measure is only there later though. Until after the bullet hits your head, the worlds in which you shot the gun have similar measure as the worlds in which you didn't.
Yes, I understand that quantum suicide would be stupid. I always use such examples to avoid mentioning the forbidden topic. To be clear, I started posting here regularly mainly due to the forbidden topic. I was simply shocked but also fascinated by how seriously it was taken. I concluded that those people either are in possession of vast amounts of evidence that I lack or that they are too confident of the methods they use to arrive at correct beliefs.

Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples of how an agent built according to our current understanding of rationality could fail.

I don't really get it. For example, building a machine that is sceptical of Pascal's wager doesn't seem harder than building a machine that is sceptical of other verbal offers unsupported by evidence. I don't see what's wrong with the idea that "extraordinary claims require extraordinary evidence".

The verbal offer isn't actually relevant to the problem; it's just there to dramatize the situation. Please formulate that maxim precisely enough to program into an AI in a way that solves the problem. Because the best way we currently have of formulating it, i.e., Bayesianism with quasi-Solomonoff priors, doesn't solve it.
The idea of devoting more resources to investigating claims when they involve potential costs involves decision theory rather than mere prediction. However, vanilla reinforcement learning should handle this OK. Agents that don't investigate extraordinary claims will be exploited and suffer - and a conventional reinforcement learning agent can be expected to pick up on this just fine. Of course I can't supply source code - or else we would be done - but that's the general idea.
All claims involve decision theory in the sense that you're presumably going to act on them at some point. Would these agents also learn to pick up pennies in front of steam rollers? In fact, falling for Pascal's mugging is just the extreme case of refusing to pick up pennies in front of a steam roller, the question is where you draw a line dividing the two.
That depends on its utility function. The line (if any) is drawn as a consequence of specifying a utility function.
Verbal offers are evidence. Sometimes even compelling evidence. For example, I don't currently believe my friend Sam is roasting a turkey -- in fact, my prior probability of that is < 1%. But if Sam calls me up and says "Wanna come over for dinner? I'm roasting a turkey" my posterior probability becomes > 90%. Designing a system well-calibrated enough that its probability estimates cause it to make optimal choices across a narrow band of likelihoods is a simpler problem than designing a system that works across a much wider band.
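The turkey example can be made concrete with the odds form of Bayes' theorem. What follows is only a rough sketch: the 1% prior comes from the comment above, while the likelihood ratio of 891 and the 1e-20 prior for a mugger-style claim are made-up numbers chosen for illustration, and `posterior` is a hypothetical helper.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability using the odds form of Bayes' theorem.

    likelihood_ratio = P(statement | claim true) / P(statement | claim false).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumed: Sam is 891 times more likely to say this if it is true.
turkey = posterior(0.01, 891.0)

# Assumed: testimony of the same strength applied to a far smaller prior.
mugger = posterior(1e-20, 891.0)

print(f"turkey: {turkey:.3f}")
print(f"mugger: {mugger:.3e}")
```

Under these assumptions the same verbal evidence lifts the turkey claim to roughly 90% while leaving the mugger-style claim negligible, which is one way to read the "narrow band versus wide band" remark: the update rule is identical, but calibration errors matter far more at the extreme end.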
True, but dangerous. Nobody really knows anything about general intelligence. Yet a combination of arguments that sound convincing when formulated in English and the reputation of a few people and their utterances are considered evidence in favor of risks from AI. No doubt that all those arguments constitute evidence. But people update on that evidence and repeat those arguments and add to the overall chorus of people who take risks from AI seriously which in turn causes other people to update towards the possibility. In the end much of all conviction is based on little evidence when put in perspective to the actual actions that such a conviction demands. I don't want to argue against risks from AI here. As I wrote many times, I support SI. But I believe that it takes more hard evidence to accept some of the implications and to follow through on drastic actions beyond basic research.
What drastic actions do you see other people following through on that you consider unjustified?
I am mainly worried about future actions. The perception of imminent risks from AI could give an enormous incentive to commit incredibly stupid acts. Consider the following comment by Eliezer: I believe that this argument is unwise and that the line of reasoning is outright dangerous because it justifies too much in the minds of certain people. Making decisions on the basis of the expected utility associated with colonizing the Hercules supercluster is a prime example of what I am skeptical of.
Mostly, the actions I see people taking (and exhorting others to take) on LW are "do research" and "fund others doing research," to the negligible extent that any AI-related action is taken here at all. And you seem to support those actions. But, sure... I guess I can see how taking a far goal seriously might in principle lead to future actions other than research, and how those actions might be negative, and I can sort of see responding to that by campaigning against taking the goal seriously rather than by campaigning against specific negative actions. Thanks for clarifying.
Me neither, but quite a few people on lesswrong don't seem to share that opinion or are in possession of vast amounts of evidence that I lack. For example, some people seem to consider "interference from an alternative Everett branch in which a singularity went badly" or "unfriendly AI that might achieve complete control over our branch by means of acausal trade". Fascinating topics for sure, but in my opinion ridiculously far detached from reality to be taken at all seriously. Those ideas are merely logical implications of theories that we deem to be reasonable. Another theory that is by itself reasonable is then used to argue that logical implications do not have to pay rent in future anticipations. And in the end, due to a combination of reasonable theories, one ends up with completely absurd ideas. I don't see how this could have happened if one followed the rule that "extraordinary claims require extraordinary evidence".
I don't understand in what way the linked comment says anything about interference from alternative Everett branches. Did you mean to link to something else? I'm not sure what the majority view is on less wrong, but none of the people I have met in real life advocate making decisions based on (very) small probabilities of (very) large utility fluctuations. I think AI has probability at least 1% of destroying most human value under the status quo. I think 1% is a large enough number that it's reasonable to care a lot, although it's also small enough that it's reasonable not to care. However, I also think that the probability is at least 20%, and that is large enough that I think it is unreasonable not to care (assuming that preservation of humanity is one of your principal terminal values, which it may or may not be). Does this mean that I'm going to drop out of college to work at SingInst? No, because that closes a lot of doors. Does it mean that I'm seriously reconsidering my career path? Yes, and I am reasonably likely to act on those considerations.
Without machine intelligence, every single human alive today dies. One wonders how that value carnage would be quantified - using the same scale.
I agree. No, I think some people here use the +20% estimate on risks from AI and act according to some implications of logical implications. See here, which is the post the comment I linked to talked about. I have chosen that post because it resembled ideas put forth in another post on lesswrong that has been banned because of the perceived risks and because people got nightmares due to it.
I think you only get significant interference from "adjacent" worlds - but sure, this sounds a little strange, the way you put it. If we go back to the Pascal's wager post though - Eliezer Yudkowsky just seems to be saying that he doesn't know how to build a resource-limited version of Solomonoff induction that doesn't make the mistake he mentions. That's fair enough - nobody knows how to build high quality approximations of Solomonoff induction - or we would be done by now. The point is that this isn't a problem with Solomonoff induction, or with the idea of approximating it. It's just a limitation in Eliezer Yudkowsky's current knowledge (and probably everyone else's). I fully expect that we will solve the problem, though. Quite possibly to do so, we will have to approximate Solomonoff induction in the context of some kind of reward system or utility function - so that we know which mis-predictions are costly (e.g. by resulting in getting mugged) - which will guide us to the best points to apply our limited resources.
It has nothing to do with resource limitations; the problem is that Solomonoff induction itself can't handle Pascal's mugging. If anything, the resource-limited version of Solomonoff induction is less likely to fall for Pascal's mugging since it might round the small probability down to 0.
In what way? You think that Solomonoff induction would predict enormous torture with a non-negligible probability if it observed the mugger not being paid? Why do you think that? That conclusion seems extremely unlikely to me - assuming that the Solomonoff induction had had a reasonable amount of previous exposure to the world. It would, like any sensible agent, assume that the mugger was lying. That's why the original Pascal's mugging post directed its criticism at "some bounded analogue of Solomonoff induction".
Because Solomonoff induction bases its priors on minimum message length and it's possible to encode enormous numbers like 3^^^3 in a message of length much less than 3^^^3. Because I understand mathematics. ;) What Eliezer was referring to is the fact that an unbounded agent would attempt to incorporate all possible versions of Pascal's wager and Pascal's mugging simultaneously and promptly end up with an ∞ − ∞ error.
Sure - but the claim that there are large numbers of people waiting to be tortured also decreases in probability with the number of people involved. I figure that Solomonoff induction would give a (correct) tiny probability for this hypothesis being correct. Your problem is actually not with Solomonoff induction - despite what you say - I figure. Rather you are complaining about some decision theory application of Solomonoff induction - involving the concept of "utility".
What does this have to do with my point? It does, just not tiny enough to override the 3^^^3 utility difference. I don't have a problem with anything, I'm just trying to correct misconceptions about Pascal's mugging.
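The disagreement about whether a length-based prior shrinks fast enough can be illustrated with a toy calculation. This is a rough sketch under loud assumptions, not Solomonoff induction itself (which is uncomputable): the prior weight is taken to be 2^-(bits of description), the character count of the string "3^^3" at 8 bits per character stands in for program length, and the much smaller 3^^3 replaces 3^^^3 so the number can actually be computed. `up_arrow` is a hypothetical helper.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow a ↑^n b. Only usable for tiny arguments."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


# 3^^3 = 3**(3**3) = 3**27, standing in for the mugger's claimed payoff.
payoff = up_arrow(3, 2, 3)

# Assumed proxy for description length: 8 bits per character of "3^^3".
short_bits = 8 * len("3^^3")

# Description length if the number is instead written out in binary.
literal_bits = payoff.bit_length()

# Prior weight 2**-bits times payoff, a crude stand-in for expected stake.
short_stake = payoff * 2.0 ** -short_bits
literal_stake = payoff * 2.0 ** -literal_bits

print(f"payoff        = {payoff}")
print(f"short_stake   = {short_stake:.1f} ({short_bits} bits)")
print(f"literal_stake = {literal_stake:.3f} ({literal_bits} bits)")
```

When the number has to be spelled out in binary, the penalty always keeps the product below 1, because an integer with k bits is smaller than 2^k. Once a short description is allowed, the payoff can outgrow the penalty, and the gap widens rapidly with each extra arrow. Whether an experienced inductor would assign the mugger's hypothesis this much weight is exactly what the two commenters dispute; the sketch only shows why description length alone does not settle it.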
Well, your claim was that "Solomonoff induction itself can't handle Pascal's mugging" - which appears to be unsubstantiated nonsense. Solomonoff induction will give the correct answer based on Occamian priors and its past experience - which is the best that anyone could reasonably expect from it.
Hold on. What does "extraordinary claim" mean? I see two possible meanings: (1) a claim that triggers the "absurdity heuristic", or (2) a claim that is incompatible with many things that are already believed. The examples you gave trigger the absurdity heuristic, because they introduce large, weird structures into an area of concept space that does not normally receive updates. However, I don't see any actual incompatibilities between them and my pre-existing beliefs.
It becomes extraordinary at the point where the expected utility of the associated logical implications demands taking actions that might lead to inappropriately high risks. Where "inappropriately" is measured relative to the original evidence that led you to infer those implications. If the evidence is insufficient then discount some of the associated utility. Where "insufficient" is measured intuitively. In conclusion: Act according to your best formal theories but don't factor out your intuition.
So if I'm driving, and someone says "look out for that deer in the road!", that's an extraordinary claim because swerving is a large risk? Or did you push the question over into the word "inappropriately"?
Claims are only extraordinary with respect to theories.

Our current methods might turn out to be biased in new and unexpected ways. Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem are just a few examples on how an agent build according to our current understanding of rationality could fail.

What are you trying to do here? Are you trying to give specific examples of cases in which doing the rational thing could be the wrong thing to do? Surely not, that would be oxymoronic - if you already know that the 'rational thing' is a mistake then it isn't the rational thing. Failing...

Rational acts can be wrong - if the agent doesn't already know that. This happens regularly in Bayesian hell.
The point here is that giving examples of future cases where you already know the rational thing to be wrong is what is absurd. If you already know what not to do then you don't do it.
The OP's point was that the "correct" actions were wrong according to our current understanding of rationality - and his conclusion was that our current understanding of rationality might be wrong.
The OP is wrong. If our current understanding is that something is the wrong thing to do then our current understanding of rationality doesn't do it. And that conclusion may be right, despite the argument being wrong.
I wrote that our current understanding of rationality is not the last word and that we should therefore take account of model uncertainty.
If that was the extent of what you wrote I would not have commented. In this case I replied to this: Giving those as examples implies you are saying something more than "our current understanding of rationality is not the last word". Rejecting the position that argument supports is not nitpicking on definitions!
Oh boy...I just got what you are doing here. Nitpicking on a definition. Okay...of course rationality is winning and winning is doing what's right according to your utility-function. What I meant is obviously that our methods are not perfect at guiding us and satisfying our utility-functions.
Not even remotely.
I am trying to hint at the possibility that our methods might be mathematically justified but that they might lead to unexpected side-effects when applied by computationally bounded agents under extreme circumstances, as some of our thought experiments indicate. Our methods are the best we have and they work perfectly well on most problems we encounter. I am saying that we should discount some of the associated utility implications if we encounter edge cases. Ignoring the implications would be irrational but taking them at face value wouldn't be wise either.

I think that we should sometimes demand particular proof P; and if proof P is not available, then we should discount seemingly absurd or undesirable consequences even if our theories disagree.

Possibly, if the theory predicts that proof P would be available, then the lack of such proof is evidence against the theory. Otherwise, alternate proof should be acceptable.

Pascal's mugging, the Lifespan Dilemma, blackmailing and the wrath of Löb's theorem

Sorry, what's wrong with Löb's theorem?

I have only skimmed over a very few comments so this might be very wrong: As far as I can tell, if an agent assumes that its methods are consistent then it might update on its past decisions as evidence on how to act correctly on future occasions. But while that line of reasoning misses the obvious fact that computationally bounded agents are fallible, given what we know an agent has to work under the assumption that its decision procedures are consistent. This leads to contradictions. An agent built according to our current understanding of decision theory can't be expected to self-modify to a correct decision theory (don't ask me why humans can do this). Anyway, my point in mentioning those problems (because they were listed as problems here or elsewhere) was to show that if we were to turn ourselves into the agents we desire to build, according to our current laws of thought, then in some cases we would be worse off than we are now and in other cases like Pascal's mugging we couldn't be sure if we were going to make correct decisions (we are not sure if an agent with an unbounded finite utility function over outcomes is consistent). The model uncertainty involved is still profound and we shouldn't factor out human intuition at this point. Also, perceived absurdity does constitute evidence.

But my best guess right now is that we simply have to draw a lot of arbitrary lines and arbitrarily refuse some steps.

Can you name three alternatives, and why you reject them? How hard did you think to come up with alternatives?

wouldn't the same arguments that now suggest that we should contribute money to existential risk charities then suggest that we should donate our resources to the friendly AI?

My answer is "No." How hard did you try to discover if the best answer to that question is actually "No"?

For very large or very small probabilities, I agree it's important to start taking into account the "model uncertainty." And if some argument leads to the conclusion 2=1 (or that you should never act as if you'll die, which is of similar levels of wrong), of course you discount it, not in defiance of probability, but with probability, since we have so much evidence against that claim.

However, in the "donating to SIAI" case, I don't think we're actually talking about particularly large or small probabilities, or fallacious arguments. Implications can be labeled "extraordinary" for being socially unusual. This sort of extraordinary doesn't seem like it should be discounted.

This behavior isn't actually "socially unusual"; in fact there are many social institutions that this resembles, at least from an outside view. They're commonly called "cults". What this means is that humans seem to have a bias in favor of donating to "their cult" and believing they're acting rationally while doing so. As such you should consider whether your belief that it's rational to donate to SIAI is affected by the same bias.
You're right, you should. Although there are some serious holes in the claim that SIAI looks like a cult using the outside view, that's not totally relevant. My point is that you should correct for this kind of bias using probabilities, rather than saying "well, I don't find the conclusions aesthetically pleasing, so since I'm biased I can just throw them out." And if you correct for the bias, and model uncertainty, and include all the evidence, and you still get the aesthetically unpleasing answer, well, tough.

It seems to me that what you're saying is that our theories of rationality are ultimately based on a process of reflection that starts with pre-theoretical judgments about what we think is rational, about what dispositions help agents achieve their goals, etc., and that this means that we should take quite seriously our pre-theoretical judgments when they strongly conflict with our theories of rationality.

This is right in principle (as well as in the Pascal's Mugging case), but I think you're too conservative. For example, should I heavily discount argumen...

True, but science was invented because people are really bad at judging evidence. I am troubled by the prospect of people using the core principles of science, e.g. Bayesianism, applying them loosely and informally to vaguely understood conjectures, and following through on the implied actions. Prediction, experimentation, peer review and the demand for empirical evidence are what make science strong. If you think that you can use your rationality in combination with Bayesianism and run with it then you confound your puny human brain with that of the hypothetical superintelligence that you dreamed up. What I am arguing for is to be more conservative when it comes to the theoretically superior heuristics being discussed within this community.
Most of the science we're impressed with was done before peer review, and I don't think there's much evidence that peer review is helpful on net.

Relativity is less wrong than Newtonian mechanics but it still breaks down in describing singularities

What does it mean for reality to break down? What does it mean for reality to "describe" something?

The quoted sentence doesn't make any sense to me, and doesn't seem to follow from the article text.

We are not going to stop loving our girlfriend

Interesting use of "we" :)

[This comment is no longer endorsed by its author]
Oops, I managed to misread that for some reason.
