Bayesian Adjustment Does Not Defeat Existential Risk Charity


90 comments

Good post. Asking "okay, how sensitive is Karnofsky's counterargument to the size of the priors?" and *actually answering* that question was very worthwhile IMO.

Your post was funded by MIRI. Can you tell us what they asked? Was it "evaluate Karnofsky's argument", "rebut this post", "check the sensitivity of the argument to the priors' size and expand on it", "see how much BA affects our estimates", or what?

Wonderful post. Thank you.

I have a feeling that the fundamental difference between your position and GiveWell's arises not from a difference of opinion regarding mathematical arguments but because of a difference of values. Utilitarianism doesn't say that I have to value potential people at anything approaching the level of value I assign to living persons. In particular, valuing potential persons at 0 negates many arguments that rely on speculative numbers to pump expected utility into the present, and I'm not even sure if it's not right. Suppose that you had to choose between killing everyone currently alive at the end of their natural life spans, or murdering all but two people whom you were assured would repopulate the planet. My preference would be the former, despite it meaning the end of humanity. Valuing potential people without an extremely high discount rate also leads one to be strongly pro-life, to be against birth control programs in developing nations, etc.

Another possibility is that GiveWell's true reason is based on the fact that recommending MIRI as an efficient charity would decrease their probability of becoming substantially larger (through attracting large numb...

7 points · 11y

Karnofsky has, as far as I know, not endorsed measures of charitable effectiveness that discount the utility of potential people. (On the other hand, as Nick Beckstead points out in a different comment and as is perhaps under-emphasized in the current version of the main post, neither has Karnofsky made a general claim that Bayesian adjustment defeats existential risk charity. He has only explicitly come out against "if there's even a chance" arguments. But I think that in the context of his posts being reposted here on LW, many are likely to have interpreted them as providing a general argument that way, and I think it's likely that the reasoning in the posts has at least something to do with why Karnofsky treats the category of existential risk charity as merely promising rather than as a main focus. For MIRI in particular, Karnofsky has specific criticisms that aren't really related to the points here.)
While valuing potential persons at 0 makes existential risk versus other charities a closer call than if you included astronomical waste, I think the case is still fairly strong that the best existential risk charities save more expected currently-existing lives than the best other charities. The estimate from Anna Salamon's talk linked in the main post makes investment into AI risk research roughly 4 orders of magnitude better for preventing the deaths of currently existing people than international aid charities. At the risk of anchoring, my guess is that the estimate is likely to be an overestimate, but not by 4 orders of magnitude. On the other hand, there may be non-existential risk charities that achieve greater returns in present lives but that also have factors barring them from being recommended by GiveWell.

0 points · 9y

Actually, according to this transcript, on page four, Holden finds the claim that the value of creating a life is "some reasonable" ratio of the value of saving a current life to be very questionable. More precisely, the transcript said:

5 points · 11y

I agree; this is excellent.
In ten years' time, you see a nine-year-old child fall into a pond. Do you save her from drowning? If so, you, in 2023, place value on people who aren't yet born in 2013. If you don't value those people now, in 2013, you're temporally inconsistent.
Obviously this isn't utilitarianism, but I think many people are unaware of this argument, despite its following from very common intuitions.
Is these programs' net desirability so self-evident that it constitutes evidence against caring about future people? Yes, you could say "but they're good for economic growth and the autonomy of women etc.", but those are reasons that would support the programs even if we cared about future people. I think in general the desirability of contraception should be an output of, rather than an input to, our expected value calculations.
On the other hand, if you're the sort of person who doesn't care about people far away in time, it might be sensible not to care about people far away in space.

4 points · 11y

What do you mean by "place value on people"? Your example is explained by placing value on the non-occurrence (or lateness) of their death. This is quite independent from placing value on the existence of people, and is therefore irrelevant to contraception, the continuation of humanity, etc.

0 points · 11y

You care about the deaths of people without caring about people?
What if I changed the example, and it's about whether or not to help educate the child, or comfort her, or feed her? Do we care about the education, hunger, and happiness of the child too, without caring about the child?

5 points · 11y

You can say that a death averted or delayed is a good thing without being committed to saying that a birth is a good thing. That's the point I was trying to make.
Similarly, you can "care about people" in the sense that you think that, given that a person exists, they should have a good life, without thinking that a world with people who have good lives is better than a world with no people at all.

2 points · 11y

No you can't. Consider three worlds, only differing with regards person A.
* In world 1, U(A) = 20.
* In world 2, U(A) = 10.
* In world 3, U(A) = undefined, as A does not exist.
Which world is best? As we agree that people who exist should have a good life, U(1) > U(2). Assume U(2) = U(3), as per your suggestion that we're unconcerned about people's existence or non-existence. Therefore, by transitivity of preference, U(1) > U(3). So we do care about A's existence or non-existence.

4 points · 11y

But U(3) = U(2) doesn't reflect what I was suggesting. There's nothing wrong with assuming U(3) ≥ U(1). You can care about A even though you think that it would have been better if they hadn't been born. You're right, though, about the conclusion that it's difficult to be unconcerned with a person's existence. Cases of true indifference about a person's birth will be rare.
Personally, I can imagine a world with arbitrarily happy people, and it doesn't feel better to me than a world where those people were never born; and this doesn't feel inconsistent. And as long as the utility I can derive from people's happiness is bounded, it isn't.

0 points · [anonymous] · 11y

U(2)=U(3) isn't "a world with people who have good lives is not better than a world with no people at all". That would be U(1)=U(3).

Thank you for writing this post. I feel that additional discussion of these ideas is valuable, and that this post adds to the discussion.

Note about my comment below: Though I’ve spoken with Holden about these issues in the past, what I say here is what I think, and shouldn’t be interpreted as his opinion.

I don’t think Holden’s arguments are intended to show that existential risk is not a promising cause. To the contrary, global catastrophic risk reduction is one of GiveWell Labs’ priority causes. I think his arguments are only intended to show that one can't appeal to speculative explicit expected value calculations to convincingly argue that targeted existential risk reduction is the best area to focus on. This perspective is much more plausible than the view that these arguments show that existential risk is not the best cause to investigate.

I believe that Holden's position becomes more plausible with the following two refinements:

Define the prior over good accomplished in terms of “lives saved, together with all the ripple effects of saving the lives.” By “ripple effects,” I mean all the indirect effects of the action, including speeding up development, reducing existential ri

9 points · 11y

Thanks for your detailed comment! I certainly agree that, if one takes into account ripple effects where saving lives leads to reduced existential risk, the disparities between direct ways of reducing existential risk on the one hand and other efficient ways of saving people's lives on the other hand are no longer astronomical in size. I learned of this argument partway into writing the post, and subsection 5.5 was meant to address it, but it's quite rough and far from the final word on that subject, particularly if you compare direct efforts to medium-direct efforts rather than to very indirect efforts.
It sounds as though, to model your intuitions on the situation, instead of putting a probability distribution on how many DALYs one could save by donating a dollar to a given charity, we'd instead have to put a probability distribution on what % of existential risk you could rationally expect to reduce by donating one dollar to a given charity. Does that sound right?
I would weakly guess that such a model would favor direct over semi-direct existential risk reduction and strongly guess that such a model would favor direct over indirect existential risk reduction. This is just based on thinking that some of the main variables relevant to existential risk are being pushed on by few enough people, and in ways that are sufficiently badly thought through, that there's likely to be low-hanging fruit to be picked by those who analyze the issues in a sufficiently careful and calculating manner. But this is a pretty vague and sketchy argument, and it definitely seems worth discussing this sort of model more thoroughly.

1 point · 11y

I think the number one issue is that, insofar as the beneficiaries put a lot of effort into selectively advancing the lines of argument that benefit them personally, the positive components of the sum are extremely over-represented (people are actually being paid a lot of money to produce those), whereas other options (money in the bank plus a strategy for when to donate, donations to charities that improve education, etc.) are massively under-valued.
Keep in mind also that the utility of money in the bank becomes enormous for people who would not donate to a bunch of folks with no background in anything (often not even a prior history of economically superior employment!) but would donate to an existential risk charity founded by people with clear accomplishments in competitive fields, who poured in their own money, quit lucrative jobs, and so on, and whose involvement and dramatic statements are not explainable by self-interest alone in the absence of any belief in the impact. (Note that the existence of such people does not even require selflessness, when we are speaking of, among other things, their own personal survival and/or their own revival from a frozen state.)

Some really fast comments on the Pascal's Mugging part:

1) For ordinary x-risk scenarios, the Hansonian inverse-impact adjustment for "you're unlikely to have a large impact" is within conceivable reach of the evidence - if the scenario has you affecting 10^50 lives in a future civilization, that's just 166 bits of evidence required.

2) Of course, if you're going to take a prior of 10^-50 at face value, you had better not start spouting deep wisdom about expert overconfidence when it comes to interpreting the likelihood ratios - only invoking "expert overconfidence" on one kind of extreme probability really is a recipe for being completely oblivious to the facts.

3) The Hansonian adjustment starts out by adding up to expected value ratios around 1 - it says that based on your priors, all scenarios that put you in a unique position to affect different large numbers of people in the same per-person way will have around the same expected value. Evidence then modifies this. If Pascal's Mugger shows you evidence with a million-to-one Bayesian likelihood ratio favoring the scenario where they're a Matrix Lord who has put you in a situation to affect 3^^^3 lives, the...
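The "166 bits" figure in point (1) can be checked directly; a prior penalty of 10^-50 corresponds to a likelihood ratio of 10^50 : 1, i.e. log2(10^50) bits of evidence to overcome (a quick editorial sketch, not part of the comment):

```python
import math

# A Hansonian prior penalty of 10^-50 on "you affect 10^50 lives" takes a
# likelihood ratio of 10^50 : 1 to overcome, i.e. log2(10^50) bits of evidence.
bits_needed = math.log2(10 ** 50)
print(round(bits_needed, 1))  # ≈ 166.1, matching the "166 bits" figure above
```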

True, that was a strange word. I may have been spending too much time thinking about large numbers lately. My point is that it's not literally unreachable the way a Levin-prior penalty on running speed makes quantum mechanics (in all forms) absolutely implausible relative to any amount of evidence you can possibly collect, or the Hansonian penalty makes ever being in a position to influence 3^^^3 future lives "absolutely implausible" relative to any amount of info you can collect in less than log(3^^^3) time, given that your sensory bandwidth is on the order of a few megabits per second.

As soon as you start trying to be "reasonable" or "skeptical" or "outside view" or whatever about the likelihood ratios involved in the evidence, obviously 10^-50 instantly goes to an eternally unreachable prior penalty since after all over the course of the human species people have completely hallucinated more unlikely things due to insanity on far fewer than 10^50 tries, etcetera. That's part of what I was trying to get at with (2). But if you're saying that, then it's also quite probable that the Hansonian adjustment is inappropriate or that you otherw...

2 points · 11y

That expresses what I thought better than I could have myself.

5 points · 11y

We can have a new site slogan. "Participate on LessWrong to increase your simulation measure!"

0 points · 11y

You should only do things that increase your simulation measure after receiving good personal news or when you are unusually happy, obviously.

2 points · 11y

This isn't obvious. Or, rather, this is a subjective preference and people who prefer to increase their simulation measure independently of attempts to amplify (one way of measuring the perception of) good events are far from incoherent. For that matter people who see no value in increasing simulation measure specifically for good events are also quite reasonable (or at least not thereby shown to be unreasonable).
Your 'should' here prescribes preferences to others, rather than (merely) explaining how to achieve them.

1 point · 11y

Previously discussed here.
(EDIT: I see that you already commented on that thread, but I'm leaving this comment here for anyone else reading this thread.)


It's worth noting that a 1 in a million prior of a charity being extraordinarily effective isn't that unreasonable: there are over 1 million 501(c)(3) organizations in the U.S. alone, and presumably a large fraction of these are charities, and presumably most of them are not extraordinarily effective.

(I'm not claiming that you argue that it is unreasonable, I'm just including the data here for others to refer to.)

If I ask you to guess which of a million programs produces an output that scores highest on some complicated metric, but you don't know anything about the programs, you're going to have a one in a million chance of guessing correctly. Given the further information that these three, and only these three, were written with the specific goal of doing well on that metric, and that all the others were trying to do well on related but different metrics, suddenly it's more likely than not that one of those three does best.

There are very few charities that are trying to be the most efficient from a utilitarian point of view. It's likely that one of them is.

5 points · 11y

Ok, but if that's your reference class, "isn't a donkey sanctuary" counts as evidence you can update on. It seems there's large classes of charities we can be confident will not be extraordinarily effective, and these don't include FHI, MIRI etc.

0 points · 11y

Yes. There's a choice as to what to put into the prior and what to put into the likelihood. This makes it more difficult to make claims like "this number is a reasonable prior and this one is not". Instead, one has to specify the population the prior is about, and this in turn affects what likelihood ratios are reasonable.
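The point about the prior/likelihood split being a modeling choice can be sketched numerically: moving a factor from the likelihood into the reference class leaves the posterior unchanged. (The likelihood ratio and population figures below are illustrative assumptions, not from the thread.)

```python
def posterior_odds(prior_odds, *likelihood_ratios):
    # Bayes by odds: posterior odds = prior odds × product of likelihood ratios.
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

# Decomposition A: a 1-in-a-million prior over all charities, then an update
# on "explicitly optimizes for utilitarian effectiveness" (assumed LR = 1e5).
a = posterior_odds(1 / 1_000_000, 1e5)

# Decomposition B: fold that feature into the reference class up front, so the
# "prior" already ranges over the ~10 charities with that goal and no update remains.
b = posterior_odds(1 / 10, 1.0)

print(a, b)
assert abs(a - b) < 1e-12  # same posterior either way
```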

Thanks for this post - I really appreciate the thoughtful discussion of the arguments I've made.

I'd like to respond by (a) laying out what I believe is a big-picture point of agreement, which I consider more important than any of the disagreements; (b) responding to what I perceive as the main argument this post makes against the framework I've advanced; (c) responding on some more minor points. (c) will be a separate comment due to length constraints.

**A big-picture point of agreement: the possibility of vast utility gain does not - in itself - disqualify a giving opportunity as a good one, nor does it establish that the giving opportunity is strong.** I'm worried that this point of agreement may be lost on many readers.

The OP makes it sound as though I believe that a high enough EEV is "ruled out" by priors; as discussed below, that is not my position. I agree, and always have, that "Bayesian adjustment does not defeat existential risk charity"; however, I think it defeats an existential risk charity that makes no strong arguments for its ability to make an impact, and relies on a "Pascal's Mugging" type argument for its appeal.

On the flip side, I belie...

On the flip side, I believe that a lot of readers believe that "Pascal's Mugging" type arguments are sufficient to establish that a particular giving opportunity is outstanding.

Who? I'm against Pascal's Mugging. I invented that term to illustrate something that I thought was a fallacy. I'm pretty sure a supermajority of LW would not pay Pascal's Mugger. I'm on the record as saying that x-risk folk should not argue from low probabilities of large impacts, (1) because there are at least medium-probability interventions against xrisk and these will knock any low-probability interventions off the table if the money used for them is genuinely fungible (admittedly people who donate to anti-asteroid efforts cannot be persuaded to just donate to FAI instead), and (2), with (1) established, that it's logically rude and bad rationalist form to argue that a probability can be arbitrarily tiny because it makes you insensitive to the state of reality. I can reasonably claim to have personally advanced the art of further refuting Pascal's Mugging. Who are these mysterious hosts of silly people who believe in Pascal's Mugging, and what are they doing *here* of all places?

1 point · 11y

http://lesswrong.com/lw/6w3/the_125000_summer_singularity_challenge/4krk

8 points · 11y

You can randomly accuse people of believing things that constitute Pascal's mugging, but that doesn't make the accusation a valid argument unless you show that it's so.
There's a very simple test to see if someone actually accepts Pascal's mugging: Go to them and say "I'll use my godlike hidden powers to increase your utility by 3^^^3 utilons if you hand over to me the complete contents of your bank account."
Don't just claim that something else they believe is the same as Pascal's mugging or I might equally easily claim that someone buying health insurance is a victim of Pascal's mugging.

2 points · 11y

Just to be clear: are we saying that a factor of 3^^^3 is a Pascal's mugging, but a factor of 10^30 isn't? (In Holden's comment above, one example in the context of Pascal's mugging-type problems is a factor of 10^10, even as that's on the order of the population of the Earth.)
I think any reasonable person hearing "8 lives saved per dollar donated" would file it with Pascal's mugging (which is Eliezer's term, but the concept is pretty simple and comprehensible even to someone thinking of less extreme probabilities than Eliezer posits; e.g. Holden, above).
In the linked thread, Rain special-pleads that the topic requires very large numbers to talk about, but jsteinhardt counters that that doesn't make humans any better at reasoning about tiny probabilities multiplied by large numbers. jsteinhardt also points out that just because you can multiply a small number by a large number doesn't mean the product actually makes any sense at all.

4 points · 11y

No. The problem with Pascal's mugging doesn't lie merely in the particular hoped-for payoff, it's that in extreme combinations of small chance/large payoff, the complexity of certain hypotheses doesn't seem sufficient to adequately (as per our intuitions) penalize said hypotheses.
If I said "give me a dollar, and I'll use my Matrix Lord powers to have three dollars appear in your wallet", someone can simply respond that the chance of me being a Matrix Lord is less than one in three, so the expected payoff is less than the cost. But we don't yet have a clear, mathematically precise way to explain why we should also respond negatively to "give me a dollar, and I'll use my Matrix Lord powers to save 3^^^3 lives", even though our intuition says we should (and in this case we trust our intuition).
To put it briefly: Pascal's Mugging is an interesting problem in decision theory which LessWrongers should be hoping to solve (I have an idea in that direction, which I'm writing a discussion post about, but I'd need mathematicians to tell me if it potentially leads to anything); not just a catchphrase you can use to bash someone else's calculations when their intuitions differ from yours.

7 points · 11y

Yes, we do: bounded utility functions work just fine without any mathematical difficulties, and seem to map well to the psychological mechanisms that produce our intuitions. Objections to them are more philosophical and person-dependent.
If we are going to be invoking intuition, then we should be careful about using examples with many extraneous intuition-provoking factors, and in thinking about how the intuitions are formed.
For example, handing over $1 to a literal Pascal's Mugger, a guy who asks for the money out of your wallet in exchange for magic outputs, after trying and failing to mug you with a gun (which he found he forgot at home), is clearly less likely to get a big finite payoff than other uses of the money. The guy is claiming two things: 1) large payoffs (in things like life-years or dollars, not utility, which depends on your psychology) are physically possible 2) conditional on 1, the payoffs are more likely from paying him than other uses of money. Realistic amounts of evidence won't be enough to neutralize 1), but would easily neutralize 2).
Heuristics which tell you not to pay off the mugger are right, even for total utilitarians.
Moreover, many of our intuitions look to be heuristics trained with past predictive success and delivery of individual rewards in one's lifetime. If you save 1000 lives, trillions of person-seconds, you will not get billions of times the reinforcement you would get from eating a chocolate bar. You may get a 'warm glow' and some social prestige for success, but this will be a reward of ordinary scale in your reinforcement system, not enough to overcome astronomically low probabilities. So learned intuitions will tend to move you away from what would be good deals for an aggregative utilitarian, since they are bad deals in terms of discounted status and sex and chocolate.
Peter Singer argues that we should then discount those intuitions trained for non-moral purposes. Robin Hanson might argue that morality is overrat

-3 points · 11y

And so does the speed prior.
Yes. I have an example of why the intuition "but anyone can do that" is absolutely spot on. You give money to this mugger (and similar muggers); then another mugger shows up and, noticing doubt in your eyes, displays a big glowing text in front of you which says, "yes, I really have powers outside the matrix". Except you haven't got the money. Because you were being completely insane, by the medical definition of the term: your actions were not linked to reality in any way, and you failed to consider the utility of potential actions that are linked to reality (e.g. keep the money, give to the guy that displays the glowing text).
The intuition is that sane actions should be supported by evidence, whereas actions based purely on how you happened to assign priors are insane. (And it is utterly ridiculous to say that low probability is a necessary part of Pascal's wager, because as a matter of fact, the probability must be high enough.) I have a suspicion that this intuition reflects the fact that, generally, actions conditional on evidence have higher utility than actions not conditional on evidence.

-4 points · 11y

Such as, for example, the fact that killing 3^^^^^^3 people shouldn't be OK because there's still 3^^^3 people left and my happiness meter is maxed out anyway.
Self-consistent isn't the same as moral.

4 points · 11y

Bounded utility functions can represent more than your comment suggests, depending on what terms are included. See this discussion.

-2 points · 11y

Sorry, I might be just blinded by the technical language, but I'm not seeing why that link invalidates my comment. Could you maybe pull a quote, or even clarify?

2 points · 11y

E.g. the example above suggests something like a utility function of the form "utility equals the amount of quantity A for A<S, otherwise utility is equal to S" which rejects free-lunch increases in happy-years. But it's easy to formulate a bounded utility function that takes such improvements, without being fanatical in the tradeoffs made.
Trivially, it's easy to give a bounded utility function that always prefers a higher finite quantity of A but still converges, although eventually the preferences involved have to become very weak cardinally. A function with such a term on human happiness would not reject an otherwise "free lunch". You never "max out," just become willing to take smaller risks for incremental gains.
Less trivially, one can include terms like those in the bullet-pointed lists at the linked discussion, mapping to features that human brains distinguish and care about enough to make tempting counterexamples: "but if we don't account for X, then you wouldn't exert modest effort to get X!" Terms for relative achievement, e.g. the proportion (or adjusted proportion) of potential good (under some scheme of counterfactuals) achieved, neutralize an especially wide range of purported counterexamples.

-2 points · 11y

... it is? Maybe I'm misusing the term "bounded utility function". Could you elaborate on this?

7 points · 11y

Yes, I think you are misusing the term. It's the utility that's bounded, not the inputs. Say that U = 1 - (1/(X^2)), and U = 0 when X = 0, where X is the quantity of some good. Then utility is bounded between 0 and 1, but increasing X from 3^^^3 to 3^^^3+1 or 4^^^^4 will still (exceedingly slightly) increase utility. The agent just won't take risks for small increases in utility. However, terms in the bounded utility function can give weight to large numbers, to relative achievement, to effort, and all the other things mentioned in the discussion I linked, so that one takes risks for those.
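As a quick check of this claim (an editorial sketch; exact rational arithmetic is used because floating point rounds 1 − 10^-200 up to 1):

```python
from fractions import Fraction

def u(x):
    # The example above: U = 1 - 1/x^2 for x > 0, and U = 0 when x = 0.
    return Fraction(0) if x == 0 else 1 - Fraction(1, x ** 2)

# Utility never reaches 1...
assert all(u(x) < 1 for x in (10, 10**6, 10**100))
# ...yet one more unit of the good always increases it, even from a huge
# baseline (10^100 stands in for 3^^^3, which is far too large to compute with).
assert u(10**100 + 1) > u(10**100)
print(float(u(10)))  # 0.99
```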

0 points · 9y

Bounded utility functions still seem to cause problems when uncertainty is involved. For example, consider the aforementioned utility function U(n) = 1 - (1/(n^2)), and let n equal the number of agents living good lives. Using this function, the utility of a sure outcome of 10 agents living good lives equals 1 - (1/(10^2)) = 0.99, while the expected utility of a 9 in 10 chance of 3^^^3 agents living good lives and a 1 in 10 chance of no agents living good lives roughly equals 0.1 × 0 + 0.9 × 1 = 0.9. Thus, in this situation the agent would be willing to kill (3^^^3) - 10 agents in order to prevent a 0.1 chance of everyone dying, which doesn't seem right at all. You could modify the utility function, but I think this issue would still exist to some extent.
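The comparison above can be reproduced with the same U(n) = 1 - (1/(n^2)) (a sketch with exact arithmetic; 10^100 stands in for 3^^^3, which cannot be computed with directly):

```python
from fractions import Fraction

def u(n):
    # Bounded utility in the number n of agents living good lives.
    return Fraction(0) if n == 0 else 1 - Fraction(1, n ** 2)

sure_thing = u(10)                                   # certain: 10 agents
gamble = Fraction(9, 10) * u(10 ** 100) + Fraction(1, 10) * u(0)

print(float(sure_thing), float(gamble))  # 0.99 vs 0.9 (approximately)
assert sure_thing > gamble  # the agent takes the sure 10 over the huge gamble
```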

0 points · 11y

Ah, OK, I was thinking of a bounded utility function as one with a "cutoff point", yes. You're absolutely right.

7 points · 11y

To be really clear, the problem with Pascal's Mugging is that even after eliminating infinity as a coherent scenario, any simplicity prior which defines simplicity strictly over computational complexity will apparently yield divergent returns for aggregative utility functions when summed over all probable scenarios, because the material size of possible scenarios grows much faster than their computational complexity (Busy Beaver function or just tetration).
The problem with Pascal's Wager on the other hand is that it shuts down an ongoing conversation about plausibility by claiming that it doesn't matter how small the probability is, thus averting a logically polite duty to provide evidence and engage with counterarguments.

0 points · 11y

That seems overly specific. There are many other ways in which the priors assigned to highly speculative propositions may not be low enough, or in which the impact of other available actions on a highly speculative scenario may be under-evaluated.
To me, Pascal's Wager is defined by a speculative scenario for which there exists no evidence, which has a high enough impact to result in actions not based on any evidence, despite the uncertainty surrounding speculative scenarios.
How THE HELL does the above (OK, I didn't originally include the second quotation, but still) constitute confusion of Pascal's Wager with Pascal's Mugging, let alone "willful misinterpretation"?

3 points · 11y

I certainly consider that if you multiply a very tiny probability by a huge payoff and then expect others to take your calculation seriously as a call to action, you're being silly, however it's labeled. Humans can't even consider very tiny probabilities without privileging the hypothesis.

3 points · 11y

Note also that a crazy mugger could demand $10 or else 10^30 people outside the matrix will die, and then argue that you should rationally trust him 100% so the figure is 10^29 lives/$ , or argue that it is 90% certain that those people will die because he's a bit uncertain about the danger in the alternate worlds, or the like. It's not about the probability which mugger estimates, it's about the probability that the typical payer estimates.

2 points · 11y

PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS

1 point · 11y

I will certainly admit that the precise label is not my true objection, and apologise if I have seemed to be arguing primarily over definitions (which is of course actually a terrible thing to do in general).

-3 points · 11y

Maybe look at the context of the conversation here? edit: to be specific, you might want to reply to HoldenKarnofsky; after all, the utility of convincing him that he's incorrect in describing it as "Pascal's Mugging" type arguments ought to be huge...
edit2: and if it's not clear, I'm not accusing anyone of anything. Holden said,
I just linked an example of phenomenon which I think may be the cause of Holden's belief. Feel free to correct him with your brilliant argument that he should simply test if they actually accept Pascal's Mugging by asking them about 3^^^3 utilons.

311y

Not a Pascal's Mugging.

-311y

Still might be the thing Holden Karnofsky refers to in the following passages:
...
...

111y

And yet remains clearly not the thing that is talked about by either Eliezer or your actual comment.
If it is valuable to make the observation that Holden really isn't referring to what Eliezer assumes he is then by all means make that point instead of the one you made.

-111y

PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
PASCAL'S WAGER IS DEFINED BY LOW PROBABILITIES NOT BY LARGE PAYOFFS
I've tried saying this in small letters a number of times, and once in the main post The Pascal's Wager Fallacy Fallacy, and people apparently just haven't paid attention, so I'm just going to try shouting it over and over every time somebody makes the same mistake over and over.

911y

In the original Pascal's wager, he had a prior of 0.5 for the existence of God.
edit: And in case it's not clear, the point is that Pascal's wager does not depend on the misestimated probability being low. Any finite variation requires that the probability is high enough.
Likewise, here (linked from the thread I linked) you have both: a prior which is silly high (1 in 2000), and big impact (7 billion lives).
edit: whoops. The 1 in 2000 figure and general talk of low probabilities is in the thread, not in the video. In the video she just goes ahead assigning an arbitrary 30% probability to picking an organization with which we live and without which we die, which is obviously far too high. Much as Pascal's wager went from a 0.5 probability to "the probability could be low, the impact is still infinite!", so does the LW discussion of this video progress from an indefensible 30% to "it doesn't matter". Let's picture a Pascal Scam: someone says that there is 50% probability (mostly via ignorance) that unless they are given a lot of money, 10^30 people will die. The audience doesn't buy the 50% probability, but it does still pay up.

(Reply to edit: In the presentation that 30% is one probability in a chain, not an absolute value. Stop with the willful misrepresentations, please.)

From the article:

However, Pascal realizes that the value of 1/2 actually plays no real role in the argument, thanks to (2). This brings us to the third, and by far the most important, of his arguments...

If there were a 0.5 probability that the Christian God existed, the wager would make a fuckton more sense. Today we think Pascal's Wager is a logical fallacy rather than a mere mistaken probability estimate only because later versions of the argument were put forward for lower probabilities, and/or because Pascal went on to argue that it would carry for lower probabilities.

If the video is where the actual instance of Pascal's Wager is being offered in support of SIAI, then it would have been better to link it directly. I also hate video because it's not searchable, but I can hardly blame you for that, so I will try scanning it.

Before scanning, I precommit to renouncing, abjuring, and distancing MIRI from the argument in the video if it argues for no probability higher than 1 in 2000 of FAI saving the world, because I myself ...

911y

It doesn't merely have to have something to do with MIRI, it must be the case that without funding MIRI we all die, and with funding MIRI, we don't, and this is precisely the sort of thing that should have very low probability if MIRI is not demonstrably impressive at doing something else.
Hmm. It is mentioned here, and other commenters there likewise talk of low probabilities. I guess I just couldn't quite imagine someone seriously putting a non-small probability on the "with MIRI we live, without we die" aspect of it. Startups have quite small probability of success, even without attempting to do the impossible.
edit: And of course what actually matters is donor's probability.

511y

For this to work out to 7%, a donor would need 30% probability that their choice of the organization to donate to is such that with this organization we live, and without, we die.
What donor can be so confident in their choice? Is Thiel this confident? Of course not, he only puts in a small fraction of his income, and he puts more into something like this. By the way I am rather curious about your opinion on this project.

011y

Are you sure they are wrong about what constitutes Pascal's mugging, rather than about whether the probability of xrisk is low?

19y

I don't think you've really explained why you don't accept the arguments in the post. Could you please explain why and how the difference between assigning low probability to something and having high confidence it's incorrect is relevant? I have several points to discuss, but I need to fully understand your argument before doing so.
And yes, I know I am practicing the dark art of post necromancy. But the discussion has largely been of great quality and I don't think your comment has been appropriately addressed.

I'm surprised this post doesn't at least *mention* temporal discounting. Even if it's somewhat unpopular in utilitarian circles, it's sufficiently a part of mainstream assessments of the future and of basic human psychology that I would think its effects on astronomical waste (and related) arguments should at the very least be considered.

811y

The post discusses the limiting case where astronomical waste has zero importance and the only thing that matters is saving present lives. Extending that to the case where astronomical waste has some finite level of importance based on time discounting seems like a matter of interpolating between full astronomical waste and no astronomical waste.

611y

Where consistent (i.e. exponential) time discounting is concerned, there is very little intermediate ground between "nothing is important if it happens in 1,000,000 years" and "it is exactly as important as the present day".
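The lack of intermediate ground is easy to check numerically. A minimal sketch (the rates and the million-year horizon are illustrative choices, not figures from the thread):

```python
import math

def discount(rate_per_year, years):
    """Exponential (time-consistent) discount factor after `years`."""
    return math.exp(-rate_per_year * years)

horizon = 1_000_000  # years
for rate in (0.05, 0.001, 1e-5, 1e-8):
    print(f"rate {rate:g}/yr: "
          f"1e6-yr factor {discount(rate, horizon):.3g}, "
          f"100-yr factor {discount(rate, 100):.6f}")
```

Any rate large enough to noticeably discount a single century drives the million-year factor to effectively zero, while any rate that preserves the million-year future leaves the present century essentially undiscounted.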

511y

Yep. On the other hand, you can (causally or acausally) trade with your future self.

211y

When it comes to "the utility function is not up for grabs", we should jettison hyperbolic discounting far before we reject the idea that I'm the same agent now as in one second's time.

We can't jettison hyperbolic discounting if it actually describes the relationship between today-me and tomorrow-me's preferences. If today-me and tomorrow-me *do* have different preferences, there is nothing in the theory to say which one is "right." They simply disagree. Yet each may be well-modeled as a rational agent.

The default fact of the universe is that you aren't the same agent today as tomorrow. An "agent" is a single entity with one set of preferences who makes unified decisions for himself, but today-you can't make decisions for tomorrow-you any more than today-you can make decisions for today-me. Even if today-you seems to "make" a decision for tomorrow-you, tomorrow-you can just do something else. When it comes down to it, today-you isn't the one pulling the trigger tomorrow. It may turn out that you are (approximately) an individual with consistent preferences over time, in which case it's equivalent to today-you being able to make decisions for tomorrow-you, but if so that would be a very special case.

There are evolutionary pressures that encourage agency and exponential discounting in particular. I have also seen models that tri...

111y

There are such things as commitment devices.

311y

That is true. But there are also such things as holding another person at gunpoint and ordering them to do something. It doesn't make them the same person as you. Their preferences are different even if they seem to behave in your interest.
And in either case, you are technically not deciding the other person's behavior. You are merely realigning their incentives. They still choose for themselves what is the best response to their situation. There is no muscle now-you can flex to directly make tomorrow-you lift his finger, even if you can concoct some scheme to make it optimal for him tomorrow.
In any case, commitment devices don't threaten the underlying point because most of the time they aren't available or cost-effective, which means there will still be many instances of behavior that are best described by non-exponential discounting.

111y

You can, however discount exponentially and remain the same agent.

211y

While that's true, in many cases (e.g. asteroid detection) the interventions may be worthwhile when astronomical waste has vast importance, but not worthwhile when it has zero. It would be informative to know on which of those sides, for example, an exponential discount rate of 5% falls. Also, discounting additionally reduces the value of future years of present lives, so there are some differences because of that as well.

111y

If you're interested, see: Cowen, Caring about the Distant Future.

Rereading this reminds me of something Gelman said, about people who

strain on the gnat of the prior distribution while swallowing the camel that is the likelihood.

In his post, Karnofsky has strained at the gnat of the prior of high-impact interventions existing while swallowing the camel of the normal/log-normal distributions.

Responses on some more minor points (see my previous comment for big-picture responses):

Regarding "BA updates on a point estimate rather than on the full evidence that went into the point estimate" - I don't understand this claim. BA updates on the full probability distribution of the estimate, which takes into account potential estimate error. The more robust the estimate, the smaller the BA.

Regarding "double-counting" priors, I have not advocated for doing both an explicit "skepticism discount" in one's EEV calculation and t...

411y

Yes. I would definitely pay significant money to stop e.g. nuclear war conditional on twelve 6-sided dice all rolling 1. (In the case of dice, pretty much any natural choice of a prior for the initial state of the dice before they bounce results in probability very close to 1/6 for each side.)
Formally, it is the case that a number which can be postulated in an argument grows faster than any computable function of the length of the argument, if the "argument" is at least Turing complete (i.e. can postulate a Turing machine with a tape for it). And, subsequently, if you base priors on the length alone, the sum is not even well defined, and its sign is dependent on the order of summation, and so on.
If we sum in the order of increasing length, everything is dominated by theories that dedicate the largest part of their length to making up a really huge number (as even a very small increase in this part dramatically boosts the number), so it might even be possible for a super-intelligence or even human-level intelligence to obtain an actionable outcome out of it - something like destroying low temperature labs because the simplest theory which links a very large number to actions does so by modifying laws of physics a little so that very cold liquid helium triggers some sort of world destruction or multiverse destruction, killing people who presumably don't want to die. Or conversely, liquid helium maximization as it stabilizes some multiverse full of people who'd rather live than die (I'd expect the former to dominate because unusual experiments triggering some sort of instability seems like something that can be postulated more succinctly). Or maximization of the number of anti-protons. Something likewise very silly, where the "appeal" is in how much of the theory length it leaves to making the consequences huge. Either way, starting from some good intention (saving people from involuntary death, CEV, or what ever), given a prior that only discounts theories for their le

The much bigger issue is that for some anthropogenic risk (such as AI), the risk is caused by people, and can be increased by funding some groups of people. The expected utility thus has both positive and negative terms, and if you generate a biased list (e.g. by listening to what organization says about itself), and sum it, the resulting sum tells you nothing about the sign of expected utility.

211y

It tells you something about the sign of the expected utility. It is still evidence. Sometimes it could even be evidence in favor of the expected utility being negative.

-111y

Given other knowledge, yes.

111y

I agree: the argument given here doesn't address whether existential risk charities are likely to be helpful or actively harmful. The fourth paragraph of the conclusion and various caveats like "basically competent" were meant to limit the scope of the discussion to only those whose effects were mostly positive rather than negative. Carl Shulman suggested in a feedback comment that one could set up an explicit model where one multiplies (1) a normal variable centered on zero, or with substantial mass below zero, intended to describe uncertainty about whether the charity has mostly positive or mostly negative effects, with (2) a thicker-tailed and always positive variable describing uncertainty about the scale the charity is operating on.

-411y

"Basically" sounds like quite an understatement. It is not just an anthropogenic catastrophe, it's highly-competent-and-dedicated-people-screwing-up-spectacularly-in-a-way-nobody-wants catastrophe. One could naively think that funding more safety conscious efforts can't hurt but this is problematic when the concern with safety is not statistically independent of the unsafety of the approach that's deemed viable or pursued.

A different, but closely related question: Rather than consider lives in isolation, for what x do we prefer

a world which has a 1-x chance of drastically reduced starvation and disease and other effects of charities with easy-to-measure outcomes, and an x total chance of being destroyed by all x-risk factors

over a world in which there is a 1-epsilon chance of modest drop from baseline starvation and disease, and epsilon chance of being destroyed by an x-risk factor?

It is rational to have a preference for taking the riskier choice even for a large x, if one values quality of life over certainty of life.

Another approach to justifying a low prior would be to say, “if such cost-effective strategies had been available, they would have been used up by now,” like the proverbial $20 bill lying on the ground. (Here, it’s a 20-util bill, which involves altruistic rather than egoistic incentives, but the point is still relevant.) Karnofsky has previously argued something similar.

Doesn't that justify a low prior expectation for marginal benefits of marginal investment in all charities?

I got a different updated value ratio in part 2. If my calculations are wrong, would someone correct me?

V = Value; A = Analysis predicted value

Prior Probabilities:

```
P(V=0)=0.4999
P(V=1)=0.5
P(V=100)=0.0001
```

Analysis Result Probabilities:

```
P(A=0)=(0.5*0.4999)+((1/3)*0.5*1)=0.4166
(half the ones that are zero plus a third of the half where the test fails)
P(A=1)=0.4167
P(A=100)=0.1667
```

Accurate analysis results:

```
P(A=0|V=0)=P(A=1|V=1)=P(A=100|V=100)=(1/2)+((1/2)*(1/3))=2/3
(the half when the analysis works and reports accurately plus
the sixth (a third of the failing half) when it fails but gives the
```

... 011y

"50:4" in the post refers to "P(V=1|A=100)*1 : P(V=100|A=100)*100", not "EV(A=1) : EV(A=100)". EV(A=1) is irrelevant, since we know that A is in fact 100.

011y

I think this confused me:
I see that. Thanks.

(This is a long post. If you’re going to read only part, please read sections 1 and 2, subsubsection 5.6.2, and the conclusion.)

## 1. Introduction

Suppose you want to give some money to charity: where can you get the most bang for your philanthropic buck? One way to make the decision is to use explicit expected value estimates. That is, you could get an unbiased (averaging to the true value) estimate of what each candidate for your donation would do with an additional dollar, and then pick the charity associated with the most promising estimate.

Holden Karnofsky of GiveWell, an organization that rates charities for cost-effectiveness, disagreed with this approach in two posts he made in 2011. This is a response to those posts, addressing the implications for existential risk efforts.

According to Karnofsky, high returns are rare, and even unbiased estimates don't take into account the reasons *why* they're rare. So in Karnofsky's view, our favorite charity shouldn't just be one associated with a high estimate; it should be one that supports the estimate with robust evidence derived from multiple independent lines of inquiry.^{1} If a charity's returns are being estimated in a way that intuitively feels shaky, maybe that means the fact that high returns are rare should outweigh the fact that high returns were estimated, even if the people making the estimate were doing an excellent job of avoiding bias.

Karnofsky's first post, Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased), explains how one can mitigate this issue by supplementing an explicit estimate with what Karnofsky calls a "Bayesian Adjustment" (henceforth "BA"). This method treats estimates as merely noisy measures of true values. BA starts with a prior representing what cost-effectiveness values are out there in the general population of charities; the prior is then updated into a posterior in standard Bayesian fashion.

Karnofsky provides some example graphs, illustrating his preference for robustness. If the estimate error is small, the posterior lies close to the explicit estimate. But if the estimate error is large, the posterior lies close to the prior. In other words, if there simply aren’t many high-return charities out there, a sharp estimate can be taken seriously, but a noisy estimate that says it has found a high-return charity must represent some sort of fluke.

Karnofsky does not advocate a policy of performing an *explicit* adjustment. Rather, he uses BA to emphasize that estimates are likely to be inadequate if they don't incorporate certain kinds of intuitions — in particular, a sense of whether all the components of an estimation procedure feel reliable. If intuitions say an estimate feels shaky and too good to be true, then maybe the estimate was noisy and the prior is more important. On the other hand, if intuitions say an estimate has taken everything into account, then maybe the estimate was sharp and outweighs the prior.

Karnofsky's second post, Maximizing Cost-Effectiveness Via Critical Inquiry, expands on these points. Where the first post looks at how BA is performed on a single charity at a time, the second post examines how BA affects the estimated relative values of different charities. In particular, it assumes that although the charities are all drawn from the same prior, they come with different estimates of cost-effectiveness. Higher estimates of cost-effectiveness come from estimation procedures with proportionally higher uncertainty.

It turns out that higher estimates aren’t always more auspicious: an estimate may be “too good to be true,” concentrating much of its evidential support on values that the prior already rules out for the most part. On the bright side, this effect can be mitigated via multiple independent observations, and such observations can provide enough evidence to solidify higher estimates despite their low prior probability.

Charities aiming to reduce existential risk have a potential claim to high expected returns, simply because of the size of the stakes. But if such charities are difficult to evaluate, and the prior probability of high expected values is low, then the implications of BA for this class of charities loom large.

This post will argue that competent efforts to reduce existential risk are still likely to be optimal, despite BA. The argument will have three parts:

1. BA differs from fully Bayesian reasoning, so that BA risks double-counting priors.

2. The models in Karnofsky's posts, when applied to existential risk, boil down to our having prior knowledge that the claimed returns are virtually impossible. (Moreover, similar models without extreme priors don't lead to the same conclusions.)

3. We don't have such prior knowledge. Extreme priors would have implied false predictions in the past, imply unphysical predictions for the future, and are justified neither by our past experiences nor by any other considerations.

Claim 1 is not essential to the conclusion. While Claim 2 seems worth expanding on, it’s Claim 3 that makes up the core of the controversy. Each of these concerns will be addressed in turn.

Before responding to the claims themselves, however, it’s worth discussing a highly simplified model that will illustrate what Karnofsky’s basic point is.

## 2. A Simple Discrete Distribution of Charitable Returns

Suppose you’re considering a donation to the Center for Inventing Metawidgets (CIM), but you'd like to perform an analysis of the properties of metawidgets first.^{2} Before the analysis, you’re uncertain about three possibilities:

- With probability 0.4999, metawidgets are unworkable, and the return on a donation is 0.
- With probability 0.5, metawidgets work modestly well, and the return is 1.
- With probability 0.0001, metawidgets work spectacularly, and the return is 100.

If we now compute the expected value of a donation to CIM, it ends up as a sum of the following components:

- 0 from the possibility that the return is 0
- 0.5 from the possibility that the return is 1
- 0.01 from the possibility that the return is 100

In particular, the possibility of a modest return contributes 50 times the expected value of the possibility of an extreme return. The size of the potential return, in this case, didn’t make up for its low probability.

But that’s before you do an analysis that will give you some additional evidence about metawidgets. The analysis has the following properties:

- With probability 1/2, the analysis works: it determines the true return and reports it.
- With probability 1/2, the analysis fails: it reports one of the three possible values (0, 1, or 100) uniformly at random, independently of the true return.

What happens if the analysis says the return is 100?

To find the right probabilities to assign, we have to do Bayesian updating on this analysis result. The outcome of the analysis is four times as likely if the true value is 100 than if it is either 0 or 1. So the ratio of the expected value contributions changes from 50:1 to 50:4.

Applied to this case, Karnofsky’s point is simply this: despite the analysis suggesting high returns, modest returns still come with higher expected value than high returns. High returns should be considered more probable after the analysis than before — we’ve observed a pretty good likelihood ratio of evidence in their favor — but high returns started out so improbable that even after receiving this bump, they still don’t matter.
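The update can be verified directly. A small sketch, assuming prior probabilities of 0.4999, 0.5, and 0.0001 for returns of 0, 1, and 100, and an analysis that reports the true value half the time and a uniformly random value otherwise (the numbers used in the comment-thread reconstruction of this model):

```python
from fractions import Fraction

# Assumed model: returns 0, 1, 100 with prior probabilities 0.4999, 0.5, 0.0001.
prior = {0: Fraction(4999, 10000), 1: Fraction(1, 2), 100: Fraction(1, 10000)}

def p_report(reported, true_value):
    """P(analysis reports `reported` | true value): works half the time,
    otherwise reports one of the three values uniformly at random."""
    hit = Fraction(1, 2) if reported == true_value else Fraction(0)
    return hit + Fraction(1, 2) * Fraction(1, 3)

reported = 100
posterior_unnorm = {v: prior[v] * p_report(reported, v) for v in prior}
z = sum(posterior_unnorm.values())
posterior = {v: p / z for v, p in posterior_unnorm.items()}

# Expected-value contributions: the modest return still dominates, 50:4.
ratio = (posterior[1] * 1) / (posterior[100] * 100)
print(ratio)  # prints 25/2, i.e. 12.5
```

Exact rationals make it easy to see that the 50:1 prior ratio of contributions becomes exactly 50:4 after the likelihood-ratio-4 update.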

Now that we’ve seen the point in simplified form, let’s begin a more detailed discussion.

## 3. The Role of BA

This section will add some critical notes on the concept of BA — notes that should apply whether the adjustment is performed explicitly or just used as a theoretical justification for listening to intuitions about the accuracy of particular estimates.

Before discussing the role of BA, let’s guard against a possible misinterpretation. Karnofsky is not arguing against maximizing expected value. He is arguing against a particular estimation method he labels “Explicit Expected Value,” which he considers to give inaccurate answers.

The Explicit Expected Value (EEV) method is simple: obtain an estimate of the true cost-effectiveness of an action, then act as if this estimate is the “true” cost-effectiveness. This “true” cost-effectiveness could be interpreted as an expected value itself.^{3}

In contrast to EEV, Karnofsky advocates “Bayesian Adjustment.” Bayesian reasoning involves multiplying a prior by a likelihood to find a posterior. In this case, the prior describes the charities that are out there in the population; the likelihood describes how likely different true values would have been to produce the given estimate; and the posterior represents our final beliefs about the charity’s true cost-effectiveness. By looking at how common different effectiveness levels are, and how likely they would have been to lead to the given estimate, we judge the probability of various effectiveness levels.

In the sense that we’re updating on evidence according to Bayes’ theorem, what’s going on is indeed "Bayesian." But it’s worth pointing out one difference between Karnofsky’s adjustments and a fully Bayesian procedure: BA updates on a point estimate rather than on the full evidence that went into the point estimate.

This matters in two different ways.

First, the point estimate doesn’t always carry all the available information. A procedure for generating a point estimate from a set of evidence could summarize different possible sets of evidence into the same point estimate, even though they favor different hypotheses. This sort of effect will probably be irrelevant in practice, but one might call BA “half-Bayesian” in light of it.

Second, and more importantly, there’s a risk of misinterpreting the nature of the estimate. Karnofsky’s model, again, assumes that estimates are "unbiased" — that conditional on any given number being the true value, if you make many estimates, they’ll average out to that number. And if that’s actually the case for the estimation procedure being used, then that’s fine.

However, to the extent that an estimate took into account priors, that would make it “biased” toward the prior. As Oscar Cunningham comments:

In the most straightforward case, the source simply gave his own Bayesian posterior mean. If you and the source had the same prior, then your posterior mean should be the source’s posterior mean. After all, the source performed just the same computation that you would.

An old OvercomingBias post advises us to share likelihood ratios, not posterior beliefs. To be fair, in many cases communicating likelihood ratios for the whole space of hypotheses is impractical. One may instead want to communicate a number as a summary. (Even if one is making the estimate oneself, it may not be clear how one’s brain came up with a particular number.) But it’s important not to take a number that has prior information mixed in, and then interpret it as one that doesn’t.
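The double-counting risk can be made concrete with a toy conjugate-normal sketch (all numbers here are hypothetical, not drawn from Karnofsky's posts):

```python
def bayes_posterior_mean(prior_mean, prior_sd, estimate, estimate_sd):
    """Posterior mean for a normal prior combined with a normal, unbiased estimate
    (precision-weighted average)."""
    w_prior = 1.0 / prior_sd ** 2
    w_est = 1.0 / estimate_sd ** 2
    return (prior_mean * w_prior + estimate * w_est) / (w_prior + w_est)

# A raw, unbiased estimate of 10 (error sd 3) against a N(0, 1) prior:
once = bayes_posterior_mean(0.0, 1.0, 10.0, 3.0)

# If the source's number was itself already a posterior mean, adjusting it
# again shrinks toward the prior a second time:
twice = bayes_posterior_mean(0.0, 1.0, once, 3.0)
print(once, twice)  # about 1.0, then about 0.1
```

One adjustment of the raw estimate is correct; re-adjusting a number that already has the prior mixed in drags it a full extra factor toward the prior.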

In less straightforward cases, maybe *part* of the prior was taken into account. For example, maybe your source shares your pessimism about the organizational efficiency of nonprofits, but not your pessimism in other areas. Even if your source informally ignored lines of reasoning that seemed to lead to an estimate that was “too good to be true,” that is enough to make double-counting an issue.

But to put this section in context, the appropriateness of BA isn’t the most important disagreement with Karnofsky. Based on the considerations given here, performing an intuitive BA may well be better than going by an explicit estimate. Differences in priors have room to be far more important than just the results of (partially) double-counting them. So the more important part of the argument will be about which priors to use.

## 4. Probability Models

## 4.1: The first model

Karnofsky defends his conclusions with probabilistic models based on some mathematical calculations by Dario Amodei. This section will argue that these models only rule out optimal existential risk charity because the priors they assign to the relevant hypotheses are extremely low — in other words, because they virtually rule out extreme returns in advance.

In the model in Karnofsky’s first post, it’s easy to see the low priors. Consider the first example (the graphs are from Karnofsky's posts):

This example comes with some particular assumptions about parameters. The prior is normally distributed with mean 0 and standard deviation 1; the likelihood is normally distributed with mean 10 and standard deviation 1. As in the saying that “a Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule,” the posterior ends up in the middle, hardly overlapping with either. As Eliezer Yudkowsky points out, this lack of overlap should practically *never* happen. When it does, such an event is a strong reason to doubt one’s assumptions. It suggests that you should have assigned a different prior.

Or maybe, instead of the prior, it’s the *likelihood* that you should have assigned differently — as one of the other graphs does:

Here, the outcome makes some sense, because there’s significant overlap. A high true cost-effectiveness would have been *more likely* to produce the estimate found, but a low true cost-effectiveness *could* have produced it instead. And the prior says the latter case, where the true cost-effectiveness is low, is far more likely — so the final best estimate, indeed, ends up not differing much from the initial best estimate.

Note, however, that this prior is extremely confident. The difference in probability density between the expected value and a value ten standard deviations out is a factor of e^{-50}, or about 10^{-22}. This number is so low it might as well be zero.

## 4.2: The second model

The second model builds on the first model, so many of the same considerations about extreme priors will carry over. This time, we’re looking at a set of different estimates that we could be updating on like we did in the first model. For each of these, we take the expectation of the posterior distribution for the true cost-effectiveness, so we can put these expectations in a graph. After all, the expectation is the number that will factor into our decisions!

Here’s one of the graphs, showing *initial estimates* on the x-axis and *final estimates* on the y-axis. The initial estimates are what we’re performing a Bayesian update on, and the final estimates are the expectation value of the distribution of cost-effectiveness after updating:

So as initial estimates increase, the final estimate rises at first, but then slowly declines. High estimates are good up to a point, but when they become too extreme, we have to conclude they were a fluke.

As before, this model uses a standard normal prior, which means high true values have enormously smaller prior probabilities. Compared to this prior, the evidence provided by each estimate is minor. If the estimate falls one standard deviation out in the distribution, then it favors the estimate value over a value of zero by a likelihood ratio of the square root of e, or about 1.65. So it’s no wonder that the tail end of high cost-effectiveness ends up irrelevant.
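Both numbers above can be checked against the normal density. A quick sketch, assuming (as the sqrt(e) figure implies) that the estimate's error standard deviation equals the estimate itself:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Prior density ratio between a point ten standard deviations out and the mean:
tail_ratio = normal_pdf(10, 0, 1) / normal_pdf(0, 0, 1)  # e**-50, about 2e-22

# Likelihood ratio favoring "true value = estimate" over "true value = 0" when
# the estimate's error sd equals the estimate (any positive estimate works):
estimate = 5.0
lr = normal_pdf(estimate, estimate, estimate) / normal_pdf(estimate, 0.0, estimate)
print(tail_ratio, lr)  # about 1.9e-22 and sqrt(e), about 1.65
```

The contrast is the whole point: each estimate contributes a likelihood ratio of only ~1.65 against a prior penalty on the order of 10^{-22}.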

According to Karnofsky, this model illustrates that an estimate is safer to take at face value when evidence in its favor comes from multiple independent lines of inquiry. There are some calculations showing this — the more independent pieces of evidence for a given high value you gather, the more these together can overcome the “too good to be true” effect.

While multiple independent pieces of evidence are indeed better, it’s important to emphasize that the relevant variable is simply the evidence’s *strength*. Evidence can be strong because it comes from multiple directions, but it can also be strong because it just happens to be unlikely to occur under alternative hypotheses. If we have two independent observations that are both twice as likely to occur given cost-effectiveness 3 than cost-effectiveness 1, that’s equally good as having a single observation that’s four times as likely to occur given cost-effectiveness 3 than cost-effectiveness 1.
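In odds form the equivalence is immediate; a tiny sketch (the prior odds are an arbitrary illustration):

```python
# Independent observations multiply posterior odds by their likelihood ratios,
# so two LR-2 observations are interchangeable with one LR-4 observation.
prior_odds = 0.0001 / 0.5  # arbitrary prior odds for effectiveness 3 vs. 1

two_weak = prior_odds * 2 * 2  # two independent observations, LR = 2 each
one_strong = prior_odds * 4    # one observation, LR = 4
print(two_weak == one_strong)  # True
```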

## 4.3: Do the same calculations apply to log-normal priors?

Now that we’ve established that the models use low priors, can we evaluate whether the low priors are essential to the models’ conclusions? Or are they just simplifying assumptions that make the math easier, but would be unnecessary in a full analysis?

One obvious step is to see if Karnofsky's conclusions hold up with log-normal models. Karnofsky states that the conclusions carry over qualitatively:

Assuming a log-normal prior, however, does change the mathematics. Graphs like those in Karnofsky’s first post could certainly be interpreted as referring to the logarithm of cost-effectiveness, but the final number we’re interested in is *the expected cost-effectiveness itself*. And if we interpret the graph as representing a logarithm, it’s no longer the case that the point at the middle of the distribution gives us the expectation; instead, values higher in the distribution matter more.

Guy Srinivasan points out that, for the same reason, log-normal priors would lead to different graphs in the second post, weakening the conclusion. To take the expectation of the logarithm and interpret that as the logarithm of the true cost-effectiveness is to bias the result downward.

If, instead of calculating e to the power of the expected value of the logarithm of cost-effectiveness, we calculate the expected value of cost-effectiveness directly, there’s an additional term that increases with the standard deviation.

For an example of this, consider a normal distribution with mean 0 and standard deviation 1. If it represents the cost-effectiveness itself, we should take its expected value and find 0. But if it represents the logarithm of the cost-effectiveness, it won’t do to take e to the power of the expected value, which would give 1. Rather, we add σ²/2 (which in this case equals ½) to the expected logarithm before exponentiating. So the final expected cost-effectiveness ends up a factor sqrt(e) ≈ 1.65 larger — the most “average” value lies ½ to the right of the center of the graph.
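As a quick numerical check of this correction (a minimal sketch, assuming the log cost-effectiveness is distributed N(0, 1) as in the example):

```python
import math
import random

mu, sigma = 0.0, 1.0  # mean and sd of the log of cost-effectiveness

naive = math.exp(mu)                       # exponentiating the mean log gives 1
corrected = math.exp(mu + sigma ** 2 / 2)  # true expectation: sqrt(e), about 1.65

# Monte Carlo confirmation of the lognormal mean formula
random.seed(0)
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(200_000)]
print(corrected, sum(samples) / len(samples))
```

The sample mean lands near 1.65, not 1, confirming that the extra σ²/2 term is required when converting from log space back to cost-effectiveness.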

While the mathematical point made here opposes Karnofsky’s claims, it’s hard to say how likely it is to be decisive in the context of the dilemmas that actually confront decision makers. So let’s take a step back and directly face the question of how extreme these priors need to be.

## 4.4: Do priors need to be extreme?

As we’ve seen, Karnofsky’s toy examples use extreme priors, and these priors would entail a substantial adjustment to EV estimates for existential risk charities. This adjustment would in turn be sufficient to alter existential risk charities from good ideas to bad ideas.^4

The claim made in this section is: Karnofsky’s models don’t just *use* extreme priors, they *require* extreme priors if they are to have this altering effect. To determine whether this claim is true, one must check whether there are priors that aren’t extreme, but still have the effect.^5

And indeed, as pointed out by Karnofsky, there exist priors that (1) are far less extreme than the normal prior and (2) still justify a major adjustment to EV estimates for existential risk charities. This is a sense in which his point qualitatively holds.

But the adjustment needs to be not just major, but large *enough* to turn existential risk charities from good ideas into bad ideas. This is difficult. Existential risk charities come with the potential for cost-effectiveness many orders of magnitude higher than that of the average charity. The normal prior succeeds at discounting this potential with its extreme skepticism, as may other priors. But if we can show that all the non-extreme priors justify an adjustment that may be large, but is not large enough to decide the issue, then that is a sense in which Karnofsky’s point does not qualitatively hold.

And a prior can be far less extreme than the normal prior, while still being extreme. Do the log-normal prior and various even thicker-tailed priors qualify as “extreme,” and do they entail sufficiently large adjustments? Rather than get hopelessly lost in that sort of analysis, let’s just see what happens when one tries modeling real existential risk interventions as simple all-or-nothing bets: either they achieve some estimated reduction of risk, or the reasoning behind them fails completely.^6

Suppose there’s some estimate for the cost-effectiveness of a charity — call it E — and suppose the true cost-effectiveness must be either 0 or E. You assign some probability p to the proposition that the estimate came from a true cost-effectiveness of E. This probability itself comes from a prior probability that the true value is E, and a likelihood ratio comparing the rates at which true values of 0 and E produce estimates of E.
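The update in this two-point model is a one-liner in odds form. A sketch follows; the numbers in the example call are placeholders, not estimates from the post:

```python
def posterior_prob(prior, likelihood_ratio):
    """P(true cost-effectiveness is E | an estimate of E), where the true
    value is either 0 or E, via the odds form of Bayes' theorem."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

def posterior_ev(prior, likelihood_ratio, E):
    """Expected cost-effectiveness after updating in the two-point model."""
    return posterior_prob(prior, likelihood_ratio) * E

# e.g. a 1-in-10,000 prior combined with a 100:1 likelihood ratio
# still leaves an expected value of roughly 0.01 * E
print(posterior_ev(1e-4, 100.0, 1.0))
```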

To find a ballpark number for what returns analyses are saying may be available from existential risk reduction (i.e., what value we should use for E), we can take a few different approaches.

One approach is to look at risks that are relatively tractable, such as asteroid impacts. It’s estimated that impacts similar in size to the one involved in the extinction of the dinosaurs occur about once every hundred million years. With the simplifying assumptions that each such event causes human extinction, and that lesser asteroid events don’t cause human extinction (or even end any existing lives), this translates to an extinction probability of one in a million for any given century. In other words, preventing all asteroid risk for a given century saves an expected 10^4 existing lives and an expected 1/10^6 fraction of all future value.

A set of interventions funded in the past decade ruled out an imminent extinction-level impact at a cost of roughly $10^8.^8 According to this rough calculation, then, this program saved roughly one life plus a 1/10^10 fraction of the future for each $10^4. Of course, future programs would probably be less effective.

For this to have been competitive with international aid ($10^3 per life saved), one only has to consider saving a 1 in 10^10 fraction of humanity’s entire future to be 10 times as important as saving an individual life. This is equivalent to considering saving humanity’s entire future to be 10 times as important as saving all individual people living today. In a straightforward “astronomical waste” analysis, of course, it is far *more* important: enough so to compensate for a high probability that the estimate is incorrect.

As an alternative to looking at tractable classes of risk for a cost-effectiveness estimate, we could look at the classes of existential risk that appear the most promising. AI risk, in particular, stands out. In a Singularity Summit talk, Anna Salamon estimated eight expected existing lives saved per dollar of AI risk research, or about $10^-1 per existing life. Each existing life, again, also corresponds to a 10^-10 fraction of our civilization’s astronomical potential.

(There are a number of points where one could quibble with the reasoning that produced this estimate; cutting it down by a few orders of magnitude seems like it may not affect the underlying point too much. The main reason why there is an advantage here might be that we restricted ourselves to a limited class of charities for international aid, but not for existential risk reduction. In particular, the international aid charities we’ve used in the comparison are those that operate on an object level, e.g. by distributing mosquito nets, whereas the estimate in the talk refers to meta-level research about what object-level policies would be helpful.)
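The back-of-envelope numbers in this subsection can be collected in one place (a sketch; every input is an order-of-magnitude assumption from the text):

```python
# Asteroid survey programs (order-of-magnitude inputs from the text)
extinction_prob_removed = 1e-6   # one-in-a-million impact risk per century
existing_lives = 1e10            # rough count of existing lives
survey_cost = 1e8                # dollars spent ruling out an imminent impact

lives_saved = extinction_prob_removed * existing_lives            # ~10^4 lives
dollars_per_life = survey_cost / lives_saved                      # ~$10^4 per life
future_fraction_per_life = extinction_prob_removed / lives_saved  # ~10^-10

# Comparison points from the text: international aid at ~$10^3 per life,
# and AI risk research estimated at ~$10^-1 per expected existing life
print(dollars_per_life, future_fraction_per_life)
```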

For such charities not to be competitive with international aid, just based on saving present-day lives alone, one would need to assign a probability that the estimate is correct of at most 1/10^4. And as before, in a straightforward utilitarian analysis, the needed factor is much larger. This means that the probability that the estimate is correct could be far lower still.

Presumably the probability of an estimate of E given a true value of E is far greater than the probability of an estimate of E given a true value of 0. So the factor of 10^4 or greater understates the extremeness of the priors you need. If your prior for existential risk-level returns is low because most charities are feel-good local charities, the likelihood ratio brings it back up a lot, because there aren’t any feel-good local charities producing plausible calculations that say they’re extremely effective.^9

So one genuinely needs to find improbabilities that cut down the estimate by a large factor — although, depending on the specifics, one may need to bring in astronomical waste arguments to establish this point. Is it reasonable to adopt priors that have this effect?

## 5: Priors and their justification

## 5.1: Needed priors

To recapitulate, it turns out that if one uses the concepts in Karnofsky’s posts to argue that (generally competent) existential risk charities are not highly cost-effective, this requires extreme priors. The least extreme priors that still create low enough posteriors are still fairly extreme.

Note that, for the argument to go through, it’s not sufficient for the prior density to be decreasing — a density that doesn’t decrease quickly enough can’t even be normalized. Nor is it sufficient for the probability mass in the prior’s tail to be decreasing; it needs to decrease quickly enough to make up for the greater cost-effectiveness values we’re multiplying by. For the expected value to even be finite a priori, with no evidence at all, the probability of cost-effectiveness at least X has to fall faster than 1/X.

## 5.2: Possible justifications

Having argued that an attempt to defeat x-risk charities with BA requires a low prior — and that it therefore requires a justification for a low prior — let’s look at possible approaches to such a justification.

One place to start looking could be in power laws. A lot of phenomena seem to follow power law distributions — although claims of power laws have also been criticized. The thickness of the tail depends on a parameter, but if, as this article suggests, the parameter alpha tends to be near 1, then that gives one a specific thickness.

Another approach to justifying a low prior would be to say, “if such cost-effective strategies had been available, they would have been used up by now,” like the proverbial $20 bill lying on the ground. (Here, it’s a 20-util bill, which involves altruistic rather than egoistic incentives, but the point is still relevant.) Karnofsky has previously argued something similar.

For AI risk in particular, one might expect returns to have been driven down to the level of returns available for, e.g., asteroid impact prevention. If much higher returns are available for AI risk than other classes of risk, there must be some sort of explanation for why the low-hanging fruit there hasn’t been picked.

Such an explanation requires us to think about the beliefs and motivations of those who fund measures to mitigate existential risks, although there may also simply be an element of random chance in which categories of threat get attention. Various differences between categories of risk are relevant. For example, AI risk is an area where relatively little expert consensus exists on how imminent the problem is, on what could be done to solve the problem, and even whether the problem exists. There are many reasons to believe that thinking about AI risk, compared to asteroids, is unusually difficult. AI risk involves thinking about many different academic fields, and offers many potential ways to become confused and end up mistaken about a number of complicated issues. Various biases could turn out to be a problem; in particular, the absurdity heuristic seems as though it could cause justified concerns to be dismissed early. Moreover, with AI risks, investment into global-scale risk is less likely to arise as a side effect of the prevention of smaller-scale disasters. Large asteroids pose similar issues to smaller asteroids, but human-level artificial general intelligence poses different issues than unintelligent viruses.

Of course, all these things are evidence against a problem existing. But they could also explain why, even in the presence of a problem, it wouldn’t be acted upon.

## 5.3: Past experience as a justification for low priors

The main approach to justification of low priors cited by Karnofsky isn’t any quantified argument, but is based on gut-level extrapolation from past experience:

It does not seem a straightforward task for a brain to extrapolate from its own life to global-scale efforts. The outcomes it has actually observed are likely to be a biased sample, involving cases where it can actually trace its causal contribution to a relatively small event. In particular, of course, a brain hasn’t had any opportunity to observe effects persisting for longer than a human lifetime.

Extrapolating from the mundane events your brain has directly experienced to far out in the tail, where the selection of events has been highly optimized for utilitarian impact, is likely to be difficult.

“Black swan” type considerations are relevant here: if you’ve seen a million white swans in a row in the northern hemisphere, that might entitle you to assign a low probability that the first swan you see in the southern hemisphere will be non-white, but it doesn’t entitle you to assign a one-in-a-million probability. In just the same way, if you’ve seen a million inefficient charities in a row when looking mostly at animal charities, that doesn’t entitle you to assign a one-in-a-million probability to a charity in the class of international aid being efficient. Maybe things will just be fundamentally different.

But it can be argued that we have already had some actual observations of existential risk-scale interventions. And indeed, Karnofsky says elsewhere that past claims of enormous cost-effectiveness have failed to pan out:

One can argue the numbers: exactly how many actions seemed enormously valuable in the way AI risk reduction seems to? Exactly how few of them panned out? Some examples one might include in this category are religious claims about the afterlife or the end times, particularly leveraged ways of creating permanent social change, or ways to intervene at important points in nuclear arms races. But in general, if your high estimate of cost-effectiveness for an organization is based on, say, a 10% chance that it would visibly succeed at achieving enormous returns over its lifetime, then just a few such failures provide only moderate evidence against the accuracy of the estimate. And as we’ve seen, for the regressive impact created by Karnofsky’s priors to make a difference, it needs to be not just substantial, but enormous.
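To see why a few failures provide only moderate evidence, one can compute the likelihood of the observed record under the original claim (a sketch, assuming independent attempts that each really had a 10% chance of visible success):

```python
def failure_likelihood(n_failures, success_prob=0.10):
    """Probability of seeing n straight visible failures if each attempt
    really had the claimed chance of visible success."""
    return (1 - success_prob) ** n_failures

# Five failed high-stakes attempts are entirely unsurprising under the 10% claim:
print(failure_likelihood(5))  # about 0.59
```

A likelihood near 0.59 means such a track record barely distinguishes the 10% claim from a 0% claim; it would take dozens of independent failures to do that.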

## 5.4: Intuitions suggesting extremely low priors are unreasonable

To get a feel for how extreme some of these priors are, consider what they would have predicted in the past. As Carl Shulman says:

In other words, with a normal prior, the model assigns extremely small probabilities to events that have, in fact, happened. With a log-normal prior, the problem is not as bad. But as Shulman points out, such a prior still makes predictions for the future that are difficult to square with physics — difficult to square with the observation that existential disasters seem possible, and at least some of them are partly mediated by technology. As a reductio ad absurdum of normal and log-normal priors, he offers a “charity doomsday argument”:

In Karnofsky’s reactions to arguments such as these, he has emphasized that, while his model may not be realistic, there is no better model available that leads to different conclusions:

But the flaw identified here — that the prior in Karnofsky’s models cannot be convinced of astronomical waste — isn’t just an accidental feature of simplifying reality in a particular way. It’s a flaw present in any scheme that discounts the implications of astronomical waste through priors. Whatever the probability for the existence of preventable astronomical waste is, in expected utility calculations, it gets multiplied by such a large number that unless it starts out extremely low, there’s a problem.

As a last thought experiment suggesting the necessary probabilities are extreme, suppose that in addition to the available evidence, you had a magical coin that always flipped heads if astronomical waste were real and preventable — but that was otherwise fair. If the coin came up heads dozens of times, wouldn’t you start to change your mind? If so, unless your intuitions about coins are heavily broken, your prior must not in fact be so extremely small as to cancel out the returns.
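The coin thought experiment can be made quantitative. A sketch, treating each heads as a 2:1 likelihood ratio in favor of the “always heads” hypothesis; the one-in-ten-billion prior is an arbitrary illustration:

```python
import math

def flips_to_convince(prior):
    """Number of consecutive heads needed before the hypothesis 'the coin
    always lands heads because astronomical waste is real and preventable'
    becomes more likely than not, starting from the given prior.
    Each heads multiplies the odds by 2 relative to a fair coin."""
    return math.ceil(math.log2((1 - prior) / prior))

print(flips_to_convince(1e-10))  # 34 flips: "dozens" really do suffice
```

If dozens of heads would change your mind, your prior is no smaller than roughly 2 to the minus that many dozens — nowhere near small enough to cancel astronomical returns.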

## 5.5: Indirect effects of international aid

There is a possible way to argue for international aid over existential risk reduction based on priors without requiring a prior so small as to unreasonably deny astronomical waste. Namely, one could note that international aid itself has effects on astronomical waste. Then international aid is on a more equal level with existential risk, no matter how large the numbers for astronomical waste turn out to be.

Perhaps international aid has effects hastening the start of space colonization. Earlier space colonization would prevent whatever astronomical waste takes place during the interval between the point where space colonization actually happens, and the point where it would otherwise have happened. This could conceivably outweigh the astronomical waste from existential risks even if such risks aren’t astronomically improbable.

Do we have a way to evaluate such indirect effects on growth? The argument goes as follows: international aid saves people’s lives, saving people’s lives increases economic growth, economic growth increases the speed of development of the required technologies, and this decreases the amount of astronomical waste. However, as Bostrom points out in his paper on astronomical waste, safety is still a lot more important than speed:

A more recent analysis by Stuart Armstrong and Anders Sandberg emphasizes the effect of galaxies escaping over the cosmic event horizon: the more we delay colonization, and the more slowly colonization happens, the more galaxies go permanently out of reach. Their model implies that we lose about a galaxy per year of delaying colonization at light speed, or about a galaxy every fifty years of delaying colonization at half light speed. This is out of, respectively, 6.3 billion and 120 million total galaxies reached.

So a year’s delay wastes only about the same amount of value as a one-in-several-billion chance of human extinction. That means safety is usually more important than delay. For delay to outweigh safety requires a highly confident belief in the proposition that we can affect delay but not safety.

Does this give us a way to estimate the indirect returns of saving one person’s life in the Third World?

Since it’s probably good enough to estimate to within a few orders of magnitude, we’ll make some very loose assumptions.

Suppose a Third World country with a population of 100 million makes a total difference of one month in the timing of humanity’s future colonization of space. Then a single person in that country makes an expected difference of 1/(1200 million) years — equivalent to a one-in-billions-of-billions chance of human extinction.
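Putting these loose assumptions into numbers (a sketch; the one-galaxy-per-year figure and the galaxy count are the Armstrong–Sandberg light-speed values quoted above):

```python
population = 1e8               # the Third World country in the example
total_delay_years = 1 / 12     # the country shifts colonization timing by one month
per_person_delay_years = total_delay_years / population  # ~1/(1.2e9) years

galaxies_lost_per_year = 1     # Armstrong-Sandberg, light-speed colonization
total_galaxies = 6.3e9         # reachable at light speed

# one person's delay, expressed as an equivalent extinction probability
extinction_equivalent = per_person_delay_years * galaxies_lost_per_year / total_galaxies
print(per_person_delay_years, extinction_equivalent)  # ~8.3e-10, ~1.3e-19
```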

If saving the person’s life is the result of an investment of $10^3, then to claim the astronomical waste returns are similar to those from preventing existential risk, one must claim an existential risk intervention of $10^6 would have a chance of one in millions of billions of preventing an existential disaster, and an intervention of $10^9 would have a chance of one in thousands of billions.

There are some caveats to be made on both sides of the argument. For example, we assumed that preventing human extinction has billions of times the payoff of delaying space colonization for a year; but what if the bottleneck is some other resource than what’s being wasted? In that case, it could be that, if we survive, we can get a lot more value than billions of times what is lost through a year’s waste. And if one (naively?) took the expectation value of this “billions” figure, one would probably end up with something infinite, because we don’t know for sure what’s possible in physics.

Increased economic growth could have effects not just on timing, but on safety itself. For example, economic growth could increase existential risk by speeding up dangerous technologies more quickly than society can handle them safely, or it could decrease existential risk by promoting some sort of stability. It could also have various small but permanent effects on the future.

Still, it would seem to be a fairly major coincidence if the policy of saving people’s lives in the Third World were also the policy that maximized safety. One would at least expect to see more effect from interventions targeted specifically at speeding up economic growth. An approach to foreign aid aimed at maximizing growth effects rather than near-term lives or DALYs saved would probably look quite different. Even then, it’s hard to see how economic growth could be the policy that maximized safety unless our model of what causes safety were so broken as to be useless.

Throughout this analysis, we’ve been assuming a standard utilitarian view, where the loss of astronomical numbers of future life-years is more important than the deaths of current people by a correspondingly astronomic factor. What if, at the other extreme, one only cared about saving as many people as possible from the present generation? Then delay might be more important: in any given year, a nontrivial fraction of the world population dies. One could imagine a speedup of certain technologies causing these technologies to save the lives of whoever would have died during that time.

Again, we can do a very rough calculation. Every second, about 1.8 people die. So if, as above, saving a life through malaria nets makes a difference in colonization timing of 1/(1200 million) years, or about 25 milliseconds, and if hastening colonization by one second saves those 1.8 lives, the additional lives saved through the speedup come to only about 1/20 of the life saved directly by the malaria net.

Since we’re dealing with order-of-magnitude differences, for this 1/20 to matter, we’d need to have underestimated it by orders of magnitude. What we’d have to prove isn’t just that lives saved through speedup outnumber lives saved directly; it’s that lives saved through speedup outnumber lives saved through alternative uses of money. As we saw before, on top of the 1/20, there are still another four orders of magnitude or so between estimates of the returns in current lives saved through AI risk reduction and international aid.

One may question whether this argument constitutes a “true rejection” of the cost-effectiveness of existential risk reduction: were international aid charities really chosen *because* they increase economic growth and thereby speed up space colonization? If one were optimizing for that criterion, presumably there would be more efficient charities available, and it might be interesting to look at whether one could make a case that they save more current people than AI risk reduction. One would also need to have a reason to disregard astronomical waste.

## 5.6: Pascal’s Mugging and the big picture

Let’s take a more detailed look at the question of whether reasonable priors, in fact, bring the expected returns of the best existential risk charities down by a sufficient factor. Karnofsky states a general argument:

In defending the idea that existential risk reduction has a high enough probability of success to be a good investment, we have two options:

1. Use a prior with a tail that decreases faster than 1/X, and argue that the posterior ends up high enough anyway.

2. Use a prior with a tail that decreases slower than 1/X, and argue that there are no strange implications; or that there are strange implications but they’re not problematic.

Let’s briefly examine both of these possibilities. We can’t do the problem full numerical justice, but we can at least take an initial stab at answering the question of what alternative models could look like.

## 5.6.1: Rapidly shrinking tails

First, let’s look at an example where the prior probability of impact at least X falls *faster* than X rises. Suppose we quantify X in terms of the number of lives that can be saved for one million dollars. Consider a Pareto distribution (that is, a power law) for X, with a minimum possible value of 10, and with alpha equal to 1.5, so that the density of X decreases as X^(-5/2), and the probability mass of the tail beyond X decreases as X^(-3/2). Now suppose international aid claims an X of at least 1,000 and existential risk reduction claims an X of at least 100,000. Then there’s a 1 in 1,000 prior for the international aid tail and a 1 in 1,000,000 prior for the existential risk tail.

A one in a million prior sounds scary. However:

- Those million charities would consist almost entirely of obviously non-optimal charities. Just knowing the general category of what they’re trying to do would be enough to see they lacked extremely high returns. Picking the ones that are even mildly reasonable candidates already involves a great deal of optimization power.
- You wouldn’t need to identify the one charity that had extremely good returns. For purposes of getting a better expected value, it would be more than sufficient to narrow it down to a list of one hundred.
- Presumably, some international aid charities manage to overcome that 1 in 1,000 prior, and reach a large probability. If reasoning can pick out the best charity in a thousand with reasonable confidence, then maybe once those charities are picked out, reasoning can take a useful guess at which one is the best in a thousand of *these* charities.

Overconfidence studies have trained us to be wary of claims that involve 99.99% certainty. But we should be wary of a confident prior just as we should be wary of a confident likelihood; it’s easy to make errors when caution is applied in only one direction. As a further “intuition pump,” suppose you’re in a foreign country and you meet someone you know. The prior odds against it being that person may be billions to one. But when you meet them, you’ll soon have strong enough evidence to attain nearly 100% confidence — despite the fact that this takes a likelihood ratio of billions.
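The Pareto tail probabilities in the example above can be checked directly (a sketch of the distribution as specified, with minimum 10 and alpha 1.5):

```python
def pareto_tail(x, x_min=10.0, alpha=1.5):
    """P(X >= x) under a Pareto distribution with the given minimum and alpha."""
    return (x_min / x) ** alpha

print(pareto_tail(1_000))    # 1e-3: prior on international aid's claimed returns
print(pareto_tail(100_000))  # 1e-6: prior on x-risk reduction's claimed returns
```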

So in sum, it seems as though even with a prior that declines fairly quickly, an analysis could still reasonably judge existential risk-level returns to be the most important. A quickly declining prior can still be overcome by evidence — and the amount of evidence needed drops to zero as the size of the tail gets closer to decreasing at a speed of 1/X. Again, just because an effect exists in a qualitative sense, that doesn’t mean that, in practice, it will affect the conclusion.

## 5.6.2: Slowly shrinking tails

Second, let’s consider prior distributions where the probability of impact at least X falls slower than X rises. One example of where this happens is a power law with an alpha lower than 1. But priors implied by Solomonoff induction also behave like this. For example, the probability they assign to a value of 3^^^3 is much larger than 1/(3^^^3), because the number can be produced by a relatively short program. Most values that large have negligibly small probabilities, because there’s no short program for them. But some values that large have higher probabilities, and end up dominating any plausible expected value calculation starting from such a prior.^10

This problem is known as “Pascal’s Mugging,” and has been discussed extensively on LessWrong. Karnofsky considers it a reason to reject any prior that doesn’t decrease fast enough. But there are a number of possible ways out of the problem, and not all of them change the prior:

- Adopting a bounded utility function (with the right bound and functional form) can make it impossible for the mugger to make promises large enough to overcome their improbability.
- One could bite the bullet by accepting that one should pay the mugger — or rather that more plausible “muggers,” in the form of infinite physics, say, may come along later.
- If the positive and negative effects of giving in to muggers are symmetrical in expectation, then they cancel out... but why would they be symmetrical?
- Discounting the utility of an effect by the algorithmic complexity of locating it in the world implies a special case of a bounded utility function.
- One could ignore the mugger for game-theoretical reasons... however, the hypothetical can be modified to make game theory irrelevant.
- One could justify a quickly declining prior using anthropic reasoning, as in Robin Hanson’s comment: statistically, most agents can’t determine the course of a vast number of agents’ lives. However, while this is a plausible claim about anthropic reasoning, if one has uncertainty about what is the right account of anthropic reasoning, and if one treats this uncertainty as a regular probability, then the Pascal’s Mugging problem reappears.
- One could justify a quickly declining prior some other way.

With regard to the last option, one does need some sort of justification. A probability doesn’t seem like something you can choose based on whether it implies reasonable-sounding decisions; it seems like something that has to come from a model of the world. And to return to the magical coin example, would it really take roughly log(3^^^3) heads outcomes in a row (assuming away things like fake memories) to convince you the mugger was speaking the truth?

It’s worth taking particular note of the second-to-last option, where a prior is justified using anthropic reasoning. Such a prior would have to be quickly declining. Let’s explore this possibility a little further.

Suppose, roughly speaking, that before you know anything about where you find yourself in the universe, you expect on average to decisively affect one person’s life. Then your prior for your impact should have an expectation value less than infinity — as is the case for power laws with alpha greater than 1, but not alpha smaller than 1. Of course, the number of lives a rational philanthropist affects is likely to be larger than the number of lives an average person affects. But if some people are optimal philanthropists, that still puts an upper bound on the expectation value. Likewise, if most things that could carry value aren’t decision makers, that’s a reason to expect greater returns per decision maker. Still, it seems like there would be some constant upper bound that doesn’t scale with the size of the universe.

In a world where whoever happens to be on the stage at a critical time gets to determine its long-term contents, there’s a large prior probability that you’re causally downstream of the most important events, and an extremely small prior probability that you live exactly at the critical point. Then suppose you find yourself on Earth in 2013, with an apparent astronomical-scale future still ahead, depending on what happens between now and the development of the relevant technology. This seems like it should cause a strong update from the anthropic prior. It’s possible to find ways in which astronomical waste could be illusory, but to find them we need to look in odd places.

One candidate hypothesis is the idea that we’re living in an ancestor simulation. This would imply astronomical waste was illusory: after all, if a substantial fraction of astronomical resources were dedicated toward such simulations, each of them would be able to determine only a small part of what happened to the resources. This would limit returns. It would be interesting to see more analysis of optimal philanthropy given that we’re in a simulation, but it doesn’t seem as if one would want to predicate one’s case on that hypothesis.

Other candidate hypotheses might revolve around interstellar colonization being impossible even in the long run for reasons we don’t currently understand, or around the extinction of human civilization becoming almost inevitable given the availability of some future technology.

As a last resort, we could hypothesize nonspecific insanity on our part, in a sort of majoritarian hypothesis. But it seems like assuming that we’re insane, and that we have no idea how we are insane, undermines a lot of the other assumptions we’re using in this analysis.

If Karnofsky or others would propose other such factors that might create the illusion of astronomical waste, or if they would defend any of the ones named, spelling them out and putting some sort of rough estimate or bounds on how much they tell us to discount astronomical waste seems like it would be an important next move in the debate.

It may be a useful reframing to see things from a perspective like Updateless Decision Theory. The question is whether one can get more value from controlling structures that — in an astronomical-sized universe — are likely to exist many times, than from an extremely small probability of controlling the whole thing.

## 6. Conclusion

BA doesn’t justify a belief that existential risk charities, despite high back-of-envelope cost-effectiveness estimates, offer low or mediocre expected returns.

We can assert this without having to endorse claims to the effect that one must support (without further research) the first charity that names a sufficiently large number. There are other considerations that defeat such claims.

For one thing, there are multiple charities in the general existential risk space and potentially multiple ways of donating to them; even if there weren’t, more could be created in the future. That means we need to investigate the effectiveness of each one.

For another thing, even if there were only one charity with great potential returns in the area, you’d have to check that marginal money wasn’t being negatively useful, as Karnofsky has argued is indeed the case for MIRI (because the "Friendly AI" approach is unnecessarily dangerous, according to Karnofsky).

Systematic upward bias, not just random error, is of course likely to play a role in organizations’ estimates of their own effectiveness.

And finally, some other consideration, not covered in these posts, could prove either that existential risk reduction doesn’t have a particularly high expected value, or that we shouldn’t maximize expected value at all. (Bounded utility functions are a special case of not maximizing expected value, if “value” is measured in e.g. DALYs rather than utils.) Note, however, that Karnofsky himself has not endorsed the use of non-additive metrics of charitable impact.

MIRI, in choosing a strategy, is not gambling on a tiny probability that its actions will turn out relevant. It’s trying to affect a large-scale event — the variable of whether or not the intelligence explosion turns out safe — that will eventually be resolved into a “yes” or “no” outcome. That every individual dollar or hour spent will fail to have much of an effect by itself is an issue inherent to pushing on large-scale events. Other cases where this applies, and where it would not be seen as problematic, are political campaigns and medical research, where the good done may come from a few discoveries spread among many labs and experiments.

The improbability here isn’t in itself pathological, or a stretch of expected value maximization. It might be pathological if the argument relied on further highly improbable “just in case” assumptions, for example if we were almost certain that AI is impossible to create, or if we were almost certain that safety will be ensured by default. But even though “if there’s even a chance” arguments have sometimes been made, MIRI does not actually believe that there’s an additional factor on top of that inherent per-dollar improbability that would make it so that all its efforts are probably irrelevant. If it believed that, then it would pick a different strategy.

All things considered, our evidence about the distribution of charities is compatible with AI being associated with major existential risks, and compatible with there being low-hanging fruit to be picked in mitigating such risks. Investing in reducing existential risk, then, can be optimal without falling to BA — and without strange implications.

## Notes

This post was written by Steven Kaas and funded by MIRI. My thanks for helpful feedback from Holden Karnofsky, Carl Shulman, Nick Beckstead, Luke Muehlhauser, Steve Rayhawk, and Benjamin Noble.

^{1} It's worth noting, however, that Karnofsky’s vision for GiveWell is to provide donors with the best giving opportunities that can be found, not necessarily the giving opportunities whose ROI estimates have the strongest evidential backing. So, for Karnofsky, strong evidential backing is a means to the end of finding the best interventions, not an end in itself. In GiveWell's January 24th, 2013 board meeting (starting at 24:30 in the MP3 recording), Karnofsky said: "The way ["GiveWell 2", a possible future GiveWell focused on giving opportunities for which strong evidence is less available than is the case with GiveWell's current charity recommendations] would prioritize [giving] opportunities would involve... a heavy dose of personal judgment, and a heavy dose of... "Well, we have laid out our reasons of thinking this. Not all the reasons are things we can prove, but... here's the evidence we have, here's what we do know, and given the limited available information here's what we would guess." We actually do a fair amount of that already with GiveWell, but it would definitely be more noticeable and more prominent and more extreme [in GiveWell 2]...

...What would still be "GiveWell" about ["GiveWell 2"] is that I don't believe that there's another organization that's out there that is publicly writing about what it thinks are the best giving opportunities and why, and... comparing all the possible things you might give to... It's basically a topic of discussion that I don't believe exists right now, and... we started GiveWell to start that discussion in an open, public way, and we started in a certain place, but that, and not evidence, has always been the driving philosophy of GiveWell, and our mission statement talks about expanding giving opportunities, it doesn't talk about evidence."

^{2} Technically, the prior is usually not about a specific charity that we already have information about, but about charities in general. I give an example of a specific fictional charity because I figured that would be more clarifying, and the math works as long as you’re using an estimate to move from a state of less information to a state of more information.

^{3} At least in the sense that it might still average over, say, quantum branching and chaotic dynamics. But the “true value” would at least be based on a full understanding of the problem and its solutions.

^{4} Of course, it may be the case that particular charities working on existential risk reduction fail to pursue activities that actually reduce existential risk — that question is separate from the questions we have the space to examine here.

^{5} For this section, by “extreme priors” I just mean something like “many zeroes.” Does the prior say that what some of us think of as always having been a live hypothesis actually started out as hugely improbable? Then it’s “extreme” for my purposes. Once it’s been established that only extreme priors let the point carry through, one can then discuss whether a prior that’s “extreme” in this sense may nonetheless be justified. This is what the next section will be devoted to. The separation between these two points forces me to use this rather artificial concept of “extreme,” where an analysis would ideally just consider what priors are reasonable and how Karnofsky’s point works with them. Nonetheless, I hope it makes things clearer.

^{6} It would be nice to have some better examples of the overall point, but these were the examples that seemed maximally illustrative, clear, and concise given time and space constraints.

^{7} This estimate, technically, isn’t unbiased. If the true value is E, the estimate will average lower than E, and if the true value is 0, the estimate will average higher than 0. But this shouldn’t matter for the illustration.

^{8} To be sure, if an asteroid had been on its way, we would have also needed to pay the cost of deflecting it. But this possibility was extremely improbable. As long as the cost of deflection wouldn’t have been much more than $10^{14}, this doesn’t increase the expected cost by orders of magnitude.

^{9} There are some points to be made here about causal screening, and also that it’s unnatural to think of the prior as being on effectiveness, rather than on things that cause both effectiveness and low priors, unless effectiveness is a thing that causes low priors, for example because people have picked up all the low-hanging fruit off the ground. But due to time and space concerns, I have left those points out of this document.

^{10} A more complete argument would involve looking at how often a given structure would be repeated with what probability in a simplicity-weighted set of universes, but the general point is the same.