Existential Risk and Existential Hope: Definitions

by owencb, 10th Jan 2015

I'm pleased to announce Existential Risk and Existential Hope: Definitions, a short new FHI technical report.

Abstract:
We look at the strengths and weaknesses of two existing definitions of existential risk, and suggest a new definition based on expected value. This leads to a parallel concept: ‘existential hope’, the chance of something extremely good happening.

I think MIRI and CSER may be naturally understood as organisations trying to reduce existential risk and increase existential hope respectively (although if MIRI is aiming to build a safe AI this is also seeking to increase existential hope). What other world states could we aim for that increase existential hope?

Comments

Definition (iii): An existential catastrophe is an event which causes the loss of most expected value.

Can you name any past existential (or nearly so) catastrophes?

Toba. Approx. 1000 breeding pairs of humans survived.

Did the event "cause the loss of most expected value"? Looking around, I'm not so sure.

It's a good example of extinction risk, but doesn't seem to fit the (iii) definition well.

Before and during the event, there was a high probability P of humanity going extinct. That is equivalent to the loss then of P proportion of all future expected value. Expected value is always about the future; that's why it's not actual value.
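
As a rough worked version of this point, assume (purely for illustration; this simplification is not from the report) that extinction leaves zero future value and that survival leaves the same expected value V_s it would have had without the event:

```latex
\begin{align*}
\mathbb{E}[V \mid \text{no event}] &= V_s \\
\mathbb{E}[V \mid \text{event under way}] &= (1 - P)\,V_s + P \cdot 0 = (1 - P)\,V_s \\
\text{fraction of expected value lost} &= \frac{V_s - (1 - P)\,V_s}{V_s} = P
\end{align*}
```

Under these assumptions, any P above 1/2 would count as losing "most expected value" at the time, even if humanity goes on to survive.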

(Also, I think on some many-worlds theories utility was actually lost due to humanity surviving in less measure.)

there was a high probability P of humanity going extinct

Looking from before the event, true. Fair point.

I'll note that I don't claim much justification for my views, I'm mostly just stating them to promote some thought:

  • The E(V) definition is definitionally true, but needs translation to gain any practical meaning, whereas Bostrom's definition of loss of potential is theoretically reasonable, works across a lot of value systems, and has the additional virtue of being more practically meaningful. Extinction misses a bunch of bad fail-scenarios, but is at least concrete. So it's better to use the latter two by default, and it's rare that we'd need to use the E(V) definition.

  • The other main type of definition, which Daniel Dewey suggested to me, treats existential risk as a reduction in the number of options available to you. This definition seems kind of doomed to me, because I have a feeling it collapses into some kind of physical considerations, e.g. how many particles you have / how many conformations they can take / how much energy or entropy they have - something like that. I think you either need to anchor your definition on extinction, or expected value, or something in-between like "potential", but if you try to define 'options' objectively, I think you end up with something we don't care about anymore.

  • Existential eucatastrophes are interesting. Perhaps making a simulation would count. However, they might not exist if we believe the technological maturity conjecture. The notions of expanding cubically / astronomical waste feel kinda relevant here - if, barring some extinction event, everything's going to end up expanding cubically anyway, then you have to decide how you can counterfactually stuff extra value in there. It still seems like the main point of leverage over the future would be when an AI is created. And in that context, I think we want people to cooperate and satisfice (reduce existential risk) rather than quibble and all try to create personal eucatastrophes. So I'm not sure the terminology helps.

Anyhow, it's all interesting to think about.

And in that context, I think we want people to cooperate and satisfice (reduce existential risk) rather than quibble and all try to create personal eucatastrophes. So I'm not sure the terminology helps.

I tend to agree that the main point of leverage over the future will be whether there is a long future. However I think focusing on the risks may focus attention too much on strategies which address the risks directly, where we would be better aiming at a positive intermediate state.

I like the definition of eucatastrophe, I think it's useful to look at both sides of the coin when assessing risk.

Far out example: we receive a radio transmission from an alien craft that passed by our solar system a few thousand years ago looking for intelligent life. If we fire a narrow beam message back at them in the next 10 years they might turn back, after that they'll be out of range. Do we call them back? It's quite likely that they could destroy Earth, but we also need to consider the chance that they'll "pull us up" to their level of civilization, which would be a eucatastrophe.

More relevant example: a child is growing up, his g factor may be the highest ever measured and he's taking his first computer science class at 8 years old. Certainly, if anyone in our generation is going to give the critical push towards AGI it's likely to be him. But what if he's not interested in AI friendliness and doesn't want to hear about values or ethics?

Definition (iii): An existential catastrophe is an event which causes the loss of most expected value.

A good place to start, but I don't know about the heavy emphasis on expectation. The problems due to skewed distributions are ever-present. An event with small probability but high value will skew expected value. If a second event were to occur that rendered this impossible, we'd lose a lot of expected value. I'm not sure I'd call that a catastrophe, though.
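
A minimal sketch of the skew being described, using made-up numbers (the probabilities and values below are hypothetical, chosen only to show the shape of the problem): one unlikely but enormous outcome carries almost all of the expected value, so ruling it out loses most E(V) while leaving the most likely outcome untouched.

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# Hypothetical toy distribution: a tiny chance of a vast payoff,
# otherwise an ordinary future.
before = [(0.001, 1e9), (0.999, 1e3)]

# A second event makes the vast payoff impossible; all probability
# mass now sits on the ordinary future.
after = [(1.0, 1e3)]

ev_before = expected_value(before)
ev_after = expected_value(after)
fraction_lost = 1 - ev_after / ev_before

print(f"E(V) before: {ev_before:.0f}")
print(f"E(V) after:  {ev_after:.0f}")
print(f"fraction of E(V) lost: {fraction_lost:.4f}")
```

In this toy case almost all expected value disappears even though the outcome that was overwhelmingly likely beforehand is exactly what still happens, which is the situation the comment hesitates to call a catastrophe.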

This seems like exactly the set-up Bostrom has in mind when he talks about existential risks. We have a small chance of colonising the galaxy and beyond, but this carries a lot of our expected value. An event which prevents that would be a catastrophe.

Of course many of the catastrophes that are discussed (e.g. most life is wiped out by a comet striking the earth) coincide with drastically reducing the observed value in the short term. But we normally want to include getting stuck on a trajectory which stops further progress, even if it will be a future which involves good lives for billions of people.

Not sure I like the (iii) definition ("the loss of most expected value"). It just transfers all the burden onto the word "value" which is opaque, slippery, and subject to wildly different interpretation.

Consider that e.g. for all the Christians an irrefutable discovery that the whole Jesus thing was a fake and a hoax would count as an existential catastrophe.

It just transfers all the burden onto the word "value" which is opaque, slippery, and subject to wildly different interpretation.

People can certainly value different things, and value the same things differently. But as long as everyone correctly communicates what they value to everyone else, we can talk about expected value unambiguously and usefully.

Consider that e.g. for all the Christians an irrefutable discovery that the whole Jesus thing was a fake and a hoax would count as an existential catastrophe.

If true, and if this is much more value than would be gained elsewhere (by me or them or someone else) from them learning the truth, then I as a non-Christian would try to prevent Christians from learning this. What is ambiguous about this?

What is ambiguous about this?

Would you call this "an existential catastrophe"?

It's not one for me, but it might be for somebody else. You presented the counterfactual that it is one to Christians, and I didn't want to deny it.

I'm not sure what your point is. Is it that saying anything might be an existential catastrophe to someone with the right values dismisses the literal meaning of "existential"?

It's not one for me, but it might be for somebody else.

That's a pretty important point. Are we willing to define an existential catastrophe subjectively?

If you define existential risk as e.g. a threat of extinction, that definition has some problems but it does not depend on someone's state of mind -- it is within the realm of reality (defined as what doesn't go away when you stop believing in it). Once you start talking about expected value, it's all in the eye of the beholder.

This is true - these are two completely different things. And I assume from the comments on this post that the OP does indeed define it subjectively, i.e. via loss of (expected) value. Each is worthy of discussion, and I think the two discussions do mostly overlap, but we should be clear as to what we're discussing.

Cases of extinction that aren't existential risk for some people: rapture / afterlife / end of the world religious scenarios; uploading and consequent extinction of biological humanity (most people today would not accept uploading as a substitute for their 'real' life); being replaced by our non-human descendants.

Cases of existential risk (for some people's values) that don't involve extinction: scenarios where all remaining humans hold values dramatically different from your own; revelation that one's religion or deeply held morality is objectively wrong; humanity fails to populate/influence the universe; and many others.

Cases of extinction that aren't existential risk for some people

These are not cases of extinction. Christians wouldn't call the Second Coming "extinction" -- after all, you are getting eternal life :-/ I wouldn't call total uploading "extinction" either.

I would call Armageddon (as part of the Second Coming) extinction. And Christians would call forced total uploading extinction (as a form of death).

That value wasn't lost; they would have updated to reassess their expected value.

That requires a precise meaning of expected value in this context that includes only certain varieties of uncertainty. It would take into account the actual probability that, for example, a comet exists which is on a collision course with the Earth, but could not include the state of our knowledge about whether that is the case.

If it did include states of knowledge, then going from 'low probability that a comet strikes the Earth and wipes out all or most human life' to 'Barring our action to avoid it, near-certainty that a comet will strike the Earth and wipe out all or most human life' is itself a catastrophic event and should be avoided.

That requires a precise meaning of expected value in this context that includes only certain varieties of uncertainty.

Kind-of? You assess past expected values in light of information you have now, not just the information you had then. That way, finding out bad news isn't the catastrophe.

The line seems ambiguous, and I don't like this talk of "objective probabilities" used to explain it. But you seem to be talking about E(V) as calculated by a hypothetical future agent after updating. Presumably the present agent looking at this future possibility only cares about its present calculated E(V) given that hypothetical, which need not be the same (if it deals with counterfactuals in a sensible way). To the extent that they are equal, it means the future agent is correct - in other words, the "catastrophic event" has already occurred - and finding this out would actually raise E(V) given that assumption.

When someone is ignorant of the actual chance of a catastrophic event happening, even if they consider it possible, they will have fairly high EV. When they update significantly toward the chance of that event happening, their EV will drop very significantly. This change itself meets the definition of 'existential catastrophe'.

Sounds like evidential decision theory again. According to that argument, you should maintain high EV by avoiding looking into existential risks.

Yes, that's my issue with the paper; it doesn't distinguish that from actual catastrophes.

I don't know what you think you're saying - the definition no longer says that if you consider it to refer to E(V) as calculated by the agent at the first time (conditional on the "catastrophe").

ETA: "An existential catastrophe is an event which causes the loss of most expected value."

We specified objective probabilities to avoid such discoveries being the catastrophes (but value is deliberately subjective). There may be interesting versions of the idea which use subjective probabilities.
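
A toy sketch of why the choice matters, with hypothetical numbers and a hypothetical `expected_value` helper (none of this is from the report): if a comet is already on a collision course, the objective chance of a strike is the same before and after astronomers notice it, so only the subjective E(V) drops at the moment of discovery.

```python
def expected_value(p_strike, value_if_safe=100.0, value_if_strike=0.0):
    """E(V) given a probability that the comet strikes."""
    return (1 - p_strike) * value_if_safe + p_strike * value_if_strike

def fraction_lost(ev_before, ev_after):
    """Share of the earlier expected value that is gone afterwards."""
    return (ev_before - ev_after) / ev_before

# Assumed for illustration: the comet is already on course, so the
# objective strike probability is 0.9 both before and after discovery.
objective_before = expected_value(0.9)
objective_after = expected_value(0.9)

# Observers' credence jumps from 0.01 to 0.9 when the comet is spotted.
subjective_before = expected_value(0.01)
subjective_after = expected_value(0.9)

objective_loss = fraction_lost(objective_before, objective_after)
subjective_loss = fraction_lost(subjective_before, subjective_after)

print(f"objective E(V) lost on discovery:  {objective_loss:.2f}")
print(f"subjective E(V) lost on discovery: {subjective_loss:.2f}")
```

With subjective probabilities the discovery itself would lose most expected value and so satisfy definition (iii); with objective probabilities nothing is lost at that moment, and the catastrophe is whatever put the comet on course.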

We specified objective probabilities to avoid such discoveries being the catastrophes (but value is deliberately subjective).

I don't understand that sentence. Where do your "objective probabilities" come from?

Exactly how to cash out objective probabilities is a tricky problem which is the subject of a substantial literature. We didn't want to tie our definition to any particular version, believing that it's better to parcel off that problem. But my personal view is that roughly speaking you can get an objective probability by taking something like an average of subjective probabilities of many hypothetical observers.

Sorry, still not making any sense to me. "Taking something like an average of subjective probabilities of many hypothetical observers" looks precisely like GIGO and I don't understand how you get something objective out of subjective perceptions of hypotheticals(!).

If you don't think the concept of "objective probability" is salvageable I agree that you wouldn't want to use it for defining other things.

I don't want to go into detail of my personal account of objective probability here, not least because I haven't spent enough time working it out to be happy it works! The short answer to your question is you need to define an objective measure over possible observers. For the purposes of defining existential risk, you might do better to stop worrying about the word "objective" and just imagine I'm talking about the subjective probabilities assigned by an external observer who is well-informed but not perfectly informed.

Consider that e.g. for all the Christians an irrefutable discovery that the whole Jesus thing was a fake and a hoax would count as an existential catastrophe.

This seems to conflate people's values with their asserted values. Because of belief-in-belief and similar effects, we can't assume those to be the same when modeling other people. We should also expect that people's values are more complex than the values that they will assert (or even admit).

This seems to conflate people's values with their asserted values.

So replace "Christians" with "people who truly believe in the coming Day of Judgement and hope for eternal life".

I had a college roommate who went through a phase where he wanted to die and go to heaven as soon as possible, but believed that committing suicide was a mortal sin.

So he would do dangerous things — like take walks in the middle of the (ill-lit, semi-rural) road from campus to town, wearing dark clothing, at night — to increase (or so he said) his chances of being accidentally killed.

Most Christians don't do that sort of thing. Most Christians behave approximately as sensibly as *humanists do with regard to obvious risks to life. This suggests that they actually do possess values very similar to *humanist values, and that their assertions otherwise are tribal cheering.

(It may be that my roommate was just signaling extreme devotion in a misguided attempt to impress his crush, who was the leader of the college Christian club.)

Note that one can be a religious Christian and still act that way. Catholics, for example, consider deliberately risky behavior like that to be sinful in itself.

I haven't had a chance to read the report fully yet but my immediate reaction from looking at the first two definitions given is that they don't rely on a specific moral framework or class of moral framework whereas the proposed definition seems to rely on some utilitarian or close to utilitarian notion.

Edit: Now read the whole thing, and it would be nice to have a substantial response to the issue raised in the above paragraph. Also, calling this a technical report seems a little overblown given how short it is. And as a matter of signaling, a formal bibliography would be nice.

I agree that it's short; now added this as a descriptor above. Technical report was the most appropriate category; they're usually longer.

... my immediate reaction from looking at the first two definitions given is that they don't rely on a specific moral framework or class of moral framework whereas the proposed definition seems to rely on some utilitarian or close to utilitarian notion.

We address this, saying:

A lot of the work of this definition is being done by the final couple of words. ‘Value’ refers simply to whatever it is we care about and want in the world, in the same way that ‘desirable future development’ worked in Bostrom’s definition.

What counts as an existential catastrophe does depend on the moral framework (which seems appropriate), but doesn't seem tied to any specific one. I agree that the simple definition (extinction) dodges anything like this, and that that is a point in its favour.

What counts as an existential catastrophe does depend on the moral framework (which seems appropriate), but doesn't seem tied to any specific one.

Different frameworks can definitely disagree on whether some events are catastrophes. E.g., a new World War erupting might seem a good thing to some who believe in the Rapture.

If you're saying that some nontrivial subset of potential catastrophes are universally regarded as such, then I think that should be substantiated. If, OTOH, you are saying this is true as long as you ignore some parts of humanity, then you should specify which parts.