Good and bad ways to think about downside risks

by MichaelA, 11th Jun 2020

This post was written for Convergence Analysis.

Many actions we could take to make the world better might also have negative effects, or might even be negative overall. In other words, altruistic actions often have downside risks. Perhaps, for example, that project you might start, career path you might pursue, or article you might write could lead to information hazards, memetic downside risks, or risks of diverting resources (such as money or attention) from more valuable things.[1]

So if I’m considering doing something to make the world better, but know it might have downside risks, what should I do? How should I even think about that question?

  1. Should I just stop worrying and go ahead, since I’m excited about this idea and my initial sense is that it’ll probably be beneficial?
  2. Or should I be especially concerned about ensuring my actions don’t cause any harm, and thus steer heavily against any action with downside risks, even if the expected value of that action still seems good?
  3. Or maybe that’s too strong - perhaps it’s fine for me to go ahead, and I certainly still want to, so that I can have that positive impact. But maybe I have a duty to at least think carefully first, and to drop the idea if it is too risky, since these are policies I’d want people in general to comply with?
  4. Or maybe I shouldn’t see this as just a matter of “compliance” - as some “duty” separate from having a positive impact. Maybe I should see avoiding causing accidental harm as just as valuable as “actively making things better” - as just another core part of my efforts to do good?

We (Convergence) have observed different people seeming to implicitly use each of these four broad perspectives on downside risks.[2] We’ll refer to these perspectives as, respectively:

  1. The unconcerned perspective
  2. The harm-avoidance perspective[3]
  3. The compliance perspective
  4. The pure expected value (or pure EV) perspective

In this post, we’ll unpack these four perspectives, and we’ll argue in favour of using the pure EV perspective. Note that this doesn’t require always performing explicit EV calculations; often only a quick, qualitative, intuitive assessment of EV will be warranted. Relatedly, this article is not about methods for estimating EV (see here for links relevant to that topic). Instead, this article focuses on arguing that, in essence, one should consider both potential benefits and potential harms of an action (compared to the counterfactual), without ignoring downside risks, overweighting them, or feeling that avoiding them “limits one’s impact”.

We’ll take a typical, “approximately consequentialist” ethical framework as a starting assumption. We think that the pure EV perspective is the one which fits most naturally with such a framework. But our arguments for it will not be explicitly focused on philosophical debates.[4] Instead, we’ll use a more narrative, conversational, and emotion- or motivation-focused approach.

This is partly because we expect most readers to already accept an “approximately consequentialist” ethical framework, and so we don’t expect many readers to be deliberately and explicitly using the unconcerned, harm-avoidance, or compliance perspectives. In cases where readers are using those perspectives, we expect this to typically be implicit or unintentional.[5] Thus, we’re aiming less to “change people’s minds”, and more to explicitly highlight this menu of options, while making the pure EV perspective appetising to readers’ System 1s (rather than just their System 2s).

1. The unconcerned perspective

Say you’ve got this great opportunity or idea for a project/career path/article/something else. To make things concrete, let’s say a journalist has reached out to you requesting to interview you about AI safety. This interview really could make the world better, by highlighting the importance of the problem, pointing to some potential solutions, and hopefully attracting more funding, attention, and talent to the area. And that’s exciting - you could have a real impact!

But is there a chance you doing this interview could also have negative impacts (compared to the counterfactual, which might be someone else being interviewed or no interview occurring)? For example, perhaps highlighting the importance of AI safety would also increase risks by making certain powerful actors more interested in AI (an “attention hazard”)? Or perhaps what you say would be spun as, or evolve into, a simplistic message that makes discussion of AI safety more generally look unfounded, overdramatic, or overly “sci-fi”? Could the interview’s impact even be negative overall? Have you really thought about the possible unintended consequences?

Maybe you don’t have to. Maybe it’d be really nice not to - you’re excited, and doing the interview seems like a good idea, and one that’s more promising than the other actions you have on the table. So maybe you can just go ahead.

In fact, maybe you should just go ahead - wouldn’t worrying about possible risks before every seemingly positive action, or avoiding any action that could be harmful, just paralyse would-be do-gooders, and actually leave the world worse off overall?

We think there’s some validity to this perspective. But it seems to set up a false dichotomy, as if a person’s only options are to:

  1. Go ahead with any actions that seem positive at first, and never think at all about downside risks; or
  2. Worry at length about every single action.

We could add a third option, consisting of the following heuristics (see here for more details):

  • For minor or routine actions (e.g., most conversations), don’t bother thinking at all about downside risks.

  • Typically limit thinking about downside risks to a fairly quick check, done without really “worrying”.

    • For example, when the journalist reaches out to you, you notice this isn’t a situation you’ve encountered before, so you spend a minute quickly considering what harms might come from this action.
  • Think at more length about downside risks only in cases where it seems that would be worthwhile (e.g., before major actions, or when the quick check revealed there may be important downsides).

    • For example, when your minute of thought reveals some seemingly plausible downsides to the interview, you decide to think in more detail about the matter, and maybe consult with some people you trust.
  • When substantial downside risks are identified, consider either (a) abandoning the action or (b) taking a version of the action which has lower risks or which allows you to monitor the risks to inform whether to abandon the action later.

That third option, or something like it, doesn’t have to involve emotional anguish or analysis paralysis. Good deeds can still be done, and you can still feel good about doing them.

And ask yourself: Why might it feel nice to take the unconcerned perspective? Isn’t it so you’d get to keep feeling excitement about the positive impact this action might have? Well, if you follow something like that third option, you can still feel that excitement, in all the cases where the action does have a positive impact (in expectation).

You only lose that excitement in the cases where the action would actually be net negative in expectation.[6] But wasn’t your excitement all about the positive impact? Isn’t it therefore appropriate for that excitement to be lost if the impact isn’t positive? “That which can be destroyed by the truth should be.”

2. The harm-avoidance perspective

Maybe you’re at the opposite extreme - maybe that third option doesn’t feel like enough. Maybe when you see us suggest you should sometimes not think at all about downside risks, and sometimes go ahead even if there are risks, you wonder: Isn’t that reckless? How could you bear making the world worse in some ways, or risking making it worse overall - how could you bear the chance of your own actions causing harm? Can the potential benefits really justify that?

We again think there’s some validity to this perspective. And we certainly understand the pull towards it. Intuitively, it can feel like there’s a strong asymmetry between causing harm and causing benefits, and a similar asymmetry between doing harm and “merely allowing” harm. It can feel like we have a strong duty to avoid actions that harm things, or that risk doing so, even when those actions are necessary “for the greater good”, or to prevent other harms.

From here, we could move into a debate over consequentialism vs non-consequentialism. But as we noted earlier, we’re taking an approximately consequentialist framework as a starting assumption, and primarily addressing people who we think share that framework, but who still feel a pull towards this harm-avoidance perspective. So we’ll instead try to hammer home just how paralysing and impractical it’d be to fully align your behaviours with this harm-avoidance perspective.

We live in a complicated world, populated with at least billions of beings we care about, each with various, often volatile, often conflicting preferences, each networked and interacting in myriad, often obscured, often implicit ways. And if you want to make that world better, that means you want to change it. If you poke, tweak, add, or remove any gear in that massive machine, the impacts won’t just be local and simple - there’ll be reverberations in distant corners you hadn’t ever thought about.

Those reverberations are certainly worth taking seriously. This is why it’s worth thinking about and trying to mitigate downside risks - including those far removed from the time, place, or intention of your action. This is also why we wrote a series of posts on downside risks.

But reverberations are always happening, with or without you. And anything you do will cause them. And so would whatever you interpret “not doing anything” to mean.

So are you really going to always stand by - let the machine keep grinding down whoever or whatever it may be grinding down; let it chug along towards whatever cliffs it may be chugging along towards - just to avoid doing any harm? Just to avoid any risk of it being you who “causes” harm - even if avoiding that risk means more harm happens?

Are you going to stand by even when your best guess is that an action really is positive in expectation?

For example, if you can see any downside risks from doing the interview on AI safety with that journalist, the harm-avoidance perspective would suggest definitely turning the interview down. This holds even if, after seeking out the views of people you trust and putting in a lot of careful thought, it really does seem the downside risks are outweighed by the potential benefits. And it holds even though, if the EV is indeed positive, it’s predictable that various other harms will occur if you turn the interview down, such as harms from someone less qualified being interviewed or from AI safety continuing to be neglected.

As you can probably tell, we advocate against this harm-avoidance perspective. We advocate for taking downside risks seriously - perhaps more seriously than most people currently do - but also for being willing to take action when the EV really is positive (as best you can tell).

3. The compliance perspective

Let’s say you’re convinced by the arguments above, so you’re ditching the harm-avoidance perspective, and you can go ahead with an action - that AI safety interview, for example - as long as its EV is positive. Excellent! You’re still pretty excited by the impact this interview could have.

But downside risks are a real concern, so you can’t take the unconcerned perspective either. You do have to do your due diligence, right? You’d advise others to think about the risks of their actions - you think that’s a good norm in general - so you guess you have a responsibility to comply with it too? And you guess that if the interview does turn out to seem too risky, you’d have to turn it down - however annoying it’d be to thereby give up this chance to possibly cause some positive impacts too.

We call this the “compliance” perspective. In some ways, this perspective actually seems pretty ok to us; depending on the details, it might not be “invalid” in any way, might not actually clash with a mostly consequentialist framework, and might not cause issues. But we think there are many people for whom this perspective probably isn’t ideal, in terms of motivation.

That’s because the perspective frames caution about downside risks as something like a specific, external obligation, secondary to one’s real, main goal of having a positive impact. It frames caution as the sort of thing “a good person should do”, but not as itself good and impactful in the way that “doing a positive action” would be. It could make caution seem, on some emotional level, like a burdensome duty, getting in the way of the actually impactful things you really want to do.

And if caution does feel like just compliance, it might also feel frustrating and demotivating. So you might apply caution a little too rarely, or a little too lightly. You might convince yourself the risks are worthwhile a little too often. And bit by bit, we might see an accumulation of harms we could’ve prevented by taking a less frustrating, more intrinsically motivating perspective on downside risks.

How can we avoid those issues? By taking the pure expected value perspective. The next section will describe what it looks like to take that perspective.

4. The pure expected value perspective

Let’s start again from the top. This journalist has reached out to you requesting an interview on AI safety, and you realise this might be a way to make the world better. Fantastic! But you haven’t really thought about the downsides yet: it’s possible that, in expectation, the interview would be net negative. It’s also possible that it would be a bit negative, and that you can make some changes to mitigate those downsides.

No problem! Just assess the EV of you doing the interview (compared to the counterfactual), taking account of both its risks and its potential benefits, and adjusting for the unilateralist's curse where necessary. This could involve anything from a quick check to a detailed, lengthy assessment involving input from others; consider how high the value of information would be.[7] And this may or may not involve explicit, quantitative estimates. For example, you might simply spend 30 minutes considering the various effects the interview might have, qualitatively weighing up how probable and how good or bad they are, and arriving at an overall sense of whether the benefits outweigh the risks.[8]

If, after that, you’re fairly confident the interview’s impacts really are net positive in expectation - great, go ahead and do it!

If it seems the expected impacts could be made net positive (or more positive) if you modify the action to reduce its risks or allow you to monitor them - great, go ahead and do that! In the interview example, this could include things like asking to provide written rather than verbal answers, and running those answers by people you trust.

If it seems the interview’s impacts are net-negative in expectation, and that that can’t be fixed by just modifying the action to mitigate or monitor those risks - well, maybe that’s not great, but it’s definitely great you found out! Think about all the harm you prevented by assessing the downside risks so you can now avoid going through with this action! Think about how much better your decision to be cautious has made the world! And remember that there’s still a vast array of other potential actions you could take - that wasn’t your only shot to make a difference. (Plus, if the expected effects of the action you were considering look net negative on close inspection, yet this wasn’t obvious from the start, you might be able to do further good by writing up and sharing these insights with other people.)

The “pure EV” perspective rejects the unconcerned perspective’s strange inclination to avoid looking too closely at the potential risks of a planned action in case that’d burst your bubble. It also rejects the harm-avoidance perspective’s emphasis on steering clear of any action with any downside risk, based on perceived asymmetries between causing harm and causing good, or between doing and allowing harm. Further, it rejects the compliance perspective’s sense that preventing downside risks from your actions is some additional, secondary principle you have a duty to comply with, rather than a core part of how you can positively impact the world.

In place of those things, this perspective simply says to:

  1. Invest an appropriate level of effort into working out the EV of an action and into thinking of ways to improve that EV (such as through mitigating and/or monitoring any downside risks).
  2. Take the action (or the best version of it) if that EV is positive (after adjusting for the unilateralist's curse, where necessary).
  3. Feel good about both steps of that process.

Thus, this is the perspective we use ourselves, and the one we recommend. We hope that it will help us and others strike the best balance between taking actions that are worth taking and avoiding actions that truly are too risky.

Closing remarks

We hope this post will be helpful in your efforts to improve the world. Additionally, if you do subscribe to a mostly consequentialist framework, and yet feel a pull you’d rather not feel towards the unconcerned, harm-avoidance, or compliance perspectives, we hope this post will help you better align your patterns of thought and feeling with those which you aim to cultivate.

For discussion of types of downside risks, situations in which they’re most likely to occur, how and when to assess them, or how to prevent or mitigate them, see other posts in this sequence.

My thanks to Justin Shovelain for helping develop the ideas in this post, and to Justin, David Kristoffersson, Olga Babeeva, and Max Dalton for helpful feedback. This does not imply their endorsement of all points made.


  1. See here for additional sources on downside risks and accidental harm. ↩︎

  2. Of course, it’s also possible to use different versions of these perspectives, to use combinations of multiple perspectives at the same time, or to switch between different perspectives in different situations. ↩︎

  3. There is also a personality trait called “harm avoidance”, which is not what we’re referring to in this post. ↩︎

  4. Some of the debates that are relevant here, but which we won’t explicitly address, are those regarding consequentialism vs non-consequentialism, doing vs allowing harm, the acts/omissions doctrine, ethical offsetting, “excited” vs “obligatory” altruism, and risk-neutral vs risk-averse vs risk-seeking preferences. ↩︎

  5. A person’s use of one of those three perspectives could perhaps result from habits and intuitions shaped by “common sense”, the behaviours and attitudes of people around the person, or experience in fields that lack a long-term, altruistic ethic. ↩︎

  6. You may also lose some of the excitement in cases where the action still seems net-positive in expectation, but less so, due to risks that are notable but not overwhelming. But that partial loss of excitement would seem to us like an instance of emotions tracking reality appropriately. ↩︎

  7. For example, it’s worth putting more effort into assessing the EV of an action the more uncertain you are about the probability and value/disvalue of effects the action might have, the bigger those probabilities and values/disvalues might be, and the likelier that extra effort is to resolve those uncertainties. ↩︎

  8. As noted earlier, this post is not focused on methods for estimating EV, and more information on that can be found in the sources linked to here. ↩︎
