This post was written for Convergence Analysis.

Many actions we could take to make the world better might also have negative effects, or might even be negative overall. In other words, altruistic actions often have downside risks. Perhaps, for example, that project you might start, career path you might pursue, or article you might write could lead to information hazards, memetic downside risks, or risks of diverting resources (such as money or attention) from more valuable things.[1]

So if I’m considering doing something to make the world better, but know it might have downside risks, what should I do? How should I even think about that question?

  1. Should I just stop worrying and go ahead, since I’m excited about this idea and my initial sense is that it’ll probably be beneficial?
  2. Or should I be especially concerned about ensuring my actions don’t cause any harm, and thus steer heavily against any action with downside risks, even if the expected value of that action still seems good?
  3. Or maybe that’s too strong - perhaps it might be fine for me to go ahead, and I certainly still want to, to have that positive impact. But maybe I have a duty to at least think carefully first, and to drop the idea if it is too risky, since these are policies I’d want people in general to comply with?
  4. Or maybe I shouldn’t see this as just a matter of “compliance” - as some “duty” separate from having a positive impact. Maybe I should see avoiding causing accidental harm as just as valuable as “actively making things better” - as just another core part of my efforts to do good?

We (Convergence) have observed different people seeming to implicitly use each of these four broad perspectives on downside risks.[2] We’ll refer to these perspectives as, respectively:

  1. The unconcerned perspective
  2. The harm-avoidance perspective[3]
  3. The compliance perspective
  4. The pure expected value (or pure EV) perspective

In this post, we’ll unpack these four perspectives, and we’ll argue in favour of using the pure EV perspective. Note that this doesn’t require always performing explicit EV calculations; often only a quick, qualitative, intuitive assessment of EV will be warranted. Relatedly, this article is not about methods for estimating EV (see here for links relevant to that topic). Instead, this article focuses on arguing that, in essence, one should consider both potential benefits and potential harms of an action (compared to the counterfactual), without ignoring downside risks, overweighting them, or feeling that avoiding them “limits one’s impact”.

We’ll take a typical, “approximately consequentialist” ethical framework as a starting assumption. We think that the pure EV perspective is the one which fits most naturally with such a framework. But our arguments for it will not be explicitly focused on philosophical debates.[4] Instead, we’ll use a more narrative, conversational, and emotion- or motivation-focused approach.

This is partly because we expect most readers to already accept an “approximately consequentialist” ethical framework, and so we don’t expect many readers to be deliberately and explicitly using the unconcerned, harm-avoidance, or compliance perspectives. In cases where readers are using those perspectives, we expect this to typically be implicit or unintentional.[5] Thus, we’re aiming less to “change people’s minds”, and more to explicitly highlight this menu of options, while making the pure EV perspective appetising to readers’ System 1s (rather than just their System 2s).

1. The unconcerned perspective

Say you’ve got this great opportunity or idea for a project/career path/article/something else. To make things concrete, let’s say a journalist has reached out to you requesting to interview you about AI safety. This interview really could make the world better, by highlighting the importance of the problem, pointing to some potential solutions, and hopefully attracting more funding, attention, and talent to the area. And that’s exciting - you could have a real impact!

But is there a chance you doing this interview could also have negative impacts (compared to the counterfactual, which might be someone else being interviewed or no interview occurring)? For example, perhaps highlighting the importance of AI safety would also increase risks by making certain powerful actors more interested in AI (an “attention hazard”)? Or perhaps what you say would be spun as, or evolve into, a simplistic message that makes discussion of AI safety more generally look unfounded, overdramatic, or overly “sci-fi”? Could the interview’s impact even be negative overall? Have you really thought about the possible unintended consequences?

Maybe you don’t have to. Maybe it’d be really nice not to - you’re excited, and doing the interview seems like a good idea, and one that’s more promising than the other actions you have on the table. So maybe you can just go ahead.

In fact, maybe you should just go ahead - wouldn’t worrying about possible risks before every seemingly positive action, or avoiding any action that could be harmful, just paralyse would-be do-gooders, and actually leave the world worse off overall?

We think there’s some validity to this perspective. But it seems to set up a false dichotomy, as if a person’s only options are to:

  1. Go ahead with any actions that seem positive at first, and never think at all about downside risks; or
  2. Worry at length about every single action.

We could add a third option, consisting of the following heuristics (see here for more details):

  • For minor or routine actions (e.g., most conversations), don’t bother thinking at all about downside risks.
  • Typically limit thinking about downside risks to a fairly quick check, without really “worrying”.
    • For example, when the journalist reaches out to you, you notice this isn’t a situation you’ve encountered before, so you spend a minute quickly considering what harms might come from this action.
  • Think at more length about downside risks only in cases where it seems that would be worthwhile (e.g., before major actions, or when the quick check revealed there may be important downsides).
    • For example, when your minute of thought reveals some seemingly plausible downsides to the interview, you decide to think in more detail about the matter, and maybe consult with some people you trust.
  • When substantial downside risks are identified, consider either (a) abandoning the action or (b) taking a version of the action which has lower risks or which allows you to monitor the risks to inform whether to abandon the action later.

That third option, or something like it, doesn’t have to involve emotional anguish or analysis paralysis. Good deeds can still be done, and you can still feel good about doing them.
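For readers who like to see things spelled out, the heuristics listed above could be sketched as a small decision procedure. This is only an illustration in Python: the names `Effort` and `assessment_effort` and the three yes/no inputs are our own hypothetical simplifications, and real judgments about what counts as “minor” or “major” will be much fuzzier than a boolean.

```python
from enum import Enum


class Effort(Enum):
    NONE = "no explicit check"
    QUICK = "quick check"
    DETAILED = "detailed assessment"


def assessment_effort(is_minor_or_routine: bool,
                      is_major: bool,
                      quick_check_found_downsides: bool) -> Effort:
    """Suggest how much thought to give downside risks (illustrative only)."""
    # Minor or routine actions: don't bother thinking about downside risks.
    if is_minor_or_routine:
        return Effort.NONE
    # Major actions, or a quick check that flagged plausible downsides:
    # think at more length, and maybe consult people you trust.
    if is_major or quick_check_found_downsides:
        return Effort.DETAILED
    # Otherwise, a quick check is typically enough.
    return Effort.QUICK


# The interview example: not routine, and the minute of thought found downsides.
interview = assessment_effort(False, False, True)
```

In the interview example, `interview` comes out as `Effort.DETAILED`, matching the case where a minute of thought turns up plausible downsides.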

And ask yourself: Why might it feel nice to take the unconcerned perspective? Isn’t it so you’d get to keep feeling excitement about the positive impact this action might have? Well, if you follow something like that third option, you can still feel that excitement, in all the cases where the action does have a positive impact (in expectation).

You only lose that excitement in the cases where the action would actually be net negative in expectation.[6] But wasn’t your excitement all about the positive impact? Isn’t it therefore appropriate for that excitement to be lost if the impact isn’t positive? “That which can be destroyed by the truth should be.”

2. The harm-avoidance perspective

Maybe you’re at the opposite extreme - maybe that third option doesn’t feel like enough. Maybe when you see us suggest you should sometimes not think at all about downside risks, and sometimes go ahead even if there are risks, you wonder: Isn’t that reckless? How could you bear making the world worse in some ways, or risking making it worse overall - how could you bear the chance of your own actions causing harm? Can the potential benefits really justify that?

We again think there’s some validity to this perspective. And we certainly understand the pull towards it. Intuitively, it can feel like there’s a strong asymmetry between causing harm and causing benefits, and a similar asymmetry between doing harm and “merely allowing” harm. It can feel like we have a strong duty to avoid actions that harm things, or that risk doing so, even when those actions are necessary “for the greater good”, or to prevent other harms.

From here, we could move into a debate over consequentialism vs non-consequentialism. But as we noted earlier, we’re taking an approximately consequentialist framework as a starting assumption, and primarily addressing people who we think share that framework, but who still feel a pull towards this harm-avoidance perspective. So we’ll instead try to hammer home just how paralysing and impractical it’d be to fully align your behaviours with this harm-avoidance perspective.

We live in a complicated world, populated with at least billions of beings we care about, each with various, often volatile, often conflicting preferences, each networked and interacting in myriad, often obscured, often implicit ways. And if you want to make that world better, that means you want to change it. If you poke, tweak, add, or remove any gear in that massive machine, the impacts won’t just be local and simple - there’ll be reverberations in distant corners you hadn’t ever thought about.

Those reverberations are certainly worth taking seriously. This is why it’s worth thinking about and trying to mitigate downside risks - including those far removed from the time, place, or intention of your action. This is also why we wrote a series of posts on downside risks.

But reverberations are always happening, with or without you. And anything you do will cause them. And so would whatever you interpret “not doing anything” to mean.

So are you really going to always stand by - let the machine keep grinding down whoever or whatever it may be grinding down; let it chug along towards whatever cliffs it may be chugging along towards - just to avoid doing any harm? Just to avoid any risk of it being you who “causes” harm - even if avoiding that risk means more harm happens?

Are you going to stand by even when your best guess is that an action really is positive in expectation?

For example, if you can see any downside risks from doing the interview on AI safety with that journalist, the harm-avoidance perspective would suggest definitely turning the interview down. This is even if, after seeking out the views of people you trust and putting in a lot of careful thought, it really does seem the downside risks are outweighed by the potential benefits. And this is even though, if the EV is indeed positive, it’s predictable that various other harms will occur if you turn the interview down, such as harms from someone less qualified being interviewed or from AI safety continuing to be neglected.

As you can probably tell, we advocate against this harm-avoidance perspective. We advocate for taking downside risks seriously - perhaps more seriously than most people currently do - but also for being willing to take action when the EV really is positive (as best you can tell).

3. The compliance perspective

Let’s say you’re convinced by the arguments above, so you’re ditching the harm-avoidance perspective, and you can go ahead with an action - that AI safety interview, for example - as long as its EV is positive. Excellent! You’re still pretty excited by the impact this interview could have.

But downside risks are a real concern, so you can’t take the unconcerned perspective either. You do have to do your due diligence, right? You’d advise others to think about the risks of their actions - you think that’s a good norm in general - so you guess you have a responsibility to comply with it too? And you guess that if the interview does turn out to seem too risky, you’d have to turn it down - however annoying it’d be to thereby give up this chance to possibly cause some positive impacts too.

We call this the “compliance” perspective. In some ways, this perspective actually seems pretty ok to us; depending on the details, it might not be “invalid” in any way, might not actually clash with a mostly consequentialist framework, and might not cause issues. But we think there are many people for whom this framework probably isn’t ideal, in terms of motivation.

That’s because the perspective frames caution about downside risks as something like a specific, external obligation, secondary to one’s real, main goal of having a positive impact. It frames caution as the sort of thing “a good person should do”, but not as itself good and impactful in the way that “doing a positive action” would be. It could make caution seem, on some emotional level, like a burdensome duty, getting in the way of the actually impactful things you really want to do.

And if caution does feel like just compliance, it might also feel frustrating and demotivating. So you might apply caution a little too rarely, or a little too lightly. You might convince yourself the risks are worthwhile a little too often. And bit by bit, we might see an accumulation of harms we could’ve prevented by taking a less frustrating, more intrinsically motivating perspective on downside risks.

How can we avoid those issues? By taking the pure expected value perspective. The next section will describe what it looks like to take that perspective.

4. The pure expected value perspective

Let’s start again from the top. This journalist has reached out to you requesting an interview on AI safety, and you realise this might be a way to make the world better. Fantastic! But you haven’t really thought about the downsides yet: it’s possible that, in expectation, the interview would be net negative. It’s also possible that it would be a bit negative, and that you can make some changes to mitigate those downsides.

No problem! Just assess the EV of you doing the interview (compared to the counterfactual), taking account of both its risks and its potential benefits, and adjusting for the unilateralist's curse where necessary. This could involve anything from a quick check to a detailed, lengthy assessment involving input from others; consider how high the value of information would be.[7] And this may or may not involve explicit, quantitative estimates. For example, you might simply spend 30 minutes considering the various effects the interview might have, qualitatively weighing up how probable and how good or bad they are, and arriving at an overall sense of whether the benefits outweigh the risks. [8]

If, after that, you’re fairly confident the interview’s impacts really are net positive in expectation - great, go ahead and do it!

If it seems the expected impacts could be made net positive (or more positive) if you modify the action to reduce its risks or allow you to monitor them - great, go ahead and do that! In the interview example, this could include things like asking to provide written rather than verbal answers, and running those answers by people you trust.

If it seems the interview’s impacts are net-negative in expectation, and that that can’t be fixed by just modifying the action to mitigate or monitor those risks - well, maybe that’s not great, but it’s definitely great you found out! Think about all the harm you prevented by assessing the downside risks so you can now avoid going through with this action! Think about how much better your decision to be cautious has made the world! And remember that there’s still a vast array of other potential actions you could take - that wasn’t your only shot to make a difference. (Plus, if the expected effects of the action you were considering look net negative on close inspection, yet this wasn’t obvious from the start, you might be able to do further good by writing up and sharing these insights with other people.)

The “pure EV” perspective rejects the unconcerned perspective’s strange inclination to avoid looking too closely at the potential risks of a planned action in case that’d burst your bubble. It also rejects the harm-avoidance perspective’s emphasis on steering clear of any action with any downside risk, based on perceived asymmetries between causing harm and causing good, or between doing and allowing harm. Further, it rejects the compliance perspective’s sense that preventing downside risks from your actions is some additional, secondary principle you have a duty to comply with, rather than a core part of how you can positively impact the world.

In place of those things, this perspective simply says to:

  1. Invest an appropriate level of effort into working out the EV of an action and into thinking of ways to improve that EV (such as through mitigating and/or monitoring any downside risks).
  2. Take the action (or the best version of it) if that EV is positive (after adjusting for the unilateralist's curse, where necessary).
  3. Feel good about both steps of that process.

Thus, this is the perspective we use ourselves, and the one we recommend. We hope that it will help us and others strike the best balance between taking actions that are worth taking and avoiding actions that truly are too risky.
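The three steps listed above can be made concrete with a minimal sketch in Python. Everything here is hypothetical: the `Outcome` class, the function names, and especially the probabilities and values are invented for illustration, and values are measured relative to the counterfactual (so declining scores zero). The sketch also does not model any adjustment for the unilateralist's curse; we assume that has already been folded into the numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    probability: float  # chance this outcome occurs
    value: float        # value relative to the counterfactual (negative = harm)


def expected_value(outcomes: list) -> float:
    """Probability-weighted sum of outcome values."""
    return sum(o.probability * o.value for o in outcomes)


def choose_action(versions: dict) -> Optional[str]:
    """Return the highest-EV version of the action, or None if none is positive."""
    best = max(versions, key=lambda name: expected_value(versions[name]))
    return best if expected_value(versions[best]) > 0 else None


# Made-up numbers for two versions of the interview.
versions = {
    "verbal interview": [
        Outcome(0.6, 10),    # raises useful attention
        Outcome(0.3, -15),   # message gets spun into something harmful
        Outcome(0.1, 0),     # no real effect
    ],
    "written answers, reviewed by trusted people": [
        Outcome(0.6, 8),     # slightly less engaging, still useful
        Outcome(0.1, -15),   # spin is less likely
        Outcome(0.3, 0),
    ],
}

choice = choose_action(versions)
```

With these invented numbers, both versions have positive EV, but the lower-risk written version scores higher, so `choice` is the written version. If every version had come out negative, `choose_action` would return `None`, which corresponds to dropping the idea and feeling good about the harm avoided.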

Closing remarks

We hope this post will be helpful in your efforts to improve the world. Additionally, if you do subscribe to a mostly consequentialist framework, and yet feel a pull you’d rather not feel towards the unconcerned, harm-avoidance, or compliance perspectives, we hope this post will help you better align your patterns of thought and feeling with those which you aim to cultivate.

For discussion of types of downside risks, situations in which they’re most likely to occur, how and when to assess them, or how to prevent or mitigate them, see other posts in this sequence.

My thanks to Justin Shovelain for helping develop the ideas in this post, and to Justin, David Kristoffersson, Olga Babeeva, and Max Dalton for helpful feedback. This does not imply their endorsement of all points made.

  1. See here for additional sources on downside risks and accidental harm. ↩︎

  2. Of course, it’s also possible to use different versions of these perspectives, to use combinations of multiple perspectives at the same time, or to switch between different perspectives in different situations. ↩︎

  3. There is also a personality trait called “harm avoidance”, which is not what we’re referring to in this post. ↩︎

  4. Some of the debates that are relevant here, but which we won’t explicitly address, are those regarding consequentialism vs non-consequentialism, doing vs allowing harm, the acts/omissions doctrine, ethical offsetting, “excited” vs “obligatory” altruism, and risk-neutral vs risk-averse vs risk-seeking preferences. ↩︎

  5. A person’s use of one of those three perspectives could perhaps result from habits and intuitions shaped by “common sense”, the behaviours and attitudes of people around the person, or experience in fields that lack a long-term, altruistic ethic. ↩︎

  6. You may also lose some of the excitement in cases where the action still seems net-positive in expectation, but less so, due to risks that are notable but not overwhelming. But that partial loss of excitement would seem to us like an instance of emotions tracking reality appropriately. ↩︎

  7. For example, it’s worth putting more effort into assessing the EV of an action the more uncertain you are about the probability and value/disvalue of effects the action might have, the bigger those probabilities and values/disvalues might be, and the likelier that extra effort is to resolve those uncertainties. ↩︎

  8. As noted earlier, this post is not focused on methods for estimating EV, and more information on that can be found in the sources linked to here. ↩︎

Comments

Do not neglect the impacts of the obverse action. If you decline the interview, what impact will that have? Maybe the reporter will pick someone else to interview (so, are you a better or worse candidate than whoever their next contact is?), or just put "could not be reached for comment" (what impact does that have on the rest of the article?)

Good point. I intended "compared to the counterfactual" to be implicit throughout this article, as that's really what "impact" should always mean. I also briefly alluded to it in saying "such as harms from someone less qualified being interviewed". 

But it's true that many people don't naturally interpret "impact" as "compared to the counterfactual", and that it's often worth highlighting explicitly that that's the relevant comparison. 

To address that, I've now sprinkled in a few mentions of "compared to the counterfactual". Thanks for highlighting this :)

Very good point! The effect of not taking an action depends on what the counterfactual is: what would happen otherwise/anyway. Maybe the article should note this.

Nice post!

I would like to highlight that a naive application of the expected value perspective could lead to problems like the unilateralist's curse, and I think that the post would be even more useful for readers who are new to these kinds of considerations if it discussed that more explicitly (or linked to relevant other posts prominently).

Very good point! Thanks for raising it. I think this was an important oversight, and one I'm surprised I made, as I think the unilateralist's curse is a very useful concept and I've previously collected some sources on it.

To rectify that, I've now added two mentions of the curse (with links) in the section on the pure EV perspective.

When it comes to downside risk, there are often more unknown unknowns that produce harm than unknown unknowns that produce benefit. People are usually biased to overestimate the positive effects and underestimate the negative effects of the known unknowns.

As such, it's worth erring on the side of minimizing harm to some extent.

When it comes to downside risk, there are often more unknown unknowns that produce harm than unknown unknowns that produce benefit. People are usually biased to overestimate the positive effects and underestimate the negative effects of the known unknowns.

This seems plausible to me. Would you like to expand on why you think this is the case?

The asymmetry between creation and destruction? (I.e., it's harder to build than it is to destroy.)

There are multiple reasons. Let's say you have nine different courses of action and all have utility -1. You have some error function when evaluating the utility of the actions and you think the options have utilities -5, -4, -3, -2, -1, 0, 1, 2, 3. All the negative options won't be on your mind and you will only think about doing those options that score highly. 

Even if you have some options that are actually beneficial, if your evaluation function has enough noise, the fact that you don't put any attention on the options that score negatively means that the options that you do consider are biased.
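To illustrate the nine-option example numerically, here is a minimal sketch in Python. The specific error values and the rule "only consider options whose estimate is above zero" are assumptions added for illustration; the example itself only gives the true utility (-1) and the estimated utilities (-5 to 3).

```python
true_utility = -1  # every option is actually mildly harmful

# Assumed evaluation errors, chosen so the estimates match the example.
errors = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
estimates = [true_utility + e for e in errors]  # -5, -4, ..., 3

# Assumed attention filter: only options that look positive get considered.
considered = [u for u in estimates if u > 0]

mean_estimate = sum(considered) / len(considered)
overestimate = mean_estimate - true_utility

print(considered)      # [1, 2, 3]
print(mean_estimate)   # 2.0
print(overestimate)    # 3.0
```

The errors average to zero across all nine options, so the evaluation is unbiased overall. The bias only appears after filtering: the three options that receive attention look like they are worth 2 on average, while each is really worth -1.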

Confirmation bias will make you further want to believe that the options that you pursue are positive.

Most systems in our modern world are not anti-fragile and suffer if you expose them to random noise. 

I find the idea in those first two paragraphs quite interesting. It seems plausible, and isn't something I'd thought of before. It sounds like it's essentially applying the underlying idea of the optimiser's/winner's/unilateralist's curse to one person evaluating a set of options, rather than to a set of people evaluating one option? 

I also think confirmation bias or related things will tend to bias people towards thinking options they've picked, or are already leaning towards picking, are good. Though it's less clear that confirmation bias will play a role when a person has only just begun evaluating the options.

Most systems in our modern world are not anti-fragile and suffer if you expose them to random noise. 

This sounds more like a reason why many actions (or a "random action") will make things worse (which seems quite plausible to me), rather than a reason why people would be biased to overestimate benefits and underestimate harms from actions. Though I guess perhaps people's failure to recognise this reason why many/random actions may make things worse, despite this reason being real, will then lead to them systematically overestimating how positive actions will be.

In any case, I can also think of biases that could push in the opposite direction. E.g., negativity bias and status quo bias. My guess would be there are some people and domains where, on net, there tends to be a bias towards overestimating the value of actions, and some people and domains where the opposite is true. And I doubt we could get a strong sense of how it all plays out just by theorising; we'd need some empirical work. (Incidentally, Convergence should also be releasing a somewhat related post soon, which will outline 5 potential causes of too little caution about information hazards, and 5 potential causes of too much caution.)

Finally, it seems worth noting that, if we do have reason to believe that, by default, people tend to overestimate the benefits and underestimate the harms that an action will cause, that wouldn't necessarily mean we should abandon the pure EV perspective. Instead, we could just incorporate an adjustment to our naive EV assessments to account for that tendency/bias, in the same way we should adjust for the unilateralist's curse in many situations. And the same would be true if it turned out that, by default, people had the opposite bias. (Though if there are these biases, that could mean it'd be unwise to promote the pure EV perspective without also highlighting the bias that needs adjusting for.)

As far as the third point goes: for most non-anti-fragile systems, the effects of unknown unknowns are more likely to be harmful than beneficial.

Yes, this seems plausible to me. What I was saying is that that would be a reason why the EV of arbitrary actions might often be negative, rather than directly being a reason why people will overestimate the EV of arbitrary actions. The claim "People should take the pure EV perspective" is consistent with the claim "A large portion of actions have negative EV and shouldn't be taken". This is because taking the pure EV perspective would involve assessing both the benefits and risks (which could include adjusting for the chance of many unknown unknowns that would lead to harm), and then deciding against doing actions that appear negative.