If you would like to increase engagement with your posts, I’d highly recommend not posting all of them at once, especially because they’re long. Post the first one, see how people respond. Then adjust and post the second one next week.
(Note: A mod moved the subsequent posts to drafts for this reason. I'll repost them spaced out.)
I have just a superficial familiarity with the lit around this, and I'm wondering if what you're calling "unawareness" is the same concept as what other people have been calling "cluelessness" in this context, or if it is distinct in some way. They seem at least similar.
In any case, thanks for trying to set forth in a rigorous way this problem with the EA project.
Thanks!
People use "cluelessness" to mean various importantly different things, which is why I de-emphasized that term in this sequence. I think unawareness is a (major) source of what Greaves called complex cluelessness, which is a situation where:
(CC1) We have some reasons to think that the unforeseeable consequences of A1 would systematically tend to be substantially better than those of A2;
(CC2) We have some reasons to think that the unforeseeable consequences of A2 would systematically tend to be substantially better than those of A1;
(CC3) It is unclear how to weigh up these reasons against one another.
(It's a bit unclear how "unforeseeable" is defined. In context / in the usual ways people tend to talk about complex cluelessness, I think it's meant to encompass cases where the problem isn't unawareness but rather other obstacles to setting precise credences.)
But unawareness itself means "many possible consequences of our actions haven’t even occurred to us in much detail, if at all" (as unpacked in the introduction section). ETA: I think it's important to conceptually separate this from complex cluelessness, because you might think unawareness is a challenge that demands a response beyond straightforward Bayesianism, even if you disagree that it implies complex cluelessness.
Just skimmed the post. It seems your notion of "unawareness" sits in the same cluster as Knightian uncertainty and non-realizability in decision and learning theory.
There are indeed connections between these ideas, but I think it's very important not to round unawareness off to either of those two. Unawareness is its own epistemic problem with its own implications. (E.g., it's not the same as non-realizability because there are many hypotheses that are not self-referential of which we're unaware/coarsely aware.)
(This sequence assumes basic familiarity with longtermist cause prioritization concepts, though the issues I raise also apply to non-longtermist interventions.)
Are EA interventions net-positive from an impartial perspective — one that gives moral weight to all consequences, no matter how distant? What if they’re neither positive, nor negative, nor neutral?
Trying to reduce x-risk, improve institutions, or end factory farming might seem robustly positive. After all, we don’t need to be certain in order to do good in expectation. But when we step back to look at how radically far-reaching the impartial perspective is, we start to see a deeper problem than “uncertainty”. This problem is unawareness: many possible consequences of our actions haven’t even occurred to us in much detail, if at all.
Why is unawareness a serious challenge for impartial altruists? Well, impartiality entails that we account for all moral patients, and all the most significant impacts we could have on them. Here’s a glimpse of what such an accounting might require:
How likely is it that we’re missing insights as “big if true” as the discovery of other galaxies, the possibility of digital sentience, or the grabby aliens hypothesis? What’s the expected value of preventing extinction, given these insights? Or set aside long-term predictions for now. What are all the plausible pathways by which we might prevent or cause extinction by, say, designing policies for an intelligence explosion? Not just the pathways we find most salient.
It’s no secret that the space of possible futures is daunting. But whenever we make decisions based on the impartial good, we’re making tradeoffs between these futures. Imagine what it would take to grasp such tradeoffs. Imagine an agent who could conceive of a representative sample of these futures, in fairly precise detail. This agent might still be highly uncertain which future will play out. They might even be cognitively bounded. And yet, if they claimed, “I choose to do A instead of B because A has better expected consequences”, they’d have actually factored in the most important consequences. All the heavy weights would be on the scales.
And then there’s us. We don’t merely share this agent’s limitations. Rather, we are also unaware (or at best only coarsely aware) of many possible outcomes that could dominate our net impact. How, then, can we weigh up these possibilities when making choices from an impartial perspective? We have some rough intuitions about the consequences we’re unaware of, sure. But if those intuitions are so weakly grounded that we can’t even say whether they’re “better than chance” on net, any choices that rely on them might be deeply arbitrary. So whether we explicitly try to maximize EV, or instead follow heuristics or the like, we face the same structural problem. For us, it’s not clear what “better expected consequences” even means.
Of course, we don’t always need to be aware of every possible outcome to make justified decisions. With local goals in familiar domains (picking a gift for your old friend, treating your headache), our choices don’t seem to hinge on factors we’re unaware of. The goal of improving overall welfare across the cosmos, alas, isn’t so forgiving. For example, if you weren’t aware of the alignment problem, you might favor accelerating the development of artificial superintelligence (ASI) to prevent other x-risks! And when we try to concretely map out our impacts on the far future, or even on unprecedented near-term events like ASI takeoff, we find that sensitivity to unawareness isn’t the exception, but the rule.
This isn’t to say our unknown impacts are net-negative. But can we reasonably assume they cancel out in expectation? If not, we’ll need some way to make impartial tradeoffs between outcomes we’re unaware of, despite the ambiguity of those tradeoffs.
In response, the EA community has advocated supposedly “robust” strategies:[1] favoring broad interventions, trusting simple arguments, trying to prevent bad lock-in, doing research, saving for future opportunities… Yet it’s not clear why exactly we should consider these strategies “robust” to unawareness. Perhaps it’s simply indeterminate whether any act has better expected consequences than the alternatives. If so, impartial altruism would lose action-guiding force — not because of an exact balance among all strategies, but because of widespread indeterminacy.
In this sequence, I’ll argue that from an impartial perspective, unawareness undermines our justification for claiming any given strategy is better than another (including inaction). The core argument is simple: (i) Under some plausible ways of precisely evaluating strategies’ consequences, strategy A is better than B; (ii) under others, A is worse; and (iii) it’s arbitrary which precise values we use.[2] But we’ll take things slowly, to properly engage with various objections:
Yes, we ultimately have to choose something. Inaction is still a choice. But if my arguments hold up, our reason to work on EA causes is undermined. Concern for the impartial good would give us no more reason to work on these causes than to, say, prioritize our loved ones.
It’s tempting to ignore unawareness because of these counterintuitive implications. Surely “trying to stop AI from killing everyone is good” is more robust than some elaborate epistemic argument? Again, though, impartial altruism is a radical ethical stance. We’re aiming to give moral weight to consequences far more distant in space and time than our intuitions could reasonably track. Arguably, ASI takeoff will be so wildly unfamiliar that even in the near term, we can’t trust that trying to stop catastrophe will help more than harm. Is it so surprising that an unusual moral view, in an unusual situation, would give unusual verdicts?
I’m as sympathetic to impartial altruism as they come. It would be arbitrary to draw some line between the moral patients who “count” and those who don’t, at least as far as we’re trying to make the world a better place. But I think the way forward is not to deny our epistemic situation, but to reflect on it, and turn to other sources of moral guidance if necessary. Just because impartial altruism might remain silent, that need not mean your conscience does.
I’ll include each of the list items below as a “Key takeaway” at the start of the corresponding section of the sequence.
Our question is, what could justify the kind of claim in the blue box below? We’ll walk through what seem to be all the live options (white boxes), and see why unawareness pulls the rug out from under each one. (Don’t worry if you aren’t sure what all these options mean.)
Key takeaway
Unawareness consists of two problems: The possible outcomes we can conceive of are too coarse to precisely evaluate, and there are some outcomes we don’t conceive of in the first place.
What exactly is our epistemic predicament? And why is it incompatible with the standard framework for evaluating interventions, namely expected value (EV)?
The challenge: As impartial altruists, we care about the balance of value over the whole future. But, for any future trajectory that might result from our actions, we either (1) don’t have anything close to a precise picture of that balance, or (2) don’t foresee the trajectory at all.
In broad strokes, we’ve acknowledged this challenge before. We know we might be missing crucial considerations, that we’d likely be way off-base if we were EAs in the 1800s, and so on. For the most part, however, we’ve only treated these facts as reasons for more epistemic humility. We haven’t asked, “Wait, how can we make tradeoffs between possible outcomes of our actions, if we can barely grasp those outcomes?” Let’s look more closely at that.
We’d like to choose between some strategies, based on impartial altruist values. We represent “impartial altruism” with a value function v that gives nontrivial moral weight to distant consequences (among other things). Since we’re uncertain which possible world will result given our strategy, we consider the set of all such worlds. For any world w in that set, our value function returns a number v(w) representing how good this world is. (If any of this raises an eyebrow, see footnote.[4])
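To make the standard picture concrete (purely for illustration, assuming a countable set of worlds W and a precise probability function P over them given our strategy), the expected value of a strategy A would be

$$\mathrm{EV}(A) \;=\; \sum_{w \in W} P(w \mid A)\, v(w),$$

and "A has better expected consequences than B" would just mean EV(A) > EV(B).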
If we were aware of all possible worlds, we could conceive of every feature relevant to their value (say, the welfare of every sentient being). Then we could make judgments like, “Weighing up all possible outcomes, donating to GiveDirectly seems to increase expected total welfare.” That’s a high bar, and I don’t claim we need to meet it. But whatever justification we give for decision-making based on v, we’ll need to reckon with two ways our situation is structurally messier:[5]
First, the hypotheses we are aware of are coarse-grained: each lumps together many possible worlds, without spelling out the features v cares about (like the welfare of every sentient being), so we can't precisely evaluate them.
Second, for some possible worlds, we are unaware of any hypotheses that contain them. That is, these hypotheses haven’t occurred to us in enough detail to factor into our decision-making. For example, we’re unaware of the sets of worlds where undiscovered crucial considerations are true. (You might wonder if we could get around this problem with more clever modeling; see footnote.[6])
Figure 1. The hypotheses the fully aware agent conceives of are fine-grained possible worlds (thin rectangles). And this agent is aware of all such worlds. An unaware agent conceives of coarse-grained hypotheses that lump together multiple possible worlds (thick rectangles), and these hypotheses don’t cover the full space of worlds (dotted rectangle).
Fig. 1 illustrates both problems, which I’ll collectively call “unawareness”. Post #2 will explore these epistemic gaps much more concretely, from big-picture considerations to more mundane causal pathways. For now, what matters is how we differ in general from fully aware agents.
Key takeaway
Unlike standard uncertainty, under unawareness it’s unclear how to make tradeoffs between the possible outcomes we consider when making decisions.
These two problems might seem like mere flavors of regular old uncertainty. Why can’t we handle unawareness by just taking the expected value (perhaps with a Bayesian prior that bakes in common sense)? As Greaves and MacAskill (2021, Sec. 7.2) say:
Of course, failure to consider key fine-grainings might lead to different expected values and hence to different decisions, but this seems precisely analogous to the fact that failure to possess more information about which state in fact obtains similarly affects expected values (and hence decisions).
To respond to this, let’s revisit why we care about EV in the first place. A common answer: “Coherence theorems! If you can’t be modeled as maximizing EU, you’re shooting yourself in the foot.” For our purposes, the biggest problem with this answer is: Suppose we act as if we maximize the expectation of some utility function. This doesn’t imply we make our decisions by following the procedure “use our impartial altruistic value function to (somehow) assign a number to each hypothesis, and maximize the expectation”.[7]
So why, and when, is that particular procedure appealing? When you’re aware of every possible world relevant to your value function, then even if you’re uncertain, it’s at least clear how to evaluate each world. In turn, it’s clear how the worlds trade off against each other, e.g., “B saves 2 more lives than A in world X; but A saves 1 more life than B in world Y, which is 3 times as likely; so A has better consequences”. Your knowledge of these tradeoffs justifies the use of EV to represent how systematically good your action’s consequences are.[8]
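To spell out that toy tradeoff: write p for the probability of world X, so world Y has probability 3p. Then

$$\mathrm{EV}(A) - \mathrm{EV}(B) \;=\; P(X)\cdot(-2) \;+\; P(Y)\cdot(+1) \;=\; -2p + 3p \;=\; p \;>\; 0,$$

so A comes out ahead. The arithmetic only goes through because both worlds, and the value difference within each, are pinned down.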
Whereas, if you’re unaware or only coarsely aware of some possible worlds, how do you tell what tradeoffs you’re making? It would be misleading to say we’re simply “uncertain” over the possible worlds contained in a given hypothesis, because we haven’t spelled out the range of worlds we’re uncertain over in the first place. The usual conception of EV is ill-defined under unawareness.
Now, we could assign precise values to each hypothesis h, by taking some “best guess” of the value averaged over the possible worlds in h. Then we’d get a precise EV, indirectly. We’ll look more closely at why that’s unjustified next time. The problem for now is that this approach requires additional epistemic commitments, which we’d need to positively argue for. Since we lack access to possible worlds, our precise guesses don’t directly come from our value function v, but from some extra model of the hypotheses we’re aware of (and unaware of).
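Here's a toy numerical sketch of the worry (the hypotheses and numbers are entirely made up, and this isn't a model of any real intervention): two equally defensible "best guess" fillings-in of the same coarse hypotheses can flip the sign of the comparison.

```python
# Toy sketch: sign-sensitivity of EV to how we "fill in" coarse hypotheses
# with precise values. Hypotheses, credences, and values are made up.

probs = {"H1": 0.6, "H2": 0.4}  # credences in the coarse hypotheses we're aware of

# Two equally defensible "best guess" valuations of each coarse hypothesis
# under strategies A and B. Nothing in our value function v adjudicates
# between them, since v is defined over worlds we haven't spelled out.
fillings = {
    "filling-in 1": {"A": {"H1": 10, "H2": -3}, "B": {"H1": 8, "H2": -1}},
    "filling-in 2": {"A": {"H1": 6,  "H2": -9}, "B": {"H1": 8, "H2": -1}},
}

def expected_value(strategy, values):
    """Expected value of a strategy given precise per-hypothesis values."""
    return sum(probs[h] * values[strategy][h] for h in probs)

for label, values in fillings.items():
    diff = expected_value("A", values) - expected_value("B", values)
    print(f"{label}: EV(A) - EV(B) = {diff:+.1f}")

# filling-in 1: EV(A) - EV(B) = +0.4  (A looks better)
# filling-in 2: EV(A) - EV(B) = -4.4  (A looks worse)
```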
Worse yet, unawareness appears especially troubling for impartial altruists. We care about what happens to every moral patient over the whole future. A highly abstract description of a future trajectory doesn’t capture that. So we have two challenges: the hypotheses we’re aware of are far too coarse to evaluate by these impartial lights, and the possibilities we’re unaware of get lumped into a catch-all that’s even harder to assess.
Currently, I’m merely pointing out the prima facie problem. The third post will look at how serious the problems of coarseness and the murkiness of the catch-all are (spoiler: very).
Key takeaway
We can’t dissolve this problem by avoiding explicit models of the future, or by only asking what works empirically.
Before moving on, an important note: unawareness specifically challenges the justification or reasons for our decisions. This isn’t a purely empirical problem, nor one that can be dissolved by avoiding explicit models of our impact. In particular, we might think something like:
We already knew that naïve EV maximization doesn’t work, and humans can’t be ideal Bayesians. That’s why we should stop trying to derive our decisions from a foundational Theory of Everything, grounded purely in explicit world models. Instead, we should do what seems to work for bounded agents. This often entails following heuristics, common sense intuitions, or model-free “outside views”, refined over time by experience (see, e.g., Karnofsky).
“Winning isn’t enough”, by Jesse Clifton and myself, argues that this response is inadequate. Here’s the TL;DR. (Feel free to skip this if you’re familiar with that post.)
If we want to tell which strategies have net-positive consequences, we can’t avoid the question of what it means to have “net-positive consequences”. Answering this question doesn’t require us to solve all of epistemology and decision theory. Nor do we need a detailed formal model tallying up our chosen strategy’s consequences. Our justifications will often be vague and incomplete, and our models will often be high-level. E.g., “the mechanisms that make this procedure work in these domains seem very similar to the mechanisms in others”, or “I have reasons XYZ to trust that my intuition tracks the biggest consequences”.
But these justifications must bottom out in some all-things-considered model of how a strategy leads to better consequences, including those we’re unaware of. This is as true for intuitive judgments, heuristics, or supposedly model-free extrapolations, as anything else. (Whether those methods resolve unawareness remains to be seen. I’ll argue in the second and final posts that they don’t.)
Key takeaway
This vignette illustrates how unawareness might undermine even intuitively robust interventions, like trying to reduce AI x-risk.
On a fairly abstract level, we’ve seen that we can’t reduce unawareness to ordinary uncertainty. Now let’s ground things more concretely with a simple story.[9]
I want to increase total welfare across the cosmos. Seems pretty daunting! Nonetheless, per the standard longtermist analysis, I reason, “The value of the future hinges on a few simple levers that could get locked in within our lifetimes, like ‘Is ASI aligned with human values?’ And it doesn’t seem that hard to nudge near-term levers in the right direction. So it seems like x-risk reduction interventions will be robust to missing details in our world-models.”
Inspired by this logic, I set out to choose a cause area. Trying to stop AIs from permanently disempowering humans? Looks positive-EV to me. Still, I slow down to check this. “What are the consequences of that strategy,” I wonder, “relative to just living a normal life, not working on AI risk?”
Well, let’s say I succeed at preventing human disempowerment by AI, with no bad off-target effects of similar magnitude. That looks pretty good! At least, assuming human space colonization (“SC”) is better than SC by human-disempowering AIs. Clearly(?), SC by humans with my values would be better by my lights than SC by human-disempowering AIs.
But then I begin to wonder about other humans’ values. There’s a wide space of possible human and AI value system pairs to compare. And some human motivations are pretty scary, especially compared to what I imagine Claude 5.8 or OpenAI o6 would be like. Also, it’s not only the initial values that matter. Maybe humans would differ a lot from AIs in how much (and what kind of) reflection they do before value lock-in, or how they approach philosophy, or how cooperative they are.[10] Is our species unusually philosophically stubborn? My guess is that this stuff cancels out, but this feels kinda arbitrary. I feel like I need a much more fine-grained picture of the possibilities, to say which direction this all pushes in.
Also, if I try to stop human disempowerment by, say, working on AI control, how does this effort actually affect the risk of disempowerment? The intended effects of AI control sure seem good. And maybe there are flow-through benefits, like inspiring others to work on AI safety too. But have I precisely accounted for the most significant ways that this research could accelerate capabilities, or AI companies that implement control measures could get complacent about misalignment, or the others I inspire to switch to control would’ve more effectively reduced AI risk some other way, or …?[11] Even if I reduce disempowerment risk, how do I weigh this against the possible effects on the likelihood of catastrophes that prevent SC entirely? For all I know, my biggest impact is on AI race dynamics that lead to a war involving novel WMDs. And if no one from Earth colonizes space, how much better or worse might SC by alien civilizations be?
Hold on, this was all assuming I can only influence my lightcone. Acausal interactions are so high-stakes that I guess they dominate the calculus? I don’t really know how I’d begin tallying up my externalities on the preferences of agents whose decision-making is correlated with mine (Oesterheld 2017), effects on the possible weird path-dependencies in “commitment races”, or the like. After piecing together what little scraps of evidence and robust arguments I’ve got, my guesses beyond that are pulled out of a hat, if I’m being honest. Maybe there are totally different forms of acausal trade we haven’t thought of yet? Or maybe the acausal stuff all depends a lot on really philosophically subtle aspects of decision theory or anthropics I’m fundamentally confused about? Or, if I’m in a simulation, I have pretty much no clue what’s going on.
We’ve looked at the most apparently robust, battle-tested class of interventions available to an impartial altruist. Even here, unawareness looms quite large. In the end, we could say, “Let’s shrink the EV towards zero, slap on some huge error bars, and carry on with our ‘best guess’ that it’s positive.” But is that our actual epistemic state?
This vignette alone doesn’t show we have no reason to work on AI risk reduction. What it does illustrate is that if we assign a precise EV to an intervention under unawareness, the sign of the EV seems sensitive to highly arbitrary choices.[12] That’s a problem we must grapple with somehow, even if we ultimately reject this sequence’s strongest conclusions.
Even so, we might think these choices aren’t actually arbitrary, but instead grounded in reliable intuitions. (At least, maybe for some interventions, and the problems above are just quirks of AI risk?) Rejecting such intuitions may seem like a fast track to radical skepticism. In the next post I’ll argue that, on the contrary: Yes, our intuitions can provide some guidance, all else equal. But all else is not equal. When our goal is to improve the future impartially speaking, the guidance from our intuitions isn’t sufficiently precise to justify judgments about an intervention’s sign — but this isn’t true for more local, everyday goals.
Thanks to Nicolas Macé, Sylvester Kollin, Jesse Clifton, Jim Buhler, Clare Harris, Michael St. Jules, Guillaume Corlouer, Miranda Zhang, Eric Chen, Martín Soto, Alex Kastner, Oscar Delaney, Capucine Griot, and Violet Hour for helpful feedback and suggestions. I edited this sequence with assistance from ChatGPT, Claude, and Gemini. Several ideas and framings throughout this sequence were originally due to Anni Leskelä and Jesse Clifton. This does not imply their endorsement of all my claims.
Bradley, Richard. 2017. Decision Theory with a Human Face. Cambridge University Press.
Canson, Chloé de. 2024. “The Nature of Awareness Growth.” Philosophical Review 133 (1): 1–32.
Easwaran, Kenny. 2014. “Decision Theory without Representation Theorems.” Philosophers’ Imprint 14 (August). https://philpapers.org/rec/EASDTW.
Greaves, Hilary. 2016. “Cluelessness.” Proceedings of the Aristotelian Society 116 (3): 311–39.
Greaves, Hilary, and William MacAskill. 2021. “The Case for Strong Longtermism.” Global Priorities Institute Working Paper No. 5-2021, University of Oxford.
Hájek, Alan. 2008. “Arguments for–or against–Probabilism?” British Journal for the Philosophy of Science 59 (4): 793–819.
Meacham, Christopher J. G., and Jonathan Weisberg. 2011. “Representation Theorems and the Foundations of Decision Theory.” Australasian Journal of Philosophy 89 (4): 641–63.
Mogensen, Andreas L. 2020. “Maximal Cluelessness.” The Philosophical Quarterly 71 (1): 141–62.
Oesterheld, Caspar. 2017. “Multiverse-Wide Cooperation via Correlated Decision Making.” https://longtermrisk.org/files/Multiverse-wide-Cooperation-via-Correlated-Decision-Making.pdf.
Paul, L. A., and John Quiggin. 2018. “Real World Problems.” Episteme 15 (3): 363–82. doi:10.1017/epi.2018.28.
Roussos, Joe. 2021. “Unawareness for Longtermists.” 7th Oxford Workshop on Global Priorities Research. June 24, 2021. https://joeroussos.org/wp-content/uploads/2021/11/210624-Roussos-GPI-Unawareness-and-longtermism.pdf.
Steele, Katie, and H. Orri Stefánsson. 2021. Beyond Uncertainty: Reasoning with Unknown Possibilities. Cambridge University Press.
See, respectively, (e.g.) Tomasik (broad interventions); Christiano and Karnofsky (simple arguments); and Greaves and MacAskill (2021, Sec. 4) (lock-in, research, and saving).
This problem is related to, but distinct from, “complex cluelessness” as framed in Greaves (2016) and Mogensen (2020). Mogensen argues that our credences about far-future events should be so imprecise that it’s indeterminate whether, e.g., donating to AMF is net-good. I find his argument compelling (and some of my arguments in the final post bolster it). However, to my knowledge, no existing case for cluelessness has acknowledged unawareness as a distinct epistemic challenge, except the brief treatment in Roussos (2021).
E.g., EV(A) − EV(B) = [−1, 2].
Remarks:
These two problems correspond to “coarse awareness” and “restricted awareness”, respectively, from Paul and Quiggin (2018, Sec. 4.1). For other formal models of unawareness, see, e.g., Bradley (2017, Sec. 12.3), Steele and Stefánsson (2021), and de Canson (2024).
Remarks:
For more, see Meacham and Weisberg (2011, Sec. 4), Hájek (2008, Sec. 3), and this post.
Cf. the arguments for EV maximization in Easwaran (2014) and Sec. III of Carlsmith.
Example of restricted awareness: What if we’re completely missing some way a space-colonizing civilization’s philosophical attitudes affect how much value it produces?
Example of coarse awareness: How do we weigh up the likelihoods of these fuzzily sketched pathways from the intervention to an AI takeover event?
You might think, “My values are ultimately arbitrary in a sense. I have the values I have because of flukes of my biology, culture, etc.” This is not what I mean by “arbitrary”. A choice is “arbitrary” to the extent that it’s made for no (defensible) reason. Insofar as you make decisions based on impartial altruistic values, those values alone don’t tell you how to evaluate a given hypothesis, as we’ve seen. I’ll say a bit more on how I’m thinking about arbitrariness next time.