If you would like to increase engagement with your posts, I’d highly recommend not posting all of them at once, especially because they’re long. Post the first one, see how people respond. Then adjust and post the second one next week.
(Note: A mod moved the subsequent posts to drafts for this reason. I'll repost them spaced out.)
I have just a superficial familiarity with the lit around this, and I'm wondering if what you're calling "unawareness" is the same concept as what other people have been calling "cluelessness" in this context, or if it is distinct in some way. They seem at least similar.
In any case, thanks for trying to set forth in a rigorous way this problem with the EA project.
Thanks!
People use "cluelessness" to mean various importantly different things, which is why I de-emphasized that term in this sequence. I think unawareness is a (major) source of what Greaves called complex cluelessness, which is a situation where:
(CC1) We have some reasons to think that the unforeseeable consequences of A1 would systematically tend to be substantially better than those of A2;
(CC2) We have some reasons to think that the unforeseeable consequences of A2 would systematically tend to be substantially better than those of A1;
(CC3) It is unclear how to weigh up these reasons against one another.
(It's a bit unclear how "unforeseeable" is defined. In context / in the usual ways people tend to talk about complex cluelessness, I think it's meant to encompass cases where the problem isn't unawareness but rather other obstacles to setting precise credences.)
But unawareness itself means "many possible consequences of our actions haven’t even occurred to us in much detail, if at all" (as unpacked in the introduction section). ETA: I think it's important to conceptually separate this from complex cluelessness, because you might think unawareness is a challenge that demands a response beyond straightforward Bayesianism, even if you disagree that it implies complex cluelessness.
Just skimmed the post. It seems your notion of "unawareness" sits in the same cluster as Knightian uncertainty and non-realizability in decision and learning theory.
There are indeed connections between these ideas, but I think it's very important not to round unawareness off to either of those two. Unawareness is its own epistemic problem with its own implications. (E.g., it's not the same as non-realizability because there are many hypotheses that are not self-referential of which we're unaware/coarsely aware.)
(This sequence assumes basic familiarity with longtermist cause prioritization concepts, though the issues I raise also apply to non-longtermist interventions.)
Are EA interventions net-positive from an impartial perspective — one that gives moral weight to all consequences, no matter how distant? What if they’re neither positive, nor negative, nor neutral?
Trying to reduce x-risk, improve institutions, or end factory farming might seem robustly positive. After all, we don’t need to be certain in order to do good in expectation. But when we step back to look at how radically far-reaching the impartial perspective is, we start to see a deeper problem than “uncertainty”. This problem is unawareness: many possible consequences of our actions haven’t even occurred to us in much detail, if at all.
Why is unawareness a serious challenge for impartial altruists? Well, impartiality entails that we account for all moral patients, and all the most significant impacts we could have on them. Here’s a glimpse of what such an accounting might require:
How likely is it that we’re missing insights as “big if true” as the discovery of other galaxies, the possibility of digital sentience, or the grabby aliens hypothesis? What’s the expected value of preventing extinction, given these insights? Or set aside long-term predictions for now. What are all the plausible pathways by which we might prevent or cause extinction by, say, designing policies for an intelligence explosion? Not just the pathways we find most salient.
It’s no secret that the space of possible futures is daunting. But whenever we make decisions based on the impartial good, we’re making tradeoffs between these futures. Imagine what it would take to grasp such tradeoffs. Imagine an agent who could conceive of a representative sample of these futures, in fairly precise detail. This agent might still be highly uncertain which future will play out. They might even be cognitively bounded. And yet, if they claimed, “I choose to do A instead of B because A has better expected consequences”, they’d have actually factored in the most important consequences. All the heavy weights would be on the scales.
And then there’s us. We don’t merely share this agent’s limitations. Rather, we are also unaware (or at best only coarsely aware) of many possible outcomes that could dominate our net impact. How, then, can we weigh up these possibilities when making choices from an impartial perspective? We have some rough intuitions about the consequences we’re unaware of, sure. But if those intuitions are so weakly grounded that we can’t even say whether they’re “better than chance” on net, any choices that rely on them might be deeply arbitrary. So whether we explicitly try to maximize EV, or instead follow heuristics or the like, we face the same structural problem. For us, it’s not clear what “better expected consequences” even means.
Of course, we don’t always need to be aware of every possible outcome to make justified decisions. With local goals in familiar domains (picking a gift for your old friend, treating your headache), our choices don’t seem to hinge on factors we’re unaware of. The goal of improving overall welfare across the cosmos, alas, isn’t so forgiving. For example, if you weren’t aware of the alignment problem, you might favor accelerating the development of artificial superintelligence (ASI) to prevent other x-risks! And when we try to concretely map out our impacts on the far future, or even on unprecedented near-term events like ASI takeoff, we find that sensitivity to unawareness isn’t the exception, but the rule.
This isn’t to say our unknown impacts are net-negative. But can we reasonably assume they cancel out in expectation? If not, we’ll need some way to make impartial tradeoffs between outcomes we’re unaware of, despite the ambiguity of those tradeoffs.
In response, the EA community has advocated supposedly “robust” strategies:[1] favoring broad interventions, trusting simple arguments, trying to prevent bad lock-in, doing research, saving for future opportunities… Yet it’s not clear why exactly we should consider these strategies “robust” to unawareness. Perhaps it’s simply indeterminate whether any act has better expected consequences than the alternatives. If so, impartial altruism would lose action-guiding force — not because of an exact balance among all strategies, but because of widespread indeterminacy.
In this sequence, I’ll argue that from an impartial perspective, unawareness undermines our justification for claiming any given strategy is better than another (including inaction). The core argument is simple: (i) Under some plausible ways of precisely evaluating strategies’ consequences, strategy A is better than B; (ii) under others, A is worse; and (iii) it’s arbitrary which precise values we use.[2] But we’ll take things slowly, to properly engage with various objections:
Yes, we ultimately have to choose something. Inaction is still a choice. But if my arguments hold up, our reason to work on EA causes is undermined. Concern for the impartial good would give us no more reason to work on these causes than to, say, prioritize our loved ones.
It’s tempting to ignore unawareness because of these counterintuitive implications. Surely “trying to stop AI from killing everyone is good” is more robust than some elaborate epistemic argument? Again, though, impartial altruism is a radical ethical stance. We’re aiming to give moral weight to consequences far more distant in space and time than our intuitions could reasonably track. Arguably, ASI takeoff will be so wildly unfamiliar that even in the near term, we can’t trust that trying to stop catastrophe will help more than harm. Is it so surprising that an unusual moral view, in an unusual situation, would give unusual verdicts?
I’m as sympathetic to impartial altruism as they come. It would be arbitrary to draw some line between the moral patients who “count” and those who don’t, at least as far as we’re trying to make the world a better place. But I think the way forward is not to deny our epistemic situation, but to reflect on it, and turn to other sources of moral guidance if necessary. Just because impartial altruism might remain silent, that need not mean your conscience does.
I’ll include each of the list items below as a “Key takeaway” at the start of the corresponding section of the sequence.
Our question is, what could justify the kind of claim in the blue box below? We’ll walk through what seem to be all the live options (white boxes), and see why unawareness pulls the rug out from under each one. (Don’t worry if you aren’t sure what all these options mean.)
Key takeaway
Unawareness consists of two problems: The possible outcomes we can conceive of are too coarse to precisely evaluate, and there are some outcomes we don’t conceive of in the first place.
What exactly is our epistemic predicament? And why is it incompatible with the standard framework for evaluating interventions, namely expected value (EV)?
The challenge: As impartial altruists, we care about the balance of value over the whole future. But, for any future trajectory that might result from our actions, we either (1) don’t have anything close to a precise picture of that balance, or (2) don’t foresee the trajectory at all.
In broad strokes, we’ve acknowledged this challenge before. We know we might be missing crucial considerations, that we’d likely be way off-base if we were EAs in the 1800s, and so on. For the most part, however, we’ve only treated these facts as reasons for more epistemic humility. We haven’t asked, “Wait, how can we make tradeoffs between possible outcomes of our actions, if we can barely grasp those outcomes?” Let’s look more closely at that.
We’d like to choose between some strategies, based on impartial altruist values. We represent “impartial altruism” with a value function v that gives nontrivial moral weight to distant consequences (among other things). Since we’re uncertain which possible world will result given our strategy, we consider the set of all such worlds. For any world w in that set, our value function returns a number v(w) representing how good this world is. (If any of this raises an eyebrow, see footnote.[4])
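To make the standard picture concrete (purely for illustration, assuming a countable set of worlds W and a precise probability function P over them given our strategy), the expected value of a strategy A would be

$$\mathrm{EV}(A) \;=\; \sum_{w \in W} P(w \mid A)\, v(w),$$

and "A has better expected consequences than B" would just mean EV(A) > EV(B).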
If we were aware of all possible worlds, we could conceive of every feature relevant to their value (say, the welfare of every sentient being). Then we could make judgments like, “Weighing up all possible outcomes, donating to GiveDirectly seems to increase expected total welfare.” That’s a high bar, and I don’t claim we need to meet it. But whatever justification we give for decision-making based on v, we’ll need to reckon with two ways our situation is structurally messier:[5]
First, the hypotheses we are aware of are coarse-grained: each lumps together many possible worlds, without spelling out the features v cares about (like the welfare of every sentient being), so we can't precisely evaluate them.
Second, for some possible worlds, we are unaware of any hypotheses that contain them. That is, these hypotheses haven’t occurred to us in enough detail to factor into our decision-making. For example, we’re unaware of the sets of worlds where undiscovered crucial considerations are true. (You might wonder if we could get around this problem with more clever modeling; see footnote.[6])
Figure 1. The hypotheses the fully aware agent conceives of are fine-grained possible worlds (thin rectangles). And this agent is aware of all such worlds. An unaware agent conceives of coarse-grained hypotheses that lump together multiple possible worlds (thick rectangles), and these hypotheses don’t cover the full space of worlds (dotted rectangle).
Fig. 1 illustrates both problems, which I’ll collectively call “unawareness”. Post #2 will explore these epistemic gaps much more concretely, from big-picture considerations to more mundane causal pathways. For now, what matters is how we differ in general from fully aware agents.
Key takeaway
Unlike standard uncertainty, under unawareness it’s unclear how to make tradeoffs between the possible outcomes we consider when making decisions.
These two problems might seem like mere flavors of regular old uncertainty. Why can’t we handle unawareness by just taking the expected value (perhaps with a Bayesian prior that bakes in common sense)? As Greaves and MacAskill (2021, Sec. 7.2) say:
Of course, failure to consider key fine-grainings might lead to different expected values and hence to different decisions, but this seems precisely analogous to the fact that failure to possess more information about which state in fact obtains similarly affects expected values (and hence decisions).
To respond to this, let’s revisit why we care about EV in the first place. A common answer: “Coherence theorems! If you can’t be modeled as maximizing EU, you’re shooting yourself in the foot.” For our purposes, the biggest problem with this answer is: Suppose we act as if we maximize the expectation of some utility function. This doesn’t imply we make our decisions by following the procedure “use our impartial altruistic value function to (somehow) assign a number to each hypothesis, and maximize the expectation”.[7]
So why, and when, is that particular procedure appealing? When you’re aware of every possible world relevant to your value function, then even if you’re uncertain, it’s at least clear how to evaluate each world. In turn, it’s clear how the worlds trade off against each other, e.g., “B saves 2 more lives than A in world X; but A saves 1 more life than B in world Y, which is 3 times as likely; so A has better consequences”. Your knowledge of these tradeoffs justifies the use of EV to represent how systematically good your action’s consequences are.[8]
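To spell out that toy tradeoff: write p for the probability of world X, so world Y has probability 3p. Then

$$\mathrm{EV}(A) - \mathrm{EV}(B) \;=\; P(X)\cdot(-2) \;+\; P(Y)\cdot(+1) \;=\; -2p + 3p \;=\; p \;>\; 0,$$

so A comes out ahead. The arithmetic only goes through because both worlds, and the value difference within each, are pinned down.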
Whereas, if you’re unaware or only coarsely aware of some possible worlds, how do you tell what tradeoffs you’re making? It would be misleading to say we’re simply “uncertain” over the possible worlds contained in a given hypothesis, because we haven’t spelled out the range of worlds we’re uncertain over in the first place. The usual conception of EV is ill-defined under unawareness.
Now, we could assign precise values to each hypothesis h, by taking some “best guess” of the value averaged over the possible worlds in h. Then we’d get a precise EV, indirectly. We’ll look more closely at why that’s unjustified next time. The problem for now is that this approach requires additional epistemic commitments, which we’d need to positively argue for. Since we lack access to possible worlds, our precise guesses don’t directly come from our value function v, but from some extra model of the hypotheses we’re aware of (and unaware of).
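Here's a toy numerical sketch of the worry (the hypotheses and numbers are entirely made up, and this isn't a model of any real intervention): two equally defensible "best guess" fillings-in of the same coarse hypotheses can flip the sign of the comparison.

```python
# Toy sketch: sign-sensitivity of EV to how we "fill in" coarse hypotheses
# with precise values. Hypotheses, credences, and values are made up.

probs = {"H1": 0.6, "H2": 0.4}  # credences in the coarse hypotheses we're aware of

# Two equally defensible "best guess" valuations of each coarse hypothesis
# under strategies A and B. Nothing in our value function v adjudicates
# between them, since v is defined over worlds we haven't spelled out.
fillings = {
    "filling-in 1": {"A": {"H1": 10, "H2": -3}, "B": {"H1": 8, "H2": -1}},
    "filling-in 2": {"A": {"H1": 6,  "H2": -9}, "B": {"H1": 8, "H2": -1}},
}

def expected_value(strategy, values):
    """Expected value of a strategy given precise per-hypothesis values."""
    return sum(probs[h] * values[strategy][h] for h in probs)

for label, values in fillings.items():
    diff = expected_value("A", values) - expected_value("B", values)
    print(f"{label}: EV(A) - EV(B) = {diff:+.1f}")

# filling-in 1: EV(A) - EV(B) = +0.4  (A looks better)
# filling-in 2: EV(A) - EV(B) = -4.4  (A looks worse)
```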
Worse yet, unawareness appears especially troubling for impartial altruists. We care about what happens to every moral patient over the whole future. A highly abstract description of a future trajectory doesn’t capture that. So we have two challenges: the hypotheses we’re aware of are far too coarse to evaluate by these impartial lights, and the possibilities we’re unaware of get lumped into a catch-all that’s even harder to assess.
Currently, I’m merely pointing out the prima facie problem. The third post will look at how serious the problems of coarseness and the murkiness of the catch-all are (spoiler: very).
Key takeaway
We can’t dissolve this problem by avoiding explicit models of the future, or by only asking what works empirically.
Before moving on, an important note: unawareness specifically challenges the justification or reasons for our decisions. This isn’t a purely empirical problem, nor one that can be dissolved by avoiding explicit models of our impact. In particular, we might think something like:
We already knew that naïve EV maximization doesn’t work, and humans can’t be ideal Bayesians. That’s why we should stop trying to derive our decisions from a foundational Theory of Everything, grounded purely in explicit world models. Instead, we should do what seems to work for bounded agents. This often entails following heuristics, common sense intuitions, or model-free “outside views”, refined over time by experience (see, e.g., Karnofsky).
“Winning isn’t enough”, by Jesse Clifton and myself, argues that this response is inadequate. Here’s the TL;DR. (Feel free to skip this if you’re familiar with that post.)
If we want to tell which strategies have net-positive consequences, we can’t avoid the question of what it means to have “net-positive consequences”. Answering this question doesn’t require us to solve all of epistemology and decision theory. Nor do we need a detailed formal model tallying up our chosen strategy’s consequences. Our justifications will often be vague and incomplete, and our models will often be high-level. E.g., “the mechanisms that make this procedure work in these domains seem very similar to the mechanisms in others”, or “I have reasons XYZ to trust that my intuition tracks the biggest consequences”.
But these justifications must bottom out in some all-things-considered model of how a strategy leads to better consequences, including those we’re unaware of. This is as true for intuitive judgments, heuristics, or supposedly model-free extrapolations, as anything else. (Whether those methods resolve unawareness remains to be seen. I’ll argue in the second and final posts that they don’t.)
Key takeaway
This vignette illustrates how unawareness might undermine even intuitively robust interventions, like trying to reduce AI x-risk.
On a fairly abstract level, we’ve seen that we can’t reduce unawareness to ordinary uncertainty. Now let’s ground things more concretely with a simple story.[9]
I want to increase total welfare across the cosmos. Seems pretty daunting! Nonetheless, per the standard longtermist analysis, I reason, “The value of the future hinges on a few simple levers that could get locked in within our lifetimes, like ‘Is ASI aligned with human values?’ And it doesn’t seem that hard to nudge near-term levers in the right direction. So it seems like x-risk reduction interventions will be robust to missing details in our world-models.”
Inspired by this logic, I set out to choose a cause area. Trying to stop AIs from permanently disempowering humans? Looks positive-EV to me. Still, I slow down to check this. “What are the consequences of that strategy,” I wonder, “relative to just living a normal life, not working on AI risk?”
Well, let’s say I succeed at preventing human disempowerment by AI, with no bad off-target effects of similar magnitude. That looks pretty good! At least, assuming human space colonization (“SC”) is better than SC by human-disempowering AIs. Clearly(?), SC by humans with my values would be better by my lights than SC by human-disempowering AIs.
But then I begin to wonder about other humans’ values. There’s a wide space of possible human and AI value system pairs to compare. And some human motivations are pretty scary, especially compared to what I imagine Claude 5.8 or OpenAI o6 would be like. Also, it’s not only the initial values that matter. Maybe humans would differ a lot from AIs in how much (and what kind of) reflection they do before value lock-in, or how they approach philosophy, or how cooperative they are.[10] Is our species unusually philosophically stubborn? My guess is that this stuff cancels out, but this feels kinda arbitrary. I feel like I need a much more fine-grained picture of the possibilities, to say which direction this all pushes in.
Also, if I try to stop human disempowerment by, say, working on AI control, how does this effort actually affect the risk of disempowerment? The intended effects of AI control sure seem good. And maybe there are flow-through benefits, like inspiring others to work on AI safety too. But have I precisely accounted for the most significant ways that this research could accelerate capabilities, or AI companies that implement control measures could get complacent about misalignment, or the others I inspire to switch to control would’ve more effectively reduced AI risk some other way, or …?[11] Even if I reduce disempowerment risk, how do I weigh this against the possible effects on the likelihood of catastrophes that prevent SC entirely? For all I know, my biggest impact is on AI race dynamics that lead to a war involving novel WMDs. And if no one from Earth colonizes space, how much better or worse might SC by alien civilizations be?
Hold on, this was all assuming I can only influence my lightcone. Acausal interactions are so high-stakes that I guess they dominate the calculus? I don’t really know how I’d begin tallying up my externalities on the preferences of agents whose decision-making is correlated with mine (Oesterheld 2017), effects on the possible weird path-dependencies in “commitment races”, or the like. After piecing together what little scraps of evidence and robust arguments I’ve got, my guesses beyond that are pulled out of a hat, if I’m being honest. Maybe there are totally different forms of acausal trade we haven’t thought of yet? Or maybe the acausal stuff all depends a lot on really philosophically subtle aspects of decision theory or anthropics I’m fundamentally confused about? Or, if I’m in a simulation, I have pretty much no clue what’s going on.
We’ve looked at the most apparently robust, battle-tested class of interventions available to an impartial altruist. Even here, unawareness looms quite large. In the end, we could say, “Let’s shrink the EV towards zero, slap on some huge error bars, and carry on with our ‘best guess’ that it’s positive.” But is that our actual epistemic state?
This vignette alone doesn’t show we have no reason to work on AI risk reduction. What it does illustrate is that if we assign a precise EV to an intervention under unawareness, the sign of the EV seems sensitive to highly arbitrary choices.[12] That’s a problem we must grapple with somehow, even if we ultimately reject this sequence’s strongest conclusions.
Even so, we might think these choices aren’t actually arbitrary, but instead grounded in reliable intuitions. (At least, maybe for some interventions, and the problems above are just quirks of AI risk?) Rejecting such intuitions may seem like a fast track to radical skepticism. In the next post I’ll argue that, on the contrary: Yes, our intuitions can provide some guidance, all else equal. But all else is not equal. When our goal is to improve the future impartially speaking, the guidance from our intuitions isn’t sufficiently precise to justify judgments about an intervention’s sign — but this isn’t true for more local, everyday goals.
Thanks to Nicolas Macé, Sylvester Kollin, Jesse Clifton, Jim Buhler, Clare Harris, Michael St. Jules, Guillaume Corlouer, Miranda Zhang, Eric Chen, Martín Soto, Alex Kastner, Oscar Delaney, Capucine Griot, and Violet Hour for helpful feedback and suggestions. I edited this sequence with assistance from ChatGPT, Claude, and Gemini. Several ideas and framings throughout this sequence were originally due to Anni Leskelä and Jesse Clifton. This does not imply their endorsement of all my claims.
Bradley, Richard. 2017. Decision Theory with a Human Face. Cambridge University Press.
Canson, Chloé de. 2024. “The Nature of Awareness Growth.” Philosophical Review 133 (1): 1–32.
Easwaran, Kenny. 2014. “Decision Theory without Representation Theorems.” Philosophers’ Imprint 14 (August). https://philpapers.org/rec/EASDTW.
Greaves, Hilary. 2016. “Cluelessness.” Proceedings of the Aristotelian Society 116 (3): 311–39.
Greaves, Hilary, and William MacAskill. 2021. “The Case for Strong Longtermism.” Global Priorities Institute Working Paper No. 5-2021, University of Oxford.
Hájek, Alan. 2008. “Arguments for–or against–Probabilism?” British Journal for the Philosophy of Science 59 (4): 793–819.
Meacham, Christopher J. G., and Jonathan Weisberg. 2011. “Representation Theorems and the Foundations of Decision Theory.” Australasian Journal of Philosophy 89 (4): 641–63.
Mogensen, Andreas L. 2020. “Maximal Cluelessness.” The Philosophical Quarterly 71 (1): 141–62.
Oesterheld, Caspar. 2017. “Multiverse-Wide Cooperation via Correlated Decision Making.” https://longtermrisk.org/files/Multiverse-wide-Cooperation-via-Correlated-Decision-Making.pdf.
Paul, L. A., and John Quiggin. 2018. “Real World Problems.” Episteme 15 (3): 363–82. doi:10.1017/epi.2018.28.
Roussos, Joe. 2021. “Unawareness for Longtermists.” 7th Oxford Workshop on Global Priorities Research. June 24, 2021. https://joeroussos.org/wp-content/uploads/2021/11/210624-Roussos-GPI-Unawareness-and-longtermism.pdf.
Steele, Katie, and H. Orri Stefánsson. 2021. Beyond Uncertainty: Reasoning with Unknown Possibilities. Cambridge University Press.
See, respectively, (e.g.) Tomasik (broad interventions); Christiano and Karnofsky (simple arguments); and Greaves and MacAskill (2021, Sec. 4) (lock-in, research, and saving).
This problem is related to, but distinct from, “complex cluelessness” as framed in Greaves (2016) and Mogensen (2020). Mogensen argues that our credences about far-future events should be so imprecise that it’s indeterminate whether, e.g., donating to AMF is net-good. I find his argument compelling (and some of my arguments in the final post bolster it). However, to my knowledge, no existing case for cluelessness has acknowledged unawareness as a distinct epistemic challenge, except the brief treatment in Roussos (2021).
E.g., EV(A) − EV(B) = [−1, 2].
Remarks:
These two problems correspond to “coarse awareness” and “restricted awareness”, respectively, from Paul and Quiggin (2018, Sec. 4.1). For other formal models of unawareness, see, e.g., Bradley (2017, Sec. 12.3), Steele and Stefánsson (2021), and de Canson (2024).
Remarks:
For more, see Meacham and Weisberg (2011, Sec. 4), Hájek (2008, Sec. 3), and this post.
Cf. the arguments for EV maximization in Easwaran (2014) and Sec. III of Carlsmith.
Example of restricted awareness: What if we’re completely missing some way a space-colonizing civilization’s philosophical attitudes affect how much value it produces?
Example of coarse awareness: How do we weigh up the likelihoods of these fuzzily sketched pathways from the intervention to an AI takeover event?
You might think, “My values are ultimately arbitrary in a sense. I have the values I have because of flukes of my biology, culture, etc.” This is not what I mean by “arbitrary”. A choice is “arbitrary” to the extent that it’s made for no (defensible) reason. Insofar as you make decisions based on impartial altruistic values, those values alone don’t tell you how to evaluate a given hypothesis, as we’ve seen. I’ll say a bit more on how I’m thinking about arbitrariness next time.