Executive summary

* Safe Pareto improvements (SPIs) are ways of changing agents’ bargaining strategies that make all parties better off, regardless of their original strategies. SPIs are an unusually robust approach to preventing catastrophic conflict between AI systems, especially AIs capable of credible commitments. This is because SPIs can reduce...
(Subtitle: “And ethics, and epistemology, and…”. Cross-posted from my Substack.) We want to make decisions for good reasons. But I worry some common approaches to decision theory stray from this purpose. They start with a bottom-line verdict, “I should choose this action”, then use this verdict to justify claims about...
(Cross-posted from my Substack.)

Here’s an important way people might often talk past each other when discussing the role of intuitions in philosophy.[1]

Intuitions as predictors

When someone appeals to an intuition to argue for something, it typically makes sense to ask how reliable their intuition is. Namely, how reliable...
I’ve argued in my unawareness sequence that when we properly account for our severe epistemic limitations, we are clueless about our impact from an impartial altruistic perspective. However, this argument and my responses to counterarguments involve a lot of moving parts. And the term “clueless” gets used in various importantly...
As many folks in AI safety have observed, even if well-intentioned actors succeed at intent-aligning highly capable AIs, they’ll still face some high-stakes challenges.[1] Some of these challenges are especially exotic and could be prone to irreversible, catastrophic mistakes. E.g., deciding whether and how to do acausal trade. To deal...
We’re finally ready to see why unawareness so deeply undermines action guidance from impartial altruism. Let’s recollect the story thus far:

1. First: Under unawareness, “just take the expected value” is unmotivated.
2. Second: Likewise, “do what seems intuitively good and high-leverage, then you’ll at least do better than chance”...
To recap, first, we face an epistemic challenge beyond uncertainty over possible futures. Due to unawareness, we can’t conceive of many relevant futures in the first place, which makes the standard EV framework ill-suited for impartial altruistic decision-making. And second, we can’t trust that our intuitive comparisons of strategies’ overall...