CLR is excited about safe Pareto improvements (SPIs) as a way to mitigate downsides from conflict between AIs. SPIs are a class of interventions on how agents negotiate that make them all better off, no matter how they would have negotiated without the SPI. Among many candidate interventions against AI...
(I wrote this post partly to help orient those interested in participating in the EA Forum’s Cluelessness Critiques Competition. The competition closes August 14th.) I’d like to elicit direct, productive critiques of the argument for cluelessness from my sequence on “unawareness”, which I’ll call the unawareness argument. To that end,...
(Lightly edited July 3, 2026 (clarifying the definition of time T) and June 26, 2026 (footnote 5).) Advanced AIs might be capable of various credible commitments unavailable to humans, which they could use when bargaining with each other. “Bargaining” can sound like something pretty specific: haggling over (literal) prices. But,...
In Part I of CLR's safe Pareto improvements (SPI) agenda, we gave our high-level strategy for evaluating models for SPI-incompatible behavior and reasoning. This guide gives more details on how I’m thinking about executing on this strategy, especially:
* the kind of workflow I think we should use, to start...
Executive summary
* Safe Pareto improvements (SPIs) are ways of changing agents’ bargaining strategies that make all parties better off, regardless of their original strategies. SPIs are an unusually robust approach to preventing catastrophic conflict between AI systems, especially AIs capable of credible commitments. This is because SPIs can reduce...
(Subtitle: “And ethics, and epistemology, and…”. Cross-posted from my Substack.) We want to make decisions for good reasons. But I worry some common approaches to decision theory stray from this purpose. They start with a bottom-line verdict, “I should choose this action”, then use this verdict to justify claims about...
(Cross-posted from my Substack.) Here’s an important way people might often talk past each other when discussing the role of intuitions in philosophy.[1]
Intuitions as predictors
When someone appeals to an intuition to argue for something, it typically makes sense to ask how reliable their intuition is. Namely, how reliable...