Varieties Of Doom
There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" didn't feel like it mapped to any specific belief in my head. In private conversations I'd sometimes give my p(doom) as 12%, with the caveat that "doom" seemed nebulous and conflated between several different concepts. At some point it was decided that a p(doom) over 10% makes you a "doomer," because it means the actions you should take with respect to AI are overdetermined. I did not and do not feel that is true. But any time I felt prompted to explain my position, I'd find I could explain a little bit of this or that but never really convey the whole thing.

As it turns out, doom has a lot of parts, and every part is entangled with every other part, so no matter which part you explain you always feel like you're leaving the crucial parts out. Doom is more like an onion than a single event: a distribution over AI outcomes that people frequently respond to with the force of the fear of death. Some of these outcomes are less than death and some of them are worse. It is a subconscious(?) seven-way motte-and-bailey between these outcomes, creating the illusion of deeper agreement about what will happen than actually exists, for political purposes. Worse still, these outcomes are not mutually independent but interlocking layers, where if you stop believing in one you just shift your feelings of anxiety onto the previous one. This is much of why discussion rarely updates people on AI X-Risk: there's a lot of doom to get through.

I've seen the conflation defended as useful shorthand for figuring out whether someone is taking AI X-Risk seriously at all. To the extent this is true, its use as a political project and rhetorical cudgel undermines it. The intended sequence goes something like:

1. Ask your opponent for their p(doom).
2. If their p(doom) is unreasonably low, laugh at them; if it's something reasonable like 5%, ask how they think they can justify gambling with everyone's lives.
3.

I stand by my basic point in Varieties Of Doom that these models don't plan very much yet, and that as soon as we start having them plan and act over longer time horizons we'll see natural instrumental behavior emerge. We could also see this emerge from, e.g., continuous learning that lets them hold a plan in implicit context over much longer action trajectories.