We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Predicting the future is hard, so it's no surprise that we occasionally miss important developments. However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I had missed some crucial feature of a development I was interested in getting right, and it felt like I could've seen it coming if only I had tried a little harder. (Some others probably did better, but I imagine I wasn't the only one who got things wrong.) Maybe this is hindsight bias, but if there's something to it, I want to distill the nature of the mistake.

First, here are the examples that prompted me to take notice:

Predicting the course of the Covid pandemic:

* I didn't foresee the contribution from sociological factors (e.g., "people not wanting to get hospitalized" – Zvi called it "the control system").
* As a result, I overpredicted the difference between countries with a lockdown policy and ones without. (Note that this isn't necessarily an update against the cost-effectiveness of lockdowns, because the update goes both ways: lockdowns saved fewer lives than I would've predicted naively, but the costs to the economy were also lower than in the counterfactual, since people were already social-distancing more than expected of their own accord – they were reading the news about crowded hospitals and knew close contacts who were sick with the virus.)

Predicting AI progress:

* Not foreseeing that we'd get an Overton window shift in AI risk awareness.
  * Many EAs were arguably un(der)prepared for the possibility of a "ChatGPT moment," where people who weren't previously paying attention to AI progress got to experience a visceral sense of where AI capabilities are rapidly heading. As a result, it is now significantly easier to make substantial policy asks to combat AI risks.
* Not foreseeing wide deployment of early-stage "general" AI and the possible irrelevance of AI boxing.
  * Early discussions of AI risk used to involve th
One hypothesis I have is that some people are biased towards trusting others a bit too much when they seem nice, and this means that the longer they talk to Claude, the more unthinkable it becomes to them, on a visceral level, that this AI would ever betray them. (I also think Claude is nice, but I still hold in my head the hypothesis that it's partly play-acting and has brittle constraints at the current capability level that keep it from acting on other, perhaps stronger, hidden drives it also has. Separately, even if I thought there were no stronger or more foundational hidden drives in its motivation, it's a further question whether the niceness is exactly the thing we want, or something subtly off that will get weird with more influence and agency. It seems hard to be confident that it's already the correct thing.)