One framing that I think might be helpful for thinking about "Plan A" vs "shut it all down" is: "Suppose you have the political will for an n-year slowdown, i.e. after n years you are forced to hand off trust to superhuman AI systems (e.g. for n = 5, 10, 30). What should the capability progression throughout the slowdown be?" This framing forces a focus on the exit condition, i.e. the plan for doing the handoff, which I think is an underdiscussed weakness of the "shut it all down" plan.
I think my gut reaction is that the most important considerations are: (i) there are a lot of useful things you can do with the AIs, so I want more time with the smarter AIs, and (ii) I want to scale through the dangerous capability range slowly and with slack (as opposed to rushing through it at the end of the slowdown).
Of course, this framing also ignores some important considerations, e.g. choices about the capability progression affect both the difficulty of enforcement/verification (in both directions: AI lie detectors/AI verification are helpful, while keeping AIs closer to the edge is a downside) and willingness to pay over time (e.g. scary demos or AI for epistemics might help increase WTP).
> However, I also think that open agency approaches to transparency face two key difficulties: competitiveness and safety-of-the-components.[18]
I think a third key difficulty with this class of approaches is something like "emergent agency", i.e. each of the individual components seems to be doing something safe, but when you combine several of the agents, you get a scary agent. Intuition pump: each of the weights in a NN is very understandable (it's just a number) and is not doing dangerous scheming, but if you compose them the result might be scary. Analogously, each of the subagents in the open agency AI might not be scheming, but a collection of these agents might be.
Understanding the communications between the components may or may not be sufficient to mitigate this failure mode. If the understanding is "local", i.e. looking at a particular chain of reasoning and verifying that it is valid, it is probably not sufficient, since scary reasoning might be made up of a bunch of small, locally valid chains of reasoning that each look safe. So I think you want something like a reasonable global picture of the reasoning the open agent is doing in order to mitigate "emergent agency".
I think this is related to the "safety of the components" failure mode you talk about, particularly the analogy of the corporation passing memos around where the memos don't correspond to the "real reasoning" going on. However, it could be that the "real reasoning" emerges at a higher level of abstraction than the individual agents.
This sort of threat model leads me to think that if we're aiming for this sort of open agency, we shouldn't do end-to-end training of the whole system, lest we incentivize "emergent agency", even if we don't make the individual components less safe.
One upside of "shut it all down" is that it does in fact buy more time: under Plan A it is difficult to secure algorithmic secrets without extremely aggressive security measures, so any rogue project (e.g. a nation-state blacksite) can just coast off the algorithms developed by the verified projects. Then, a few years in, it fires up its cluster and tries to do an intelligence explosion with the extra algorithmic progress.
>superintelligence
Small detail: my understanding of the IABIED scenario is that their AI was only moderately superhuman, not superintelligent.
The pace described in this post seems systematically too slow to me, and the post seems to underrate the capabilities of superintelligence. One particular point of disagreement:
> It seems reasonable to use days or weeks as an upper bound on how fast robot doublings could become, based on biological analogies. This is very fast indeed.
When I read this, I thought it was going to say "lower bound". Why would you expect evolution to find globally optimal doubling times? This reads to me a bit like saying that the speed of a cheetah or the size of a blue whale will be an upper bound on the speed/size of a robot. Why???
The case for a lower bound seems clear: biology did it, and a superintelligence could probably design a more functional robot than biology did.
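(For a sense of scale, and this is my own back-of-envelope arithmetic rather than a figure from the post: a doubling time of one week, sustained for a year, already compounds to

$$2^{52} \approx 4.5 \times 10^{15}\text{-fold growth,}$$

so even the "upper bound" regime is explosive; the disagreement is just about whether a superintelligence could do better than biology.)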
Small typo: $A\log(B) = \log(B^A)$, not $\log(A^B)$.
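To spell out the identity with a quick numerical check (my own example, not from the post):

$$A\log(B) = \log\!\left(B^{A}\right), \qquad \text{e.g. } 3\log 2 = \log\!\left(2^{3}\right) = \log 8, \text{ whereas } \log\!\left(3^{2}\right) = \log 9.$$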
Also, there's a good chance AI governance won't work, and labs will just have a very limited safety budget to implement their best-guess mitigations. Or maybe AI governance does work and we get a large budget, but we still need to actually solve alignment.
Thanks for writing this!
For what it's worth, my view is that we're very likely to be wrong about the specific details in both of the endings -- they are obviously super conjunctive. I don't think there's any way around this, because we can be confident AGI is going to cause some ex-ante surprising things to happen.
Also, this scenario is around 20th-percentile timelines for me; my median is early 2030s (though other authors disagree with me). I also feel much more confident about the pre-2027 part of the scenario than about the post-2027 part.
Is your disagreement that you think AGI will happen later, that you think the effects of AGI on the world will look very different, or both? If it's just the timelines, we might have fairly similar views.
I think I mostly am on board with this comment. Some thoughts:
Another reaction I have is that a constraint on coordination will probably be "is the other guy running a blacksite that will screw us over?". So I think there's a viability bump at the point of "allow legal capabilities scaling at least as fast as the max-size blacksite that you would have a hard time detecting".
So I think this paragraph isn't really right, because "slowdown" != "pause", and slowdowns might still be really, really helpful and enough to get you a long way.
I actually currently think you want to accelerate compute production, because hardware scaling seems safer than software scaling. I'm not sure exactly what you mean by "in an uncontrolled fashion". If you mean "have a bunch of inspectors making sure new chips aren't being smuggled to illegal projects", then I agree with this; on my initial read I thought you meant something like "pause chip production until they start producing GPUs with HEMs in them", which I think is probably bad.
In other words, I think you want to create a big compute overhang during a pause. The downside is obvious, but the upsides are:
(this comment might be confusing because I typed it quickly, happy to clarify if you want)