Under fast takeoff models, the first rogue AGI posing a serious takeover/extinction risk to humanity would very likely also be the last, leaving no chance for serious opposition (e.g. Sable). This model seems theoretically compelling to me.
However, there is some recent empirical evidence that the basin of "roughly human" intelligence may not be trivial to escape. LLM agents seem increasingly competent and general, but they continue to lag behind humans on long-term planning. If capabilities continue to develop in a highly jagged fashion, we may face rather dangerous rogue AIs that still have exploitable weaknesses. Also, the current (neuro-scaffold) paradigm is compute- and data-hungry, and perhaps not easily amenable to recursive self-improvement (RSI). Though I suspect strongly superhuman models would be able to invent a much more efficient paradigm, it seems reasonable to give some weight to the possibility that early rogue neuro-scaffold AGI will undergo a relatively slow takeoff.[1]
Therefore, a competent civilization would have a governmental agency (or team) designated to rapidly shut down (and thoroughly purge/contain) rogue AGI. My question is: which agencies currently hold that responsibility? Surprisingly, I have not been able to find much previous discussion of the practical aspects of this question (e.g. the legal aspects of shutting down a rogue AI running on AWS).
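To make "practical aspects" a bit more concrete: here is a minimal sketch of what the cloud-side step of such a shutdown might look like, assuming the responding team already holds credentials for the relevant AWS account and can identify the suspect machines. The region, tag filter, and security-group ID below are hypothetical placeholders, not a real incident-response playbook.

```python
# Minimal sketch: emergency quarantine + shutdown of suspect EC2 instances.
# Assumes the responders have account-level AWS credentials and have
# pre-created a deny-all "quarantine" security group. The region, tag,
# and group ID are hypothetical.
import boto3

REGION = "us-east-1"                    # hypothetical region
QUARANTINE_SG = "sg-0123456789abcdef0"  # pre-created deny-all security group
SUSPECT_FILTER = {"Name": "tag:workload", "Values": ["rogue-agent"]}  # hypothetical tag

ec2 = boto3.client("ec2", region_name=REGION)

# 1. Find running instances matching the suspect tag.
reservations = ec2.describe_instances(Filters=[SUSPECT_FILTER])["Reservations"]
instance_ids = [
    inst["InstanceId"]
    for res in reservations
    for inst in res["Instances"]
    if inst["State"]["Name"] == "running"
]

# 2. Cut network access first, so the workload cannot exfiltrate itself
#    while the rest of the response is coordinated.
for iid in instance_ids:
    ec2.modify_instance_attribute(InstanceId=iid, Groups=[QUARANTINE_SG])

# 3. Stop (rather than terminate) so disks survive for forensic analysis.
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
```

Of course, this only covers one account in one region of one cloud provider; a rogue AI that has already replicated elsewhere would be untouched, which is exactly why the organizational and cross-border question matters more than the API calls.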
Ideally, such an agency would be international, since rogue AGI can easily cross borders and may even negotiate with, bribe, or blackmail governments. In practice, though, I would guess that some cybercrime unit within the (U.S.) DoD is probably the best candidate. While the UK AISI seems the most "on the ball," as far as I know it is not well equipped to aggressively pursue rogue AGI across borders, which may require very rapid response and escalation spanning cyber-defense and conventional strikes on data centers.
At a bare minimum, a strong candidate for this role should actually run drills simulating shutdown attempts against rogue AGI. Such drills will probably be possible to carry out in a somewhat useful form very soon (or even now, with red-team human assistance).
[1] If neuro-scaffold AI is inherently too weak to reach AGI, then the first rogue AGI may arise from a more dangerous paradigm, e.g. "brain-like AGI". This would be unfortunate and likely, but it is not the focus of this post.
A loss-of-control scenario would likely result in rogue AIs replicating themselves across the internet, as discussed here: https://metr.org/blog/2024-11-12-rogue-replication-threat-model/