The best-case scenario would be something like "The capabilities necessary for rogue replication scale with size, ensuring that a 70B parameter model is VERY FAR from having a 50% time horizon of 40+ hours, which, if we trust METR, is necessary for rogue replication, while a capable model will be like Agent-1 or Agent-2 from AI-2027 and require far more compute per token". Additionally, the AI-2027 team pointed to a paper which would imply that attackers won't gain a significant advantage over defenders as AI capabilities improve.
This question has been on my mind a lot recently:
As the world becomes more and more efficient, I think there are three forces that are going to be way more consequential than what most people currently assume:
Sure, you could argue that an individual model without significant compute capacity can't really do much harm against hardened internet infrastructure, but I am not sure anyone could stop a 7B-70B parameter model finetuned for this task from spreading over the internet, somewhat like a more modern ILOVEYOU.
I am not a pessimistic person. But the same dynamic that forces companies that would rather slow down the development of very capable AI systems to keep competing and releasing models is the dynamic that could make hostile competition very convenient.
I do not think cooperation and guardrails can fully mitigate this risk. Sure, you might argue, this is not yet a problem. "Clearly it hasn't happened". "Clearly, if some bad-spirited person had wanted to do this they would have already done it".
This is not a very good counterargument. Think about something like antibiotic resistance. Soil bacteria carried amoxicillin-resistance genes long before modern clinical antibiotics were invented. The enzymes required to degrade these compounds were there, but since there was no competitive or evolutionary pressure to express these genes, the capacity was mostly inconsequential. The moment humans decided to carpet bomb hospital floors and cattle feedlots with antibiotics, selection pressure did its thing with lateral gene transfer, and we ended up with the current situation, where novel antibiotics run into widespread resistance within a matter of years.
Another example is the Great Oxygenation Event. Cyanobacteria started producing oxygen mostly as a side effect of a metabolically better energy production pathway. This had the incidental effect of basically poisoning the atmosphere for every other living thing on the planet.
I bring up the GOE specifically because it illustrates another feature that I think draws a scary parallel with the current AI situation. Mass extinction events driven by competitive pressure or capability gains in biological systems are mostly irreversible. The old meta-stable biological arrangement is simply irrecoverable from the new state.
My conclusion is: we might be near the oxygen threshold. A distributed replicator that is past a certain establishment density is no longer a problem that can simply be unwound. Adversarially tuned models might establish themselves faster than current cyber defenses can contain them.
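To make the threshold intuition concrete, here is a toy model, not a forecast. It assumes a single number `x` (the fraction of hosts where the replicator is established), logistic spread at rate `beta`, and defenders who clean up proportionally at rate `gamma` but only up to a fixed capacity `cap`. All names and parameter values are hypothetical and chosen only to illustrate the shape of the dynamics: below some density the defenders win, above it the replicator settles at a high level.

```python
def simulate(x0: float, beta: float = 0.5, gamma: float = 0.8,
             cap: float = 0.04, dt: float = 0.1, steps: int = 2000) -> float:
    """Euler-integrate a toy spread-vs-cleanup model and return the final density.

    x0    -- initial fraction of hosts where the replicator is established
    beta  -- spread rate (illustrative value, not an estimate)
    gamma -- per-host cleanup rate while defenders have spare capacity
    cap   -- maximum cleanup throughput per unit time (defender capacity)
    """
    x = x0
    for _ in range(steps):
        growth = beta * x * (1.0 - x)      # logistic spread into remaining hosts
        removal = min(gamma * x, cap)      # cleanup saturates at defender capacity
        x += dt * (growth - removal)
        x = min(max(x, 0.0), 1.0)          # keep the fraction inside [0, 1]
    return x


if __name__ == "__main__":
    for start in (0.05, 0.12):
        print(f"start={start:.2f} -> final={simulate(start):.4f}")
```

With these made-up parameters, a run that starts at 5% density decays toward zero, while one that starts at 12% climbs to a high stable level. The only difference between the two runs is the starting density, which is the sense in which I mean "threshold". Whether real defenses saturate like this is exactly the assumption one should challenge.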
Up until this point, I have just been making observations. Now I want to make a more theoretical point that I think carries a lot of weight. If you agree that the replicator/worm role will eventually be filled by some sort of agent – and I have argued that these forces are convergent – then I think someone needs to stop theorizing and define the properties of this "replicator".
A replicator deployed by a malicious actor or a state entity will not have a reliable kill switch. It will probably be fully optimized for persistence and spread. One deployed by a careful actor might have beneficial downstream effects that cannot be immediately dismissed: it could actively patch vulnerabilities and put compute to beneficial use.
If you think there is a good counterargument to this post as a whole, or that I have proceeded from a false premise somewhere, I would love to hear why.