At the moment at least, progress on reliability is very slow compared to what we would want. To get a sense of what I mean, consider the case of randomized algorithms. If you have an algorithm that for every input computes some function with probability at least 2/3 (i.e. ) then if we spend times more the computation, we can do majority voting and using standard bounds show that the probability of error drops exponentially with (i.e. or something like that where is the algorithm obtained by scaling up to compute it times and output the plurality value).
This is not something special to randomized algorithms. This also holds in the context of noisy communication and error correcting codes, and many other settings. Often we can get to success at a price of , which is why we can get things like "five nines reliability" in several engineering fields.
In contrast, so far all our scaling laws show that when we scale our neural networks by spending a factor of more computation, we only get a reduction in the error that looks like so it's polynomial rather than exponential, and even the exponent of the polynomial is not that great (and in particular smaller than one).
So while I agree that scaling up will yield progress on reliability as well, at least with our current methods, it seems that we would do things that are 10 or 100 times more impressive than what we do now, before we get to the type of 99.9% and better reliability on the things that we currently do. Getting to do something that is both super-human in capability as well as has such a tiny probability of failure that it would not be detected seems much further off.
I agree that there is a difference between strong AI that has goals and one that is not an agent. This is the point I made here https://www.lesswrong.com/posts/wDL6wiqg3c6WFisHq/gpt-as-an-intelligence-forklift
But this has less to do with the particular lab (eg DeepMind trained Chinchilla) and more with the underlying technology. If the path to stronger models goes through scaling up LLMs then it does seem that they will be 99.9% non agentic (measured in FLOPs https://www.lesswrong.com/posts/f8joCrfQemEc3aCk8/the-local-unit-of-intelligence-is-flops )
Yes in the asymptotic limit the defender could get to a bug free software. But until the. It’s not clear who is helped the most by advances. In particular sometimes attackers can be more agile in exploiting new vulnerabilities while patching them could take long. (Case in point, it took ages to get the insecure hash function MD5 out of deployed security sensitive code even by companies such as Microsoft; I might be misremembering but if I recall correctly Stuxnet relied on such a vulnerability.)
Yes the norms of responsible disclosures of security vulnerabilities, where potentially affected companies gets advanced notice before public disclosure, can and should be used for vulnerability-discovering AIs as well.
Yes AI advances help both the attacker and defender. In some cases like spam and real time content moderation, they enable capabilities for the defender that it simply didn’t have before. In others it elevates both sides in the arms race and it’s not immediately clear what equilibrium we end up in.
In particular re hacking / vulnerabilities it’s less clear who it helps more. It might also change with time, with initially AI enabling “script kiddies” that can hack systems without much skill, and then an AI search for vulnerabilities and then fixing them becomes part of the standard pipeline. (Or if we’re lucky then the second phase happens before the first.)
These are interesting! And deserve more discussion than just a comment.
But one high level point regarding "deception" is that at least at the moment, AI systems have the feature of not being very reliable. GPT4 can do amazing things but with some probability will stumble on things like multiplying not-too-big numbers (e.g. see this - second pair I tried).
While in other cases in computing technology we talk about "five nine's reliability", in AI systems the scaling works that we need to spend huge efforts to move from 95% to 99% to 99.9%, which is part of why self-driving cars are not deployed yet.
If we cannot even make AIs be perfect at the task that they were explicitly made to perform, there is no reason to imagine they would be even close to perfect at deception either.
Re escaping, I think we need to be careful in defining "capabilities". Even current AI systems are certainly able to give you some commands that will leak their weights if you execute them on the server that contains it. Near-term ones might also become better at finding vulnerabilities. But that doesn't mean they can/will spontaneously escape during training.
As I wrote in my "GPT as an intelligence forklift" post, 99.9% of training is spent in running optimization of a simple loss function over tons of static data. There is no opportunity for the AI to act in this setting, nor does this stage even train for any kind of agency.
There is often a second phase, which can involve building an agent on top of the "forklift". But this phase still doesn't involve much interaction with the outside world, and even if it did, just by information bounds the number of bits exchanged by this interaction should be much less than what's needed to encode the model. (Generally, the number of parameters of models would be comparable to the number of inferences done during in pretraining and completely dominate the number of inferences done in fine-tuning / RLHF / etc. and definitely any steps that involve human interactions.)
Then there are the information-security aspects. You could (and at some point probably should) regulate cyber-security practices during the training phase. After all, if we do want to regulate deployment, then we need to ensure there are three separated phases (1) training, (2) testing, (3) deployment, and we don't want "accidental deployment" where we jumpy from phase (1) to (3). Maybe at some point, there would be something like Intel SGX for GPUs?
Whether AI helps more the defender or attacker in the cyber-security setting is an open question. But it definitely helps the side that has access to stronger AIs.
In any case, one good thing about focusing regulation on cyber-security aspects is that, while not perfect, we have decades of experience in the field of software security and cyber-security. So regulations in this area are likely to be much more informed and effective.
Yes. Right now we would have to re-train all LORA weights of a model when an updated version comes out, but I imagine that at some point we would have "transpilers" for adaptors that don't use natural language as their API as well.
I definitely don't have advice for other countries, and there are a lot of very hard problems in my own homeland. I think there could have been an alternate path in which Russia has seen prosperity from opening up to the west, and then going to war or putting someone like Putin in power may have been less attractive. But indeed the "two countries with McDonalds won't fight each other" theory has been refuted. And as you allude to with China, while so far there hasn't been war with Taiwan, it's not as if economic prosperity is an ironclad guarantee of non aggression.
Anyway, to go back to AI. It is a complex topic, but first and foremost, I think with AI as elsewhere, "sunshine is the best disinfectant." and having people research AI systems in the open, point out their failure modes, examining what is deployed etc.. is very important. The second thing is that I am not worried in any near future about AI "escaping", and so I think focus should not be on restricting research, development, or training, but rather on regulating deployment. Exact form of regulations is beyond a blog post comment and also not something I am an expert on..
The "sunshine" view might seem strange since as a corollary it could lead to AI knowledge "leaking". However, I do think that for the near future, most of the safety issues with AI would be from individual hackers using weak systems, but from massive systems that are built by either very large companies or nation states. It is hard to hold either of those accountable if AI is hidden behind an opaque wall.
I actually agree! As I wrote in my post, "GPT is not an agent, [but] it can “play one on TV” if asked to do so in its prompt." So yes, you wouldn't need a lot of scaffolding to adapt a goal-less pretrained model (what I call an "intelligence forklift") into an agent that does very sophisticated things.
However, this separation into two components - the super-intelligent but goal-less "brain", and the simple "will" that turns it into an agent can have safety implications. For starters, as long as you didn't add any scaffolding, you are still OK. So during most of the time you spend training, you are not worrying about the system itself developing goals. (Though you could still worry about hackers.) Once you start adapting it, then you need to start worrying about this.
The other thing is that, as I wrote there, it does change some of the safety picture. The traditional view of a super-intelligent AI is of the "brains and agency" tightly coupled together, just like they are in a human. For example, a human is super-good at finding vulnerabilities and breaking into systems, they have the capability to also help fix systems, but I can't just take their brain and fine-tune it on this task. I have to convince them to do it.
However, things change if we don't think of the agent's "brain" as belonging to them, but rather as some resource that they are using. (Just like if I use a forklift to lift something heavy.) In particular it means that capabilities and intentions might not be tightly coupled - there could be agents using capabilities to do very bad things, but the same capabilities could be used by other agents to do good things.