OK. I’m going to aim this at a group of people with a broad spectrum of p(doom) values here, so this will be a scattergun approach to different AI systems and threat models.
These are more “why does the AI kill us” than how. I assume that a powerful enough AI would find a way. Especially if it’s hooked up to a bio lab. Why are we trying this again?
The bulk of the argument goes something like this:
As I understand it (which is poorly), this probably depends on the AI’s incentive to become self-reflective and coherent. It also depends on the AI’s ability to conceal misalignment during training.
This one tends to be more “intuitive”
This seems to depend on the amount of trust which is given to the system, and the degree to which the AI’s predictions are inscrutable to the users.
Something of a halfway house.
This seems to depend on the ability of humans to stay in the decision-making loop without harming the final decisions. We could either coordinate to limit the amount of control given to AIs, or enhance human capabilities to keep pace with advancing AI.
There have been a few different ways to do this. Most of them involve using the AI to control lab equipment like liquid-handling robots, which look like this:
(I may have added the eye myself, but you’ll never know)
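If you’re wondering what “the AI controls a liquid-handling robot” actually looks like in practice, it’s mostly generating and running short scripts like the sketch below. This is a minimal sketch assuming an Opentrons-style robot and its Python Protocol API; the specific labware and pipette names are illustrative assumptions, not taken from any of the systems discussed here.

```python
# Sketch of the kind of protocol script an AI (or a person) hands to a
# liquid-handling robot, assuming an Opentrons-style robot and its Python
# Protocol API v2. The labware and pipette choices are illustrative only.
from opentrons import protocol_api

metadata = {"apiLevel": "2.13"}

def run(protocol: protocol_api.ProtocolContext):
    # Declare what is on the deck: a 96-well plate and a box of tips.
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 2)
    pipette = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])

    # Move 50 uL from column 1 into column 2, one well per row.
    for row in "ABCDEFGH":
        pipette.transfer(50, plate[f"{row}1"], plate[f"{row}2"])
```

The point is that the “control” is just code: whatever writes the script decides what the robot does with the samples.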
This is the big one you’ll have heard of if you’re into this field. AlphaFold is a narrow AI which aims to predict the structures of biologically relevant molecules. If this sounds weird, think about it like this:
Imagine aliens trying to understand how a city works. They suck parts of the city into their flying saucer, break them up, then separate them into parts. Their tech is on such a scale that they can only analyse things in bulk, so they’d need to get a million screwdrivers to figure out the structure of them. They can separate the parts, but they then can’t easily put them back together.
AlphaFold does the job of reconstructing things from knowing what their parts are. If the aliens suck up a million bicycles, separate out all the spokes, axles, and frames, then analyse those, they can model a hundred parts individually. AlphaFold can reconstruct those back into the shape of the bike.
In biology, we can quite easily figure out the sequence of a gene, and therefore the protein it codes for. But a protein actually has a complex 3D structure, which might also be bound loosely to other proteins. AlphaFold gets us (part of) the way from the gene sequence to the final protein structure.
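To make that pipeline concrete (gene sequence → protein sequence → predicted structure), here’s a toy sketch. The translation step uses a real Biopython call; predict_structure is a made-up stand-in for an AlphaFold-style model, not a real API.

```python
# Toy sketch of the sequence-to-structure pipeline. Seq.translate() is a real
# Biopython call; predict_structure() is a hypothetical stand-in for an
# AlphaFold-style model, not a real API.
from Bio.Seq import Seq

def predict_structure(protein_sequence: str) -> list[tuple[float, float, float]]:
    """Placeholder: a real model would return 3D coordinates (plus confidence
    scores) for every residue. Here we just return a dummy extended chain."""
    return [(3.8 * i, 0.0, 0.0) for i in range(len(protein_sequence))]

# Step 1: gene sequence -> protein sequence (the easy part; just the genetic code).
gene = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
protein = str(gene.translate(to_stop=True))  # "MAIVMGR"

# Step 2: protein sequence -> 3D structure (the hard part AlphaFold tackles).
coords = predict_structure(protein)
print(protein, "->", len(coords), "residues modelled")
```

Step 1 is routine; step 2 is the bit that used to require months of lab work.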
AlphaFold isn’t agentic. It doesn’t plan. All it does is enhance the productivity of human researchers. If it can replace cryo-EM (a very laborious technique, currently the state of the art for finding protein structures), then that saves a lot of time and effort.
Personally, I think the risks from AlphaFold-type AIs are very low.
In The Future
We should think about how AlphaFold might be developed in future. Any tool-ish AI could be incorporated into the toolkit of a larger system if we’re not careful. For now I see relatively few dangerous directions to take AlphaFold 4 in. The riskiest seem to be some upcoming approaches to designing a protein with certain properties based on user input.
This is also a narrow AI: narrow not just to a single task, but to a single machine. This one is totally autonomous, and optimizes a set of reactions for a pre-specified task. The researchers say “we want 50 nm gold particles with a 10 nm silver shell”, and the AI controls a reactor, testing different conditions until it succeeds.
Why is this better than humans? It’s a quick learner and it doesn’t sleep. Fun fact: even the most dogged postdoc spends over 50% of the week not even working!
From experience, I can say that the most successful researchers are often those with the tightest feedback loop between running one experiment and running the follow-up. For an autonomous agent, this feedback loop can be very, very tight.
The architecture has two parts: first, a reward model which learns from the experiments to predict how good the nanoparticles are; second, a Monte Carlo tree search method for picking new options. There isn’t anything weird going on here with self-reflectiveness or unexpected agency.
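To give a feel for the shape of that loop, here’s a heavily simplified sketch. I’ve swapped the Monte Carlo tree search for a crude upper-confidence-bound pick over a discrete grid of conditions, and the “experiment” is a made-up scoring function, so treat this as the reward-model-plus-search pattern rather than the actual implementation.

```python
# Simplified closed-loop optimiser in the spirit of the two-part architecture:
# (1) a learned reward model predicting how good a set of reaction conditions
# is, and (2) a search step that picks the next experiment to run.
# NOTE: the real system uses Monte Carlo tree search over sequences of fluidic
# steps; here I substitute a crude UCB-style pick over a discrete grid, and
# run_experiment() is a made-up stand-in for the physical reactor.
import itertools
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Candidate conditions: (temperature in C, silver:gold reagent ratio, residence time in s).
grid = np.array(list(itertools.product(
    np.linspace(20, 80, 7),
    np.linspace(0.1, 2.0, 8),
    np.linspace(10, 120, 6),
)))

def run_experiment(conditions: np.ndarray) -> float:
    """Stand-in for the reactor + spectrometer: returns a quality score for
    the nanoparticles made under these conditions (higher is better)."""
    temp, ratio, res_time = conditions
    return (-(temp - 55) ** 2 / 100 - (ratio - 1.2) ** 2
            - (res_time - 60) ** 2 / 500 + rng.normal(0, 0.1))

# Seed the reward model with a handful of random experiments.
X, y = [], []
for idx in rng.choice(len(grid), size=5, replace=False):
    X.append(grid[idx])
    y.append(run_experiment(grid[idx]))

reward_model = RandomForestRegressor(n_estimators=100, random_state=0)

for step in range(20):
    reward_model.fit(np.array(X), np.array(y))
    # Score every candidate; use the spread across trees as an exploration
    # bonus (a very rough substitute for MCTS's exploration term).
    per_tree = np.stack([tree.predict(grid) for tree in reward_model.estimators_])
    ucb = per_tree.mean(axis=0) + per_tree.std(axis=0)
    next_conditions = grid[int(np.argmax(ucb))]
    X.append(next_conditions)
    y.append(run_experiment(next_conditions))  # the tight feedback loop

best = int(np.argmax(y))
print("best conditions found:", X[best], "score:", y[best])
```

The key point is how tight the propose → run → update cycle is: every pass through the loop is one experiment, with no postdoc sleep schedule in the way.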
So what are the risks of misalignment?
In The Future
It will always be tempting to create more general versions of AlphaFlow. Future, more complex architectures might be incentivised to self-model uncertainty to improve their experiment design, but nobody has really made an architecture which does this inherently. For now I’ll say the risk is low without an architecture breakthrough, and if a breakthrough does happen in this area, I expect it to be deployed more dangerously elsewhere before the automated chemistry labs get hold of it.
Incorporating systems like this into larger systems could lead to creeping disempowerment if those systems are also automated. But, like AlphaFold, this applies to any tool.
This involves using a bunch of GPT-4 instances to run your lab for you. There are controller units which coordinate the other GPT-4s, googler units which search the web, coding units which write code for liquid-handling robots, etc.
So far, the GPT-4s still require humans to pick up plates of samples and move them between machines (this is one of the funniest examples I’ve seen of how automation has switched from taking the manual jobs to taking the cerebral ones). And the system is not that good at planning, because GPT-4 is not yet that good at planning.
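Stripped to the bone, the pattern looks something like the sketch below. call_llm is a hypothetical placeholder for whatever chat-completion API you’d plug in, and the roles and routing are illustrative; this is not the actual codebase of any published system.

```python
# Minimal sketch of a controller-plus-workers LLM pattern for lab automation.
# call_llm() is a hypothetical placeholder for a chat-completion API; the role
# prompts and routing are illustrative, not taken from any published system.
from dataclasses import dataclass

def call_llm(system_prompt: str, user_message: str) -> str:
    """Placeholder for a real chat-completion call. It just echoes here so the
    control flow can be exercised without an API key; swap in a real client."""
    return f"search: stub reply to '{user_message[:40]}...'"

@dataclass
class Worker:
    name: str
    system_prompt: str

    def run(self, task: str) -> str:
        return call_llm(self.system_prompt, task)

WORKERS = {
    "plan":   Worker("plan",   "Break the research goal into the next concrete sub-task."),
    "search": Worker("search", "Search the literature/web and summarise what you find."),
    "code":   Worker("code",   "Write a script for the liquid-handling robot."),
}

def controller(goal: str, max_steps: int = 5) -> list[str]:
    """Controller loop: ask the planner for the next sub-task, route it to the
    named worker, and feed the result back in. A human still moves the plates."""
    transcript: list[str] = []
    for _ in range(max_steps):
        next_task = WORKERS["plan"].run(f"Goal: {goal}\nProgress so far: {transcript}")
        # Crude routing: the planner is asked to prefix each task with a worker name.
        worker_name = next_task.split(":", 1)[0].strip().lower()
        worker = WORKERS.get(worker_name, WORKERS["search"])
        transcript.append(f"{worker.name}: {worker.run(next_task)}")
    return transcript

if __name__ == "__main__":
    for line in controller("make 50 nm gold nanoparticles with a 10 nm silver shell"):
        print(line)
```

Note how little of this is specific to chemistry: the “architecture” is mostly prompts and glue code, which is exactly why it’s easy to swap in a new, stronger LLM overnight.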
In The Future
This seems to be one of the riskiest approaches to AI-enhanced research going on at the moment. The models are poorly understood LLMs, which might be changed or upgraded at a moment’s notice. Rather than having to think carefully about training a new agent, the researchers might just plug a new LLM into their system.
The highest risks are likely to come from combined systems. I’ll discuss these in another post.