I'm not a fan of the phage synthesis paper, as there was a lot of post-generation filtering and failure going on, and the AI-generated phages were basically the same as existing ones. There are a couple of interesting new mutations, but a lot of them are noncoding/synonymous/in nonessential genes.
Judging by your writing I think you missed a new paper red-teaming DNA synthesis screening software. They're using AI to create proteins that function the same but with different amino acids (probably conformation-based), which bypasses the DNA screening, because the screening (probably) isn't translating the sequence, throwing it into AlphaFold, and comparing it to known toxins. The paper didn't test whether the AI-generated toxins are actually toxic, but we can assign that a reasonable probability.
That being said... translating DNA to protein with the standard codon table is just one encoding scheme. And we can recode organisms to use a different encoding scheme. And no DNA screening would be able to catch it, since the screeners have no knowledge of the nonstandard codon table you're using.
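A toy sketch of what I mean, in code. The codon reassignment here is invented for illustration; actually recoding an organism (à la the Chin lab's Syn61) is vastly harder than editing a dictionary:

```python
# Toy illustration: the same DNA decodes to different proteins under
# different codon tables, so a screener assuming the standard table
# infers the wrong protein. The reassignment below is made up.

# Minimal codon table covering only the codons used in this example.
STANDARD = {"ATG": "M", "TCT": "S", "CTG": "L", "TAA": "*"}

# Hypothetical recoded organism: TCT reassigned from serine to tryptophan.
RECODED = {**STANDARD, "TCT": "W"}

def translate(dna, table):
    """Translate codon-by-codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

gene = "ATGTCTCTGTAA"
print(translate(gene, STANDARD))  # "MSL" -- what a screener would infer
print(translate(gene, RECODED))   # "MWL" -- what the recoded organism makes
```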
Also thanks for bringing up the Germy Paradox; I seem to have missed that sequence.
> Judging by your writing I think you missed a new paper red-teaming DNA synthesis screening software. They're using AI to create proteins that function the same but with different amino acids (probably conformation-based), which bypasses the DNA screening, because the screening (probably) isn't translating the sequence, throwing it into AlphaFold, and comparing it to known toxins. The paper didn't test whether the AI-generated toxins are actually toxic, but we can assign that a reasonable probability.
I hadn't! Thanks for pointing this out! Looks like someone is actually on top of this after all.
> That being said... translating DNA to protein with the standard codon table is just one encoding scheme. And we can recode organisms to use a different encoding scheme. And no DNA screening would be able to catch it, since the screeners have no knowledge of the nonstandard codon table you're using.
Who do you mean by "we" in this case? I had the vague vibes that the "we" in question is "Jason Chin's lab and a few other similarly high-level groups elsewhere", but that it was pretty far beyond the capabilities of the marginal bioterrorist, and anyone who's successfully fucking around with nonstandard codon sequences can definitely already make a bioweapon. Is this incorrect under your model?
> marginal bioterrorist
I'm not sure a marginal bioterrorist can train an AI model to obfuscate a DNA sequence to bypass the sequence scanning, but I concede it's definitely easier than recoding an organism. I'm not really sure what the skillset/resources of a marginal bioterrorist are.
But we should watch out for proliferation/commercialization of recoded organisms, since recoded organisms would be easier to recode further (if they introduce a new codon, just modify the synthetase to load a different amino acid).
A marginal bioterrorist could probably just brew up a vat of anthrax which technically counts. Advanced labs definitely have more capacity for modification, but they still need to source the pathogens.
> A marginal bioterrorist could probably just brew up a vat of anthrax which technically counts.
Perhaps worth noting that they've tried in the past, and failed.
It seems they were literally using the nonpathogenic attenuated anthrax strain used for vaccines.
Yes - they made a huge number of mistakes, despite having sophisticated people and tons of funding. It's been used over and over to make the claim that bioweapons are really hard, but I do wonder how many of these classes of mistake an LLM's help would avoid. (How much prosaic utility is there for project planning in general? Some, but at high risk if you need to worry about detection, and it's unclear that most people are willing to offload or double-check their planning, despite the advantages.)
It would have been worth exploring your stance on AI for AI defense in general here, at least a little. Particularly as I guess from your writing that you are not a biologist, meaning your idea here is interesting more as a conceptual/strategic idea than as a fully thought-through suggestion for countering this exact scenario. (At least this is a key reason why I upvoted your post, although I don't find the actual suggestion promising. I agree with eniteris's objections. But I want to see more of this and I support your thinking.)
Personally, I am currently very much for using AI for safety and targeted defense purposes. To be clear: NOT developing AI capability to do AI safety work, but utilizing existing models to aid with defense and safety work. I see it as a no-brainer, really. (I also think that working on improving public safety sets a good example of what we should use LLMs for, vs. AI slop and cooking recipes.)
This suggestion of yours would fit into this category, from my POV, but it seems you may disagree? What exactly IS your overall thinking on using AI for defense and/or AI safety?
Epistemic Status: Seems like it would be worth somebody doing this.
TL;DR: Modern biological language models (BioLMs) are capable of building new viruses from scratch. Before you freak out, we're OK for now. It might be a worthwhile project for somebody to be "on the ball" maintaining a SoTA gene sequence classifier and getting as many gene synthesis companies as possible to sign on to use it.
Bacteriophages (aka phages) are viruses which only infect bacteria, which makes them pretty exciting as a way to treat drug-resistant bacterial infections. They're somewhat smaller and simpler than most viruses which infect humans, with about 10% the genome size of coronaviruses (though about twice the genome size of the smallest human virus, hepatitis D).
A recent paper found that the Evo series of BioLMs are capable, with some fine-tuning, of generating functional phage genomes, which managed to infect bacteria in the lab. Personally, I would probably have kept this kind of research a bit hush-hush, what with, you know, the Implications, but the authors of this paper handily put it up on bioRxiv for anyone to read. While I appreciate their commitment to making their research accessible, it is a bit too "Open Asteroid Impact" for my liking.
So how similar are they to existing bacteriophages? Are the Evo series just stochastic parrots, or do they have the magic spark of creativity?
Turns out they're fairly similar to existing phages. A couple of them had lost a gene, but mostly the genome was all still there. All of the new phages had genomes more than 90% similar to existing ones; the most divergent was 93% similar. Anything less than 95% similar to a known phage would be considered a new species, so we've arguably seen the first AI-generated species now. If the word "species" even means anything when we're talking about viruses, that is.
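For a rough sense of the threshold arithmetic, here's a minimal sketch, assuming pre-aligned genomes of equal length (real genome comparisons need proper alignment tools):

```python
# Minimal sketch of the 95% species threshold, assuming the genomes are
# already aligned and of equal length (real comparisons need alignment).

def percent_identity(a: str, b: str) -> float:
    """Fraction of positions that match, as a percentage."""
    assert len(a) == len(b), "sketch assumes pre-aligned, equal-length genomes"
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)

def is_new_species(generated: str, known_genomes: list[str]) -> bool:
    # <95% identity to every known phage arguably makes it a new species.
    return all(percent_identity(generated, g) < 95 for g in known_genomes)

print(percent_identity("ATGCATGCAT", "ATGCATGGAT"))  # 90.0, below threshold
```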
Suppose you try to order a gene from a reputable source, like IDT. They pass the DNA sequence through some automated filters. Suppose you were ordering something just a bit scary, like a truncated gene for staphylococcal alpha hemolysin: a mildly toxic protein which is made all the time under minimal containment procedures in Oxford, and can be bought online from Sigma-Aldrich.
They say: "Oh no, we couldn't possibly!"
Suppose you then supply them with permission forms from the UK government, indicating that you have permission to work with this gene. And that this gene is also not on any registers or schedules.
They say "Oh no, we couldn't possibly!"
So you go to a supplier called VectorBuilder, who will send you "basically anything" according to your mate who does more work in this field than you do.
They say "Oh no, we couldn't possibly!"
As do five other suppliers. At this point you're getting a bit miffed with the memory of that OpenPhil guy you met at a conference, who insisted to you that it was possible to order toxic genes online, and that nobody would stop you. You get even more miffed with yourself, for not thinking to ask which ones.
The Germy Paradox is often phrased as a question: "Where are the engineered pandemics?" Here's my answer: there are a lot of different inconveniences between you and a bioweapon!
One of these---a big one---is how hard it is to actually get hold of pathogen genes. As I understand it, most gene synthesis companies use simple programmatic classifiers to determine if a given gene comes from a pathogen. These mostly look at overall sequence similarity. Some ways to try and trick these, in increasing order of sophistication:

1. Make synonymous codon substitutions, so the DNA sequence changes but the encoded protein doesn't.
2. Make conservative amino acid substitutions which preserve the protein's structure and function.
3. Redesign larger stretches of the protein, preserving only its fold and active site.
4. Design a de novo protein with the same function but essentially no sequence similarity to the original.
These are on a continuum, of course.
Until now, only option 1 was easily achievable. Now, option 2 is. My best guess is that existing classifiers will catch attempts at option 2, but as we slide towards option 3, they might start to fail, and they certainly will at option 4.
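To see why option 1 is trivial, here's a toy demonstration. The "screen" is a crude stand-in I made up (shared k-mer fraction), not any real vendor's filter:

```python
# Toy demonstration of option 1: synonymous codon swaps change the DNA
# substantially while leaving the protein untouched. The "screen" here
# (shared k-mer fraction) is a crude stand-in, not a real vendor's filter.

SYNONYMS = {"CTG": "CTT", "TCT": "AGC", "GCA": "GCG"}  # same amino acids

def recode(dna: str) -> str:
    """Swap each codon for a synonymous one where we know a synonym."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(SYNONYMS.get(c, c) for c in codons)

def kmer_similarity(a: str, b: str, k: int = 8) -> float:
    """Jaccard similarity of the two sequences' k-mer sets."""
    kmers = lambda s: {s[i:i + k] for i in range(len(s) - k + 1)}
    ka, kb = kmers(a), kmers(b)
    return len(ka & kb) / len(ka | kb)

flagged = "ATGCTGTCTGCACTGTCTGCATAA"  # pretend this is on a watchlist
evasive = recode(flagged)             # encodes the same protein
print(kmer_similarity(flagged, evasive))  # 0.0 here: no shared 8-mers
```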
So what can we do?
I've been really quite skeptical of def/acc proposals in the past, especially as scalable-to-ASI things. But here, maybe they'll work. The thing about BioLMs is that they're also LMs: they can also annotate and classify sequences. As an example, the bacteriophages produced by the Evo models were easily classified as such by the Evo models, down to the gene functionality, though it would warrant some careful evaluation and validation to confirm this. It wouldn't be too difficult to hook up these annotations to either an LLM or an automated classifier, as an additional filter to run a gene through before agreeing to synthesize it. As long as the best models available to would-be bioterrorists are no better than the ones available to the largest companies, they'll struggle to get anything past the classifiers.
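A minimal sketch of what the plumbing might look like, assuming a BioLM that exposes some kind of sequence embedding. The model interface, the threat database, and the threshold are all placeholders I've made up:

```python
# Sketch of a BioLM-backed screening filter. The model interface, threat
# database, and threshold are placeholders invented for illustration; a
# real deployment would need careful calibration and evaluation.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

class BioLMScreen:
    def __init__(self, model, threat_db, threshold=0.8):
        self.model = model          # placeholder: anything with an .embed(seq) method
        self.threat_db = threat_db  # dict: threat name -> reference embedding
        self.threshold = threshold  # invented cutoff; would need calibration

    def check(self, sequence: str):
        """Return (flagged, closest_threat). A flagged order gets rejected,
        even if a human reviewer can't see anything wrong by eye."""
        query = self.model.embed(sequence)
        for name, emb in self.threat_db.items():
            if cosine(query, emb) >= self.threshold:
                return True, name
        return False, None
```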
The important step is for someone to actually build and maintain a BioLM-based classifier, and for the gene synthesis companies to use it. It would have to be treated seriously: flagged samples would have to be rejected even if the humans couldn't see anything wrong with them by eye. It would also have to be kept up to date with the latest BioLMs, or maybe something else more complicated, if BioLMs end up not being the SoTA for whole-genome synthesis.
Will this help us survive superintelligence?
Obviously not, don't be silly. If this proposal gets implemented, and the primary effect is to create (even more) complacency surrounding ASI, then it might make everything worse on net.
(For those who need it spelling out: if the superintelligence wants to build a bioweapon, it will do so. Either it will trick a dumb monitor or collude with a smart monitor. God help us if the superintelligence gets put in charge of the monitoring.)
Despite this, it seems kind of insane and undignified to not be building the best filters we can for anything resembling an automated lab.