a_g — LessWrong


Replying to Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk

> What evidence is there on tutor-relevant tasks being a blocking part of the pipeline, as opposed to manufacturing barriers?

So, I can break “manufacturing” down into two buckets: “concrete experiments and iteration to build something dangerous” and “access to materials and equipment”.

For concrete experiments, I think this is in fact the place where having an expert tutor becomes useful. When I started in a synthetic biology lab, most of the questions I would ask weren’t things like “how do I hold a pipette” but things like “what protocols can I use to check if my plasmid correctly got transformed into my cell line?” These were the types of things I’d ask a senior grad student, ...

Replying to Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk

a_g, 2y

I’m one of the authors of the second SecureBio paper (“Will releasing the weights of future large language models grant widespread access to pandemic agents?”). I’m not speaking for the whole team here, but I wanted to respond to some of the points in this post, both about the paper specifically and about the broader question of bioterrorism risk from AI.

First, to acknowledge some justified criticisms of this paper:

I agree that performing a Google search control would have substantially increased the methodological rigor of the paper. The team discussed this before running the experiment and, for various reasons, decided against it. We’re currently discussing whether it might make sense to run a ...

Replying to Will releasing the weights of large language models grant widespread access to pandemic agents?

a_g, 2y

(co-author on the paper)

> Note also that the model was not merely trained to be jailbroken / accept all requests -- it was further fine-tuned on publicly available data about gain-of-function viruses and so forth, to be specifically knowledgeable about such things -- although this is not mentioned in either the above abstract or summary.

I mentioned this in a separate comment, but: we revised the paper to mention that the fine-tuning didn’t appreciably help with the information generated by the Spicy/uncensored model (which we were able to assess by comparing how much of the acquisition pathway was revealed by the fine-tuned model vs. a prompt-based-jailbroken version of the Base model; this last point isn’t in the manuscript...

Replying to Will releasing the weights of large language models grant widespread access to pandemic agents?

a_g, 2y

(co-author on the paper)

Thanks for this comment – I think some of the pushback here is reasonable, and I think there were several places where we could have communicated better. To touch on a couple of different points:

> Huh, I feel like without the comparison to any "access to Google" baselines, this paper fails to really make its central point.

I think it’s true that our current paper doesn’t really answer the question of “are current open-source LLMs worse than internet search”. We’re more uncertain about this, and agree that a control study could be good here; however, despite this, I think the point that future open-source models will be more capable, will...