Many people at OpenAI truly believe they're doing the right thing, and believed so two weeks ago as well.
According to almost all accounts, the board did not give the people working at OpenAI any new evidence that they were doing something bad! They just tried to dictate to them without any explanation, and employees responded as humans are apt to do when someone tries to dictate to them without explanation, whether they were right or wrong.
Which is to say -- I don't think we've really gotten any evidence that the people there are being collectively sociopathic.
The Gell-Mann Amnesia effect seems pretty operative, given the first name on the relevant NYT article is the same guy who did some pretty bad reporting on Scott Alexander.
If you don't think the latter was a reliable summary of Scott's blog, there's not much reason to think that the former is a reliable summary of the OpenAI situation.
The board's behavior is non-trivial evidence against EA promoting willingness-to-cooperate and trustworthiness.
LLMs as currently trained run ~0 risk of catastrophic instrumental convergence even if scaled up with 1000x more compute.
FWIW: I think you're right that I should have paid more attention to the current v future models split in the paper. But I also think that the paper is making... kinda different claims at different times.
Specifically, when the paper makes true-or-false claims about the world, it talks about models potentially indefinitely far in the future; but when it talks about policy, it talks about things you should start doing now or soon.
For instance, consider part 1 of the conclusion:
1. Developers and governments should recognise that some highly capable models will be too dangerous to open-source, at least initially.
If models are determined to pose significant threats, and those risks are determined to outweigh the potential benefits of open-sourcing, then those models should not be open-sourced. Such models may include those that can materially assist development of biological and chemical weapons [50, 109], enable successful cyberattacks against critical national infrastructure [52], or facilitate highly-effective manipulation and persuasion [88].[30]
The [50] and [109] citations are to the two uncontrolled, OpenPhil-funded papers from my "science" section above. The [30] points to a footnote that reads:
Note that we do not claim that existing models are already too risky. We also do not make any predictions about how risky the next generation of models will be. Our claim is that developers need to assess the risks and be willing to not open-source a model if the risks outweigh the benefits.
And like... if you take this footnote literally, then this paragraph is almost tautologically true!
Even I think you shouldn't open source a model "if the risks outweigh the benefits" -- how could I think otherwise? If you take it to be making no predictions about current or next-generation models -- well, there's nothing to object to. Straightforward application of "don't do bad things."
But if you take it literally -- "do not make any predictions"? -- then there's no reason to actually recommend things in the way the next pages do, like saying NIST should provide guidance on whether it's ok to open source something, and so on. Like, there are a bunch of very specific suggestions that aren't the kind of thing you'd write about a hypothetical or distant possibility.
And this sits even more uneasily with claims from earlier in the paper: "Our general recommendation is that it is prudent to assume that the next generation of foundation models could exhibit a sufficiently high level of general-purpose capability to actualize specific extreme risks." (p8 -- !?!). This comes right after it talks about the biosecurity risks of Claude. Or "AI systems might soon present extreme biological risk." Etc.
I could go on, but in general I think the paper is just... unclear about what it is saying about near-future models.
For purposes of policy, it seems to think that we should spin up things specifically to legislate the next generation; it's meant to be a policy paper, not a philosophy paper, after all. That remains true regardless of whatever disclaimers it includes about not making predictions about the next generation, and it seems especially true when you look at the actual uses to which the paper is put.
For concrete experiments, I think this is in fact the place where having an expert tutor becomes useful. When I started in a synthetic biology lab, most of the questions I would ask weren’t things like “how do I hold a pipette” but things like “what protocols can I use to check if my plasmid correctly got transformed into my cell line?” These were the types of things I’d ask a senior grad student, but can probably ask an LLM instead[1]
Right now I can ask a closed-source LLM API this question. Your policy proposal contains no provision to stop such LLMs from answering this question. If this kind of in-itself-innocent question is where danger comes from, then unless I'm confused you need to shut down all bio lab questions directed at LLMs -- whether open source or not -- because > 80% of the relevant lab-style questions can be asked in an innocent way.
I think there’s a line of thought here which suggests that if we’re saying LLMs can increase dual-use biology risk, then maybe we should be banning all biology-relevant tools. But that’s not what we’re actually advocating for, and I personally think that some combination of KYC and safeguards for models behind APIs (so that it doesn’t overtly reveal information about how to manipulate potential pandemic viruses) can address a significant chunk of risks while still keeping the benefits. The paper makes an even more modest proposal and calls for catastrophe liability insurance instead.
If the government had required you to have catastrophe liability insurance for releasing open source software in 1995, then I expect we would have essentially no open source software industry today, because 99.9% of that software would not have been released. Do you predict differently?
Similarly for open source AI. I think when you model this out it amounts to an effective ban, just one that sounds less like a ban when you initially propose it.
And it's a really difficult epistemic environment, since someone who was incorrectly convinced by a misinterpretation of a concrete example they think is too dangerous to share is still wrong.
I agree that this is true, and very unfortunate; I agree with / like most of what you say.
But -- overall, I think if you're an org that has secret information, on the basis of which you think laws should be passed, you need to be absolutely above reproach in your reasoning and evidence and funding and bias. Like, this is an extraordinary claim to make in a democratic society, and should be treated as such; the reasoning that you do show should be extremely legible, offer ways for itself to be falsified, and not overextend in its claims. You should invite trusted people who disagree with you into adversarial collaborations, and pay them for their time. Etc etc etc.
I think -- for instance -- that rather than leaping from an experiment that maybe shows risk to offering policy proposals in the very same paper, it would be better to explain carefully (1) what overall models the paper's authors have of biological risk, how LLMs contribute to it (open-sourced or not, jailbroken or not, and so on), and what the total increase in that risk is, and (2) what would constitute evidence that LLMs don't contribute to the risk overall, and so on.
Evidence for X is when you observe something that's more likely in a world where X holds than in a world where some alternative to X holds.
Generally, substantially more likely: for good reason, many people only use "evidence" to mean "reasonably strong evidence."
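To make that precise (a standard Bayesian framing, not something the original comment spells out), you can write it in odds form:

$$\frac{P(X \mid E)}{P(\neg X \mid E)} \;=\; \frac{P(E \mid X)}{P(E \mid \neg X)} \cdot \frac{P(X)}{P(\neg X)}$$

An observation E counts as evidence for X exactly when the likelihood ratio P(E | X) / P(E | ¬X) is above 1; "reasonably strong evidence" corresponds to that ratio being well above 1, so that seeing E moves the odds on X appreciably.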
Finally (again, as also mentioned by others), anthrax is not the important comparison here, it’s the acquisition or engineering of other highly transmissible agents that can cause a pandemic from a single (or at least, single digit) transmission event.
At least one paper that I mention specifically gives anthrax as an example of the kind of thing that LLMs could help with, and I've seen the example used in other places. I think if people bring it up as a danger it's ok for me to use it as a comparison.
LLMs are useful isn’t just because they’re information regurgitators, but because they’re basically cheap domain experts. The most capable LLMs (like Claude and GPT4) can ~basically already be used like a tutor to explain complex scientific concepts, including the nuances of experimental design or reverse genetics or data analysis.
I'm somewhat dubious that a tutor to specifically help explain how to make a plague is going to be that much more use than a tutor to explain biotech generally. Like, the reason that this is called "dual-use" is that for every bad application there's an innocuous application.
So, if the proposal is to ban open source LLMs because they can explain the bad applications of the in-itself innocuous thing -- I just think that's unlikely to matter? If you're unable to rephrase a question in an innocuous way to some LLM, you probably aren't gonna make a bioweapon even with the LLM's help, no disrespect intended to the stupid terrorists among us.
It's kinda hard for me to picture a world where the delta in difficulty in making a biological weapon between (LLM explains biotech) and (LLM explains weapon biotech) is in any way a critical point along the biological weapons creation chain. Is that the world we think we live in? Is this the specific point you're critiquing?
If the proposal is to ban all explanation of biotechnology from LLMs and to ensure it can only be taught by humans to humans, well, I mean, I think that's a different matter, and I could address the pros and cons, but I think you should be clear about that being the actual proposal.
For instance, the post says that “if open source AI accelerated the cure for several forms of cancer, then even a hundred such [Anthrax attacks] could easily be worth it”. This is confusing for a few different reasons: first, it doesn’t seem like open-source LLMs can currently do much to accelerate cancer cures, so I’m assuming this is forecasting into the future. But then why not do the same for bioweapons capabilities?
This makes sense as a critique: I do think that actual biotech-specific models are much, much more likely to be used for biotech research than LLMs.
I also think there's a chance that LLMs could speed up lab work, but in a pretty generic way, like Excel speeds up lab work -- and this would probably be good overall, because speeding up legitimate lab work by 40% and terrorist lab work by 40% seems like a net win for the world. I mostly don't expect big breakthroughs to come from LLMs.
The quote from Schmidhuber literally says nothing about human extinction being good.
I'm disappointed that Critch glosses it that way, because in the past he has been more level-headed than many, but he's wrong.
The quote is:
Humans not being the "last stepping stone" towards greater complexity does not imply that we'll go extinct. I'd be happy to live in a world where there are things more complex than humans. And this isn't a weird interpretation at all -- "AI will be more complex than humans" or "humans are not the final form of complexity in the universe" simply says nothing at all about whether humans will go extinct.
You could spin it into that meaning if you tried really hard. But -- for instance -- the statement could also be about how AI will do science better than humans in the future, which was (astonishingly) the substance of the talk in which this statement took place, and also what Schmidhuber has been on about for years, so it probably is what he's actually talking about.
I note that you say, in your section on tribalism:
It would be great if people were a tad more hesitant to accuse others of wanting omnicide.