StellaAthena - LessWrong

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena6mo43

I think you're misrepresenting Gwern's argument. He's arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

StellaAthena6mo90

Thanks! I like your title more :)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

StellaAthena6mo8-5

It seems helpful to me if policy discussions can include phrases like "the evidence suggests that if the current ML systems were trying to deceive us, we wouldn't be able to change them not to".

I take this as evidence that TurnTrout's fears about this paper are well-grounded. This claim is not meaningfully supported by the paper, but I expect many people to repeat it as if it is supported by the paper.

Integrity in AI Governance and Advocacy

StellaAthena7mo32

We ended up talking about this in DMs, but to gist of it is:

Back in June Hoagy opened a thread in our "community research projects" channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they're welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.

Integrity in AI Governance and Advocacy

StellaAthena7mo40

Ooops. It appeared that I deleted my comment (deeming it largely off-topic) right as you were replying. I'll reproduce the comment below, and then reply to your question.

I separately had a very weird experience with them on the Long Term Future Fund where Conor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around EleutherAI and told them that "Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn't fund Eleuther AI". This was doubly confusing and misleading because Open Phil had never evaluated a grant to Eleuther AI (Asya who works at Open Phil was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time even after I think someone explicitly corrected the statement to

While this anecdote is largely orthogonal to the broader piece, I remembered that this existed randomly today and wanted to mention that Open Phil has recommended a 2.6 M/3 years grant to EleutherAI to pursue interpretability research. It was a really pleasant and very easy experience: Nora Belrose (head of interpretability) and I (head of everything) talked with them about some of our recent and on-going work such as Eliciting Latent Predictions from Transformers with the Tuned Lens, Eliciting Latent Knowledge from Quirky Language Models, and Sparse Autoencoders Find Highly Interpretable Features in Language Models very interesting and once they knew we had shared areas of interest it was a really easy experience.

I had no vibes along the lines of "oh we don't like EleutherAI" or "we don't fund pre-paradigmatic research." It was a surprise to some people at Open Phil that we had areas of overlapping interest, but we spent like half an hour clarifying our research agenda and half an hour talking about what we wanted to do next and people were already excited.

Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk

StellaAthena9mo915

I agree that a control group is vital for good science. Nonetheless, I think that such an experiment is valuable and informative, even if it doesn't meet the high standards required by many professional science disciplines. I believe in the necessity of acting under uncertainty. Even with its flaws, this study is sufficient evidence for us to want to enact temporary regulation at the same time as we work to provide more robust evaluations.

But... this study doesn't provide evidence that LLMs increase bioweapon risk.

Introducing the Center for AI Policy (& we're hiring!)

StellaAthena10mo10

It doesn't let the government institute prior restraint on speech.

Introducing the Center for AI Policy (& we're hiring!)

StellaAthena11mo43

So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.

It seems to me like you've received this feedback already in this very thread. The fact that you're going to edit the claim to basically say "this doesn't effect most people because most people don't work on LLMs" completely dodges the actual issue here, which is that there's a large non-profit and independent open source LLM community that this would heavily impact.

I applaud your honestly in admitting one approach you might take is to "remove this claim from our advocacy efforts," but am quite sad to see that you don't seem to care about limiting the impact of your regulation to potentially dangerous models.

Introducing the Center for AI Policy (& we're hiring!)

StellaAthena11mo140

Nora didn't say that this proposal is harmful. Nora said that if Zach's explanation for the disconnect between their rhetoric and their stated policy goals is correct (namely that they don't really know what they're talking about) then their existence is likely net-harmful.

That said, yes requiring everyone who wants to finetune LLaMA 2 get a license would be absurd and harmful. la3orn and gallabyres articulate some reasons why in this thread.

Another reason is that it's impossible to enforce, and passing laws or regulations and then not enforcing them is really bad for credibility.

Another reason is that the history of AI is a history of people ignoring laws and ethics so long as it makes them money and they can afford to pay the fines. Unless this regulation comes with fines so harsh that they remove all possibility of making money off of models, OpenAI et al. won't be getting licenses. They'll just pay the fines while small scale and indie devs (who allegedly the OP is specifically hoping to not impact) screech their work to a halt and wait for the government to tell them it's okay for them to continue to do their work.

Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I'm unaware of any precedent for such an action, can you name one?

LESSWRONG
LW

Posts

Wiki Contributions

Comments