I agree but don't feel very strongly. On Anthropic security, I feel even more sad about this.
If [the insider exception] does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical!
I agree but don't feel very strongly.
As I say at the end, I don't particularly care about Anthropic's security commitments here; what I do care about is the RSP meaning anything at all!
And to be clear, my belief for a long time has been that the RSP was unlikely to have much predictive power over Anthropic's priorities, so part of my motivation here is establishing common knowledge of that, so that people can push on other governance approaches that don't rely on companies holding themselves to their RSPs.
I don't even know whether I think Anthropic having good security is good or bad for the world.
Could you share the TL;DR for why this might be bad for the world?
It's plausible to me that Anthropic (and other frontier labs) having bad security is good because it deflates race dynamics (if your competitors can just steal your weights after you invest $100b into a training run, you will probably think twice). Bad cybersecurity means you can't capture as much of the economic value provided by a model.
Furthermore, "bad cybersecurity is the poor man's auditing agreement". If I am worried about a lab developing frontier models behind closed doors, then them having bad cybersecurity means other actors can use the stolen weights to check whether a model poses a national security risk to them, and intervene before it is too late.
Is this a terrible solution to auditing? Yes! Are we going to get something better by default? I really don't know; I think not any time soon.
TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity, but IMO not that much) to being robust to attacks from corporate espionage teams at the companies where it hosts its weights. Anthropic seems unlikely to be robust to those attacks. Hence it is in violation of its RSP.
From the Anthropic RSP:
When a model must meet the ASL-3 Security Standard, we will evaluate whether the measures we have implemented make us highly protected against most attackers’ attempts at stealing model weights.
We consider the following groups in scope: hacktivists, criminal hacker groups, organized cybercrime groups, terrorist organizations, corporate espionage teams, internal employees, and state-sponsored programs that use broad-based and non-targeted techniques (i.e., not novel attack chains).
[...]
We will implement robust controls to mitigate basic insider risk, but consider mitigating risks from sophisticated or state-compromised insiders to be out of scope for ASL-3. We define “basic insider risk” as risk from an insider who does not have persistent or time-limited access to systems that process model weights. We define “sophisticated insider risk” as risk from an insider who has persistent access or can request time-limited access to systems that process model weights.
My best understanding is that the RSP commits Anthropic to being robust to attacks from corporate espionage teams (which are included in the list above).
The RSP mentions "insiders" as a class of attacker against which Anthropic promises less robustness, but doesn't fully define the term. Everyone I have talked to about the RSP interpreted "insiders" to mean "people who work at Anthropic". I think it would be a big stretch for "insiders" to include "anyone working at any organization we work with that has persistent access to systems that process model weights". As such, I think it's pretty clear that "insiders" is not intended to include e.g. Amazon AWS employees or Google employees.
Claude agrees with this interpretation: https://claude.ai/share/b7860f42-bef1-4b28-bf88-8ca82722ce82
One could potentially make the argument that Google, Microsoft and Amazon should be excluded on the basis of the "highly sophisticated attacker" carve-out:
The following groups are out of scope for the ASL-3 Security Standard because further testing (as discussed below) should confirm that the model would not meaningfully increase their ability to do harm: state-sponsored programs that specifically target us (e.g., through novel attack chains or insider compromise) and a small number (~10) of non-state actors with state-level resourcing or backing that are capable of developing novel attack chains that utilize 0-day attacks.
Multiple people I talked to thought this interpretation was highly unlikely. These attacks would not be nation-state backed and would not require developing novel attack chains that use 0-day attacks, and if you include Amazon and Google in this list of non-state actors, it seems very hard to limit the total list of organizations with that amount of cybersecurity offense capacity (or more) to "~10".
Again, Claude agrees: https://claude.ai/share/a6068000-0a82-4841-98e9-457c05379cc2
Based on the availability of Claude in a wide variety of AWS regions, it appears that Claude weights, for the purpose of inference, are shipped to a large number of Amazon data centers.
Similarly, based on the availability of Claude in a wide variety of Google Cloud regions, the weights are shipped to a large number of Google data centers.
Furthermore, based on the just-announced availability of Claude on Microsoft Foundry, the weights are shipped to a large number of Microsoft data centers.
This strongly suggests that Claude weights are being processed in ordinary Google, Amazon and Microsoft data centers, without the kind of extensive precautions that are part of e.g. high-security government cloud offerings.
As an example, I think it's quite unlikely Anthropic has camera access or direct access to verifiably untamperable logs of who had physical access to all the inference machines that host Claude weights (and even if they do, they would have little ability to confirm the routing and physical setup of the machines, to make sure they are not being lied to about the physical locations of the servers, camera coverage, or the accuracy of the access logs).
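To make concrete what "verifiably untamperable" would even mean here, below is a minimal sketch of a hash-chained access log; all names and details are hypothetical, and this is not a description of any provider's actual system. Each entry commits to the hash of the previous one, so a retroactive edit is detectable by anyone holding the latest hash. The hard part in practice, which the sketch ignores, is anchoring that latest hash somewhere the data center operator cannot rewrite, and tying entries to physical reality in the first place.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64

@dataclass
class LogEntry:
    # One physical-access event, chained to the hash of the previous entry.
    timestamp: str
    person: str
    machine_id: str
    prev_hash: str
    entry_hash: str = ""

    def compute_hash(self) -> str:
        payload = json.dumps([self.timestamp, self.person, self.machine_id, self.prev_hash])
        return hashlib.sha256(payload.encode()).hexdigest()

def append_entry(log: list[LogEntry], timestamp: str, person: str, machine_id: str) -> None:
    prev_hash = log[-1].entry_hash if log else GENESIS
    entry = LogEntry(timestamp, person, machine_id, prev_hash)
    entry.entry_hash = entry.compute_hash()
    log.append(entry)

def verify_log(log: list[LogEntry]) -> bool:
    """Recompute the chain; any retroactive edit to an earlier entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry.prev_hash != prev_hash or entry.entry_hash != entry.compute_hash():
            return False
        prev_hash = entry.entry_hash
    return True

# Usage: an operator who later rewrites an entry (say, to hide a technician's
# visit) changes every subsequent hash, which an external auditor holding the
# latest hash would notice.
log: list[LogEntry] = []
append_entry(log, "2025-11-18T09:00Z", "technician-42", "inference-host-7")
append_entry(log, "2025-11-18T09:30Z", "technician-99", "inference-host-7")
assert verify_log(log)
log[0].person = "someone-else"   # tampering...
assert not verify_log(log)       # ...is detectable
```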
I think the above implies that if a corporate espionage team were ordered by a high-level executive (which seems to usually be the case when corporate espionage teams do things) to extract Claude's weights, they would have virtually unlimited physical access to machines that host them in at least one data center that Amazon or Google runs, e.g. by making minor modifications to access logs, or by redirecting the network traffic used to provision a new instance to a different machine.
Protecting a machine against privilege escalation, or at least against an attacker dumping its memory, is extremely difficult when the attacker has unlimited physical access to the machine. Anthropic has written about what would need to be done to make that closer to impossible in this report: https://assets.anthropic.com/m/c52125297b85a42/original/Confidential_Inference_Paper.pdf
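For a rough illustration of the kind of measure that report is about, here is a deliberately simplified sketch of attestation-gated key release; the names and checks are hypothetical stand-ins, not Anthropic's or any vendor's actual mechanism. The idea is that weights stay encrypted at rest, and a key-management service releases the decryption key only to a hardware-attested enclave running an approved software stack, so that physical access to the host on its own yields only ciphertext.

```python
import hashlib
from dataclasses import dataclass

# Everything below is illustrative: real systems use hardware-signed TEE
# attestation reports and a real key-management service; these names and
# checks are hypothetical stand-ins, not any vendor's actual API.

@dataclass
class AttestationReport:
    measurement: str          # hash of the software stack loaded in the enclave
    vendor_signature: bytes   # signature from the CPU/TEE vendor over the report

# The only inference stack the key service is willing to release weights to.
EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-inference-stack-v1").hexdigest()

def vendor_signature_is_valid(report: AttestationReport) -> bool:
    # Placeholder for verifying the hardware vendor's signature chain.
    return report.vendor_signature == b"stub-signature"

def release_weight_decryption_key(report: AttestationReport) -> bytes | None:
    """Hand the weight-decryption key only to an attested enclave running
    exactly the approved stack. An operator with physical access to the host
    sees only encrypted weights, and cannot produce a valid attestation for
    a modified (e.g. memory-dumping) software stack."""
    if not vendor_signature_is_valid(report):
        return None
    if report.measurement != EXPECTED_MEASUREMENT:
        return None
    return b"\x00" * 32  # placeholder key material

# A report from a tampered stack is refused.
bad = AttestationReport(
    measurement=hashlib.sha256(b"stack-with-memory-dumper").hexdigest(),
    vendor_signature=b"stub-signature",
)
assert release_weight_decryption_key(bad) is None
```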
Most of the measures in the report are not currently standard practice in data centers, and the framing of the report (as well as its timing) reads to me as indicating that the data centers Anthropic uses are not fully compliant with Anthropic's own recommendations.
Given that the RSP commits Anthropic to being robust to attacks from corporate espionage teams that are not part of a very small number of nation-state-backed or otherwise extremely sophisticated hacking teams, the fact that Amazon and Google corporate espionage teams could currently extract Claude weights without too much difficulty (albeit at great business and reputational risk) would seem to put Anthropic in violation of its RSP.
To be clear, I don't think this is that reckless a choice (I don't even know whether I think Anthropic having good security is good or bad for the world). It merely seems to me that the RSP as written commits Anthropic to a policy that is incompatible with what Anthropic is actually doing.
My best guess is that, starting today with the announcement of Anthropic's partnership with Microsoft and Nvidia to host Claude in Microsoft data centers, Anthropic is in much more severe violation of its RSP: not only could a team at Google or Amazon get access to Claude weights with executive buy-in, but my guess is that many actors without nation-state-level capabilities can now get access to Claude's weights as well.
This is because, as I understand it, Microsoft data center security is generally considered to be substantially worse than Google's or Amazon's, which at least to me creates substantial suspicion that a broader set of attackers is capable of breaking those defenses.
I am really not confident of this, which is why it's here in this postscript. Miles Brundage expressing concern about this is what prompted me to clean up the above (originally a message I sent to an Anthropic employee) for public consumption. The default trajectory over the past few months appears to have been for Anthropic to weaken, not strengthen, its robustness to external attackers, so it seemed more urgent to publish.
Previous discussion of this topic can be found in this quick take by Zach.