By the "whitepaper," are you referring to the RSP v2.2 that Zach linked to, or something else? If you mean the RSP, I don't understand how a generic standard can "call out what kind of changes would need to be required" to their current environment if they're also claiming they meet their current standard.
Also, just to get down to brass tacks a little more here, can you describe the specific threat model that you think they are insufficiently responding to? By that, I don't mean just the threat actor (insiders within their compute provider) and their objective (to get weights), but rather the specific class or classes of attacks that you expect to occur, and why you believe that existing technical security + compensating controls are insufficient given Anthropic's existing standards.
For example, AIUI the weights aren't just sitting naively decrypted at inference: they're running inside a fairly locked-down trusted execution environment, with keys provided only as needed (probably with an ephemeral keying structure?) from an HSM, and those trusted execution environments are operating inside the physical security perimeter of a data center that is already designed to mitigate insider risk. Which parts of this are you worried are attackable? To what degree are organizational boundaries between Anthropic and its compute providers salient to increasing this risk? Why should we expect that the compute providers don't already have sufficient compensating controls here, given that, e.g., these same compute providers also provide classified compute to the US government secured at the Top Secret / SCI level, and presumably therefore have best-in-class anti-insider-threat capabilities?
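To make concrete the kind of control flow I'm gesturing at, here's a minimal, purely illustrative sketch of attestation-gated, ephemeral key release from an HSM-backed key service. Everything in it (the function names, the allow-list, the session-key stand-in) is hypothetical and not a claim about Anthropic's or any compute provider's actual implementation:

```python
# Illustrative sketch only: an HSM-backed key service that releases a
# short-lived weight-decryption key solely to an enclave that proves it is
# running approved code. No real provider API is being referenced here.

import os
from dataclasses import dataclass

@dataclass
class AttestationReport:
    enclave_measurement: bytes   # hash of the code/config running in the TEE
    signature: bytes             # signed by the TEE vendor's hardware root of trust

# Hypothetical allow-list of approved enclave measurements.
APPROVED_MEASUREMENTS = {bytes.fromhex("ab" * 32)}

def verify_attestation(report: AttestationReport) -> bool:
    """Check that the enclave is running approved code (signature chain elided)."""
    return report.enclave_measurement in APPROVED_MEASUREMENTS

def release_ephemeral_key(report: AttestationReport) -> bytes:
    """Key-service policy: only hand out a session-scoped decryption key to an
    enclave whose attestation checks out; the key is never written to disk."""
    if not verify_attestation(report):
        raise PermissionError("attestation failed; no key released")
    return os.urandom(32)  # stand-in for an HSM-derived, per-session key

# Inside the enclave, the weights stay encrypted at rest and in transit and are
# decrypted only in enclave memory using the session key obtained above.
```

The point of the sketch is just that, under this kind of design, a provider-side insider needs to defeat attestation, the HSM release policy, or the enclave boundary itself, not merely read bytes off a disk.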
I'm extremely willing to buy into a claim that they're not doing enough, but I would actually need to have an argument here that's more specific.
I'm really not following your argument here. Of course in many instances compute providers don't offer zero trust relationships with those running on their systems. This is just not news. There's a reason why we have an entire universe of compensating technical and non-technical controls to mitigate risk in such circumstances.
You have done zero analysis to identify any reason to believe that those compensating controls are insufficient. You could incredibly easily get me to flip sides in this discussion if you offered any of that, but simply saying that someone isn't running zero trust isn't sufficient. As a hypothetical, if Anthropic is expending meaningful effort to be highly confident that Amazon's own security processes are securing against insiders, they would have substantial risk reduction (as long as they can have high confidence that said processes continue to be executed).
Separately, though it probably cuts against my argument above [1], I would politely disagree with the perhaps-unintended implication in your comment above that "implement zero trust" is a sufficient defense against compute providers like Amazon, MSFT, etc. After all, Anthropic's proper threat modeling of them should include things like, "Amazon, Microsoft, etc. employ former nation-state hackers who considered attacking zero-trust networks to be part of the cost of doing business."
[1] Scout mindset, etc.
Huh? Simply using someone else's hosting doesn't mean that Amazon has a threat-modeled ability to steal Claude's model weights.
For example, it could be the case (not saying it is, this is just illustrative) that Amazon has given Anthropic sufficient surveillance capabilities inside their data centers that, combined with other controls, the risk is low.
Where's the "almost certainly" coming from? I feel like everyone responding to this is seeing something I'm not seeing.
Zach Stein-Perlman's recent quick take is confusing. It just seems like an assertion, followed by condemnation of Anthropic conditioned on us blindly accepting his assertion as true.
It is definitely the case that "insider threat from a compute provider" is a key part of Anthropic's threat model! They routinely talk about it in formal and informal settings! So what, precisely, is the threat model that he thinks they're not adequately defending against?
(He has me blocked from commenting on his posts for some reason, which is absolutely his right, but insofar as he hasn't blocked me from seeing his posts, I wanted to explicitly register in public my objection to this sort of low-quality argument.)
My opinion, FWIW, is that both "treaty" and "international agreement" (or "deal", etc.) have upsides and downsides. And it's hard to predict those considerations' political salience or direction over the long term -- e.g., just a few years ago, Republicans' main complaint against the JCPOA (aka "the Iran Nuclear Deal") was that it wasn't an actual treaty and should have been, which would be a very odd argument in 2025.
I think as long as MIRI says things like "or other international agreement or set of customary norms" on occasion, it should be fine. Hearing "treaty" at first glance certainly isn't nails-on-a-chalkboard for me, and in any long convo I model MIRI as saying something like "or look, we'd be open to other things that get this done too; we think a treaty is preferable but are open to something else that solves the same problem."
The big challenge here is getting national security officials to respond to your survey! It's probably easier with former officials, but it's unclear how predictive that is of current officials' beliefs.
I'm pretty sure that p(doom) is much more load-bearing for this community than for policymakers generally. And frankly, I'm like this close to commissioning a poll of US national security officials where we straight up ask "at X% chance of total human extinction, would you support measures A, B, C, D, etc."
I strongly, strongly, strongly suspect based on general DC pattern recognition that if the US government genuinely believed that the AI companies had a 25% chance of killing us all, FBI agents would rain out of the sky like a hot summer thunderstorm, sudden, brilliant, and devastating.
Heads up -- if you're 1. on an H-1B visa AND 2. currently outside the US, there is VERY IMPORTANT, EXTREMELY TIME-SENSITIVE stuff going on that might prevent you from getting back into the US after 21 September.
If this applies to you, immediately stop looking at LessWrong and look at the latest news. (I'm not providing a summary here because there are conflicting stories about who it will apply to, it's evolving hour by hour, and I don't want this post to be out of date.)
To be clear, I think people should feel free to block others for any reason, including literally no reason at all. I'm open to ways of describing people's block decisions in the future that better convey that, but I definitely didn't think others reading this would assume "oh, Zach's the bad guy here" as opposed to the reverse.