davekasten

Comments, sorted by newest
One Shot Singalonging is an attitude, not a skill or a song-difficulty-level*
davekasten · 2m

One thing to consider is a strategy used in Jewish singing contexts (which I see done relatively rarely in other contexts -- but maybe it's super common and I just don't know the word for it!).  Before singing the first verse or the chorus, you do a wordless verse using nonsense syllables like "lai-dai-lai" or "yai-dah-dai-dai" that match the tune.  This lets folks pre-load the music before having to learn the words, and gives implicit social permission, if you forget the words, to just sing nonsense syllables in their place.  (A common problem if it's a Hebrew text you're unfamiliar with and you fall behind in reading it!)

(These are sometimes called niggunim, from the Hebrew for "tune" or "melody," for those wanting to google; for what should be fairly obvious reasons about a false cognate, I didn't lead with that vocabulary term.)

So for example, for The Circle, you'd start with something like:
"Yah dai dai dai daiiii?
Dai dai, daaaai dah daaaai. 
Yah dai dai dai daiiii?
Dai dai, daaaai dah daaaai. 

Yah dai dai dai daiiii
Lah dah dai lai lai
Bah dah baiiih bah bah
Bah da bah baii bah
Bah da baiiii da baiii

Dai dai, daaaai dah daaaai."

To presage the beat of verses like:

"So will we bring our families in,
Circle, grow and grow.
those whom Nature made our kin?
Circle, grow and grow.
Countless likenesses we find,
by our common blood bestowed.
What a debt of care is owed;
what a blesséd tie that binds!

Circle, circle, grow and grow."

(The Circle is a little tricky in that the first verse starts slightly differently, but trust me, this social technology extends to that use case as well; it's not uncommon to have that in Jewish songs.)

 

davekasten's Shortform
davekasten · 23d

To be clear, I think people should feel free to block freely and for any reason, including literally no reason at all.  I'm open to ways of describing people's block decisions in the future that better convey that, but I definitely didn't think others reading this would assume "oh, Zach's the bad guy here" as opposed to the reverse.

davekasten's Shortform
davekasten · 23d

By the "whitepaper," are you referring to the RSP v2.2 that Zach linked to, or something else?  If the former, I don't understand how a generic standard can "call out what kind of changes would need to be required" to their current environment if they're also claiming they meet their current standard.

Also, just to cut a little more to brass tacks here, can you describe the specific threat model that you think they are insufficiently responding to?  By that, I don't mean just the threat actor (insiders within their compute provider) and their objective to get weights, but rather the specific class or classes of attacks that you expect to occur, and why you believe that existing technical security + compensating controls are insufficient given Anthropic's existing standards.  

For example, AIUI the weights aren't just sitting naively decrypted at inference: they're running inside a fairly locked-down trusted execution environment, with keys provided only as needed (probably with an ephemeral keying structure?) from an HSM, and those trusted execution environments operate inside the physical security perimeter of a data center that is already designed to mitigate insider risk.  Which parts of this are you worried are attackable?  To what degree are organizational boundaries between Anthropic and its compute providers salient to increasing this risk?  And why should we expect that the compute providers don't already have sufficient compensating controls here, given that, e.g., these same providers offer classified compute to the US government secured at the Top Secret / SCI level, and presumably therefore have best-in-class anti-insider-threat capabilities?
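To make the key-release pattern I'm gesturing at concrete, here's a minimal stdlib-only sketch.  Everything in it is hypothetical and illustrative -- the `ToyHSM`, the per-epoch derivation, and the toy keystream cipher are my own stand-ins, not Anthropic's or any provider's actual design.  The point is just the shape: the root key never leaves the "HSM," the "enclave" only ever holds a short-lived data key released against an attestation, so there's no long-lived decryption key sitting on the inference host for an insider to grab.

```python
import hashlib
import hmac
import secrets
import time


class ToyHSM:
    """Stand-in for a hardware security module holding the root key."""

    def __init__(self):
        self._root = secrets.token_bytes(32)  # never leaves the "HSM"

    def release_data_key(self, attestation: bytes, epoch: int) -> bytes:
        # A real system would verify a TEE attestation report here;
        # we just require a non-empty measurement as a placeholder.
        if not attestation:
            raise PermissionError("attestation failed")
        # Ephemeral key derived per epoch: compromising one epoch's key
        # doesn't expose keys for past or future epochs.
        return hmac.new(self._root, b"weights|%d" % epoch, hashlib.sha256).digest()


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy keystream cipher for illustration only -- not real encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


hsm = ToyHSM()
epoch = int(time.time()) // 3600  # hourly key rotation
weights = b"model weights blob"

# Encrypt at rest with this epoch's data key...
k = hsm.release_data_key(b"enclave-measurement", epoch)
ciphertext = xor_stream(k, weights)

# ...and the "enclave" re-requests the key at inference time to decrypt.
plaintext = xor_stream(hsm.release_data_key(b"enclave-measurement", epoch), ciphertext)
assert plaintext == weights
```

Again, this is a sketch of the *pattern*, not a claim about the actual deployment; the interesting security questions are exactly the ones above -- who can forge the attestation, who administers the HSM policy, and whether the physical and organizational perimeter around it holds.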

I'm extremely willing to buy into a claim that they're not doing enough, but I would actually need to have an argument here that's more specific.  

davekasten's Shortform
davekasten · 23d

I'm really not following your argument here. Of course in many instances compute providers don't offer zero trust relationships with those running on their systems.  This is just not news.  There's a reason why we have an entire universe of compensating technical and non-technical controls to mitigate risk in such circumstances.  

You have done zero analysis to identify any reason to believe that those compensating controls are insufficient. You could incredibly easily get me to flip sides in this discussion if you offered any of that, but simply saying that someone isn't running zero trust isn't sufficient. As a hypothetical, if Anthropic is expending meaningful effort to be highly confident that Amazon's own security processes are securing against insiders, they would have substantial risk reduction (as long as they can have high confidence that said processes are continuing to be executed).

Separately, though it probably cuts against my argument above [1], I would politely disagree with the perhaps-unintended implication in your comment above that "implement zero trust" is a sufficient definition of defenses to defend against compute providers like Amazon, MSFT, etc. After all, Anthropic's proper threat modeling of them should include things like, "Amazon, Microsoft, etc. employ former nation-state hackers who considered attacking zero trust networks to be part of the cost of doing business."

[1] Scout mindset, etc.

davekasten's Shortform
davekasten · 26d

Huh? Simply using someone else's hosting doesn't mean that Amazon has a threat-modeled ability to steal Claude's model weights.  

For example, it could be the case (not saying it is, this is just illustrative) that Amazon has given Anthropic sufficient surveillance capabilities inside their data centers that, combined with other controls, the risk is low.

davekasten's Shortform
davekasten · 26d

Where's the "almost certainly" coming from? I feel like everyone responding to this is seeing something I'm not seeing.

davekasten's Shortform
davekasten · 1mo

Zach Stein-Perlman's recent quick take is confusing.  It just seems like an assertion, followed by condemnation of Anthropic conditioned on us blindly accepting that assertion as true.

It is definitely the case that "insider threat from a compute provider" is a key part of Anthropic's threat model!  They routinely talk about it in formal and informal settings! So what precisely is his threat model here that he thinks they're not defending adequately against? 

(He has me blocked from commenting on his posts for some reason, which is absolutely his right, but insofar as he hasn't blocked me from seeing his posts, I wanted to explicitly register in public my objection to this sort of low-quality argument.)

Buck's Shortform
davekasten · 1mo

My opinion, FWIW, is that both treaty and international agreement (or "deal", etc.) have upsides and downsides.  And it's hard to predict those considerations' political salience or direction in the long term -- e.g., just a few years ago, Republicans' main complaint against the JCPOA (aka "the Iran Nuclear Deal") was that it wasn't an actual treaty, and should have been, which would be a very odd argument in 2025. 

I think as long as MIRI says things like "or other international agreement or set of customary norms" on occasion it should be fine.  It certainly doesn't nails-on-the-chalkboard me to hear "treaty" at first glance, and in any long convo I model MIRI as saying something like "or look, we'd be open to other things that get this done too; we think a treaty is preferable but are open to something else that solves the same problem."
 

The title is reasonable
davekasten · 2mo

The big challenge here is getting national security officials to respond to your survey!  Probably easier with former officials, but unclear how much that's predictive of current officials' beliefs. 

The title is reasonable
davekasten · 2mo

I'm pretty sure that p(doom) is much more load-bearing for this community than for policymakers generally. And frankly, I'm like this close to commissioning a poll of US national security officials where we straight up ask "at what percent chance of total human extinction would you support measures A, B, C, D, etc."

I strongly, strongly, strongly suspect based on general DC pattern recognition that if the US government genuinely believed that the AI companies had a 25% chance of killing us all, FBI agents would rain out of the sky like a hot summer thunderstorm, sudden, brilliant, and devastating.

Posts

- [Cross-post] Every Bay Area "Walled Compound" (10mo)
- [Cross-post] Welcome to the Essay Meta (10mo)
- Dave Kasten's AGI-by-2027 vignette (1y)
- A Narrow Path: a plan to deal with AI extinction risk (1y)
- [Cross-post] Book Review: Bureaucracy, by James Q Wilson (1y)
- davekasten's Shortform (2y)