Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
This feels like it's missing the most common form of "social punishment", which is just a threat to take resources from you at some distant point in the future, in a way that ultimately relies on physical force but does so through many intermediaries. I agree the map-distorting kind of social punishment is real, but lots of social punishment is of the form "I think X is bad, and I will use my ability to steer our collective efforts in the direction of harming X".
A single step removed, this might simply be someone saying "X is bad, and if I see you associating with X I will come and throw stones through your window". Another step removed it becomes "X is bad, and I will vote to remove X from our professional association, which is necessary for them to do business". Another step removed it becomes "X is bad, and I am spending my social capital, a shared ledger we vaguely keep track of, to reduce the degree to which X gets access to shared resources", where the basis of that social capital is some complicated system of hard power and threats that in the distant past had something to do with physical violence but has long since become its own game.
I don't think most social punishment is best modeled as map distortion. Indeed, I notice that in your list above you suspiciously do not list the most common kind of attribute ascribed to someone facing social punishment: "X is bad", "X sucks", or "X is evil". Those are indeed different statements, and they should usually more accurately be interpreted as threats in a game of social capital that is ultimately more grounded in physical violence and property rights than in map distortion.
I am a bit confused why it's an assertion and not an "argument"? The argument is relatively straightforward:
Anthropic is currently shipping its weights to compute providers for inference. Those compute providers almost certainly do not comply with Anthropic's ASL-3 security standard, and the inference setup is likely not structured in a way that makes it impossible for a compute provider to get access to the weights if they really wanted to. This means Anthropic is violating its RSP, as their ASL-3 security standard requires them to be robust against this kind of attack.
It is true that "insider threat from a compute provider" is a key part of Anthropic's threat model! Anthropic is clearly not unaware of this attack chain. Indeed, in the whitepaper linked in Zach's shortform they call for various changes that would need to happen at compute providers to enable a zero-trust relationship here, but also implicitly in calling for these changes they admit that they are very likely not currently in place!
My guess is what happened here is that at least some people at Anthropic are probably aware that their RSP commits them to a higher level of security than they can currently realistically meet, but competitive pressures were too strong, and they hoped they could fix it soon enough without too much of an issue. It's also plausible that Anthropic has somehow implemented a pretty much entirely new security paradigm at compute providers that would protect against sophisticated high-level threats. It's also plausible to me that Anthropic's security team got out of sync with the RSP-writing team and didn't realize that the RSP requires them to basically meet RAND's SL-4 security standard (a lot of their writing in e.g. the whitepaper linked reads to me as if they are currently aiming for SL-3, but this is insufficient to meet Anthropic's commitments to be robust against corporate espionage teams, which maybe they aren't tracking properly).
To be clear, I think if the basic premise here is true, then Anthropic at the very least needs to report this violation to their LTBT, and then consequently take down Claude from being served by major cloud providers, if they want to follow the commitments laid out in their RSP. They would also be unable to ship any new models until this issue is resolved.
My guess is no one really treats the RSP as anything particularly serious these days, so none of that will happen. My guess is instead that, if this escalates at all, Anthropic will simply edit their RSP to exclude high-level insiders at the compute providers they use. This is sad, but I would like things to escalate at least that far.
Just make a new paragraph, type three dashes into it, and it will automatically convert into a horizontal line.
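For example, assuming the editor follows standard Markdown here, a paragraph containing nothing but three dashes becomes a horizontal rule:

```
Text above the divider.

---

Text below the divider.
```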
Looks like that!
I don't really believe successionists are real in the sense of "people who would reflectively endorse giving control over the future to AI systems right now".
Even if you have weird preferences about AI ultimately being better than humanity, it seems really very convergently insane to make that succession happen now in an uncontrolled way. If you want humanity to be replaced by AI systems, first put yourself in a position where you can steer that transition.
Ah, yeah, I think we shouldn't show the spotlight item summary on hover. Seems confusing and speaking about the article and author in third person feels sudden.
No such thing exists! So my guess is you must have gotten confused somewhere.
Yeah, I care a lot about client-side reactivity, which I think you just can't really achieve that way (unless you want to glue together JavaScript strings using templates, which I would not recommend).
I think people should just treat the web as an application platform. Doing a server roundtrip for each piece of interactivity, or needing to pre-render every interactive state, is IMO really not viable at the complexity level of something like LW.
I am sure that mental model has nothing to do with why Jim thinks this is/was a bad idea. I think we are all really quite happy we are built on React (or something of that family). Gluing HTML strings together would be a crazy nightmare.
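To make the contrast concrete, here is a minimal hypothetical sketch in TypeScript/JSX (the `VoteWidget` component and its markup are made up for illustration, not LW's actual code):

```tsx
import { useState } from "react";

// String-gluing approach: every update rebuilds the HTML and re-inserts it,
// discarding focus, scroll position, and any attached event listeners.
function renderVote(score: number): string {
  return `<div class="vote"><button class="up">▲</button><span>${score}</span></div>`;
}

// React approach: declare the UI as a function of state; React diffs the
// virtual DOM and patches only what changed, so local UI state survives.
function VoteWidget({ initialScore }: { initialScore: number }) {
  const [score, setScore] = useState(initialScore);
  return (
    <div className="vote">
      <button className="up" onClick={() => setScore(score + 1)}>▲</button>
      <span>{score}</span>
    </div>
  );
}

export default VoteWidget;
```

With the string version, you would have to re-attach the button's click handler and track every bit of widget state by hand after each re-render; that glue code is exactly what becomes a nightmare at the complexity level of something like LW.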
I agree that in common parlance there is still some ontological confusion going on here, but I think it's largely a sideshow to what is happening.
If there were a culture in the world that had an expression that more straightforwardly meant "I curse you", and so wasn't making claims about checkable attributes of the other person, not that much would change. Indeed, "I curse you", or the more common "fuck you", is a thing people say (or in the former case used to say), and it works, and usually has very similar effects to saying "you suck", despite the latter being ontologically a very different kind of statement if taken literally.
I agree that there is often also a claim smuggled in about some third-party-checkable attribute. This is IMO not that crazy. Indeed, a curse or direct insult is often associated with some checkable facts, and so it is efficient to combine the two by calling attention to both at once.
It is indeed common that if you were wronged by someone by your own lights, this is evidence that other people will be wronged by their lights as well, and so that there will be some third-party-checkable attribute of the person that generalizes. So it's not that surprising that these two kinds of actions end up with shared language (and my guess is there are also benefits in terms of plausible deniability about how much social capital you end up spending, which encourage people to conflate the two here; but this doesn't change the fact that the pure curse kind of expression exists and is a crucial thing to model if you want to make accurate predictions here).