Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
I think I am overall glad about this project, but I do want to share that my central reaction has been "none of these lines seem very red to me, in the sense of being bright clear lines, and it's been very confusing how the whole 'call for red lines' does not actually suggest any specific concrete red line". Like, of course everyone would like some kind of clear line with regard to AI; the central question is what the lines are!
“the need to ensure that AI never lowers the barriers to acquiring or deploying prohibited weapons”
This, for example, seems like a really bad red line. Indeed, it seems very obvious that it has already been crossed. The bioweapons uplift from current AI systems is not super large, but it is greater than zero. Does this mean that the UN Secretary-General is in favor of banning all AI development right now, since the red line has already been crossed?
(Separately, I am also pretty sad about the focus on autonomous weapons. As a domain in which to have red lines, it has very little to do with catastrophic or existential risk, it encourages misunderstandings about the risk landscape, and it is likely to cause a decent amount of unhealthy risk compensation in other domains. But that is a much more minor concern than the fact that the red-line campaign has been one of the most wishy-washy campaigns in terms of what it's actually advocating for, which felt particularly sad given its central framing.)
I agree that in common parlance there is still some ontological confusion going on here, but I think it's largely a sideshow to what is happening.
If there were a culture in the world that had an expression that more straightforwardly meant "I curse you", and so wasn't making claims about checkable attributes of the other person, and that expression was commonly used where we use statements like "You suck", I don't think that culture would be very different from ours. Indeed, "I curse you", or the more common "fuck you", is a thing people say (or in the former case used to say), and it works, and usually has very similar effects to saying "you suck", despite the latter being ontologically a very different kind of statement if taken literally.
I agree that there is often also a claim smuggled in about some third-party-checkable attribute. This is IMO not that crazy. Indeed, a curse/direct-insult is often associated with some checkable facts, which makes it efficient to combine the two into one expression.
It is indeed common that if you were wronged by someone by your own lights, this is evidence that other people will be wronged by their lights as well, and so that there will be some third-party-checkable attribute of the person that generalizes. So it's not that surprising that these two kinds of actions end up with shared language (and my guess is there are also benefits in terms of plausible deniability about how much social capital you end up spending that encourage people to conflate the two here, but this doesn't change the fact that the pure curse kind of expression exists and is a crucial thing to model in order to make accurate predictions here).
This feels like it's missing the most common form of "social punishment", which is just a threat to take resources from you at some distant point in the future, in a way that ultimately relies on physical force but does so through many intermediaries. I agree the map-distorting kind of social punishment is real, but also, lots of social punishment is of the form "I think X is bad, and I will use my ability to steer our collective efforts in the direction of harming X".
A single step removed, this might simply be someone saying "X is bad, and if I see you associating with X I will come and throw stones through your window". Another step removed it becomes "X is bad, and I will vote to remove X from our professional association, which is necessary for them to do business". Another step removed it becomes "X is bad, and I am spending my social capital, which is a shared ledger we vaguely keep track of, to reduce the degree to which X gets access to shared resources; the basis of that social capital is some complicated system of hard power and threats that in some distant past had something to do with physical violence but has long since become its own game".
I don't think most social punishment is best modeled as map distortion. Indeed, I notice that in your list above you suspiciously do not include the most common kind of attribute ascribed to someone facing social punishment: "X is bad", "X sucks", or "X is evil". Those are indeed different statements, and they should usually be interpreted more accurately as threats in a game of social capital, which is itself grounded more in physical violence and property rights than in map distortion.
I am a bit confused why it's an assertion and not an "argument"? The argument is relatively straightforward:
Anthropic is currently shipping its weights to compute providers for inference. Those compute providers almost certainly do not comply with Anthropic's ASL-3 security standard, and the inference setup is likely not structured in a way that makes it impossible for the compute provider to somehow get access to the weights if they really wanted to. This means Anthropic is violating its RSP, as their ASL-3 security standard requires them to be robust against this kind of attack.
It is true that "insider threat from a compute provider" is a key part of Anthropic's threat model! Anthropic is clearly not unaware of this attack chain. Indeed, in the whitepaper linked in Zach's shortform they call for various changes that would need to happen at compute providers to enable a zero-trust relationship here, but also implicitly in calling for these changes they admit that they are very likely not currently in place!
My guess is what happened here is that at least some people at Anthropic are probably aware that their RSP commits them to a higher level of security than they can currently realistically meet, but competitive pressures were too strong, and they hoped they could fix it soon enough without too much of an issue. It's also plausible that Anthropic has somehow implemented an essentially new security paradigm at compute providers that would protect against sophisticated high-level threats. It's also plausible to me that Anthropic's security team got out of sync with the RSP-writing team and didn't realize that the RSP required them to basically meet RAND's SL-4 security standard (a lot of their writing in e.g. the linked whitepaper reads to me as if they are currently aiming for SL-4, but this is insufficient to meet Anthropic's commitments to be robust against corporate espionage teams, which maybe they aren't tracking properly).
To be clear, I think if the basic premise here is true, then Anthropic at the very least needs to report this violation to their LTBT, and then take Claude down from being served by major cloud providers, if they want to follow the commitments laid out in their RSP. They would also be unable to ship any new models until this issue is resolved.
My guess is that no one really treats the RSP as anything particularly serious these days, so none of that will happen. My guess is that instead, if this escalates at all, Anthropic will simply edit their RSP to exclude high-level insiders at the compute providers they use. This is sad, but I would like things to escalate at least until that point.
Just make a new paragraph with three dashes into it, and it will automatically convert into a horizontal line.
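If you happen to be writing in raw Markdown rather than the rich-text editor (I'm not sure which one you're using), I believe the equivalent is just a line consisting only of three dashes, with blank lines around it:

```
First paragraph.

---

Second paragraph, now separated by a horizontal line.
```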
Looks like that!
I don't really believe successionists are real in the sense of "people who would reflectively endorse giving control over the future to AI systems right now".
Even if you have weird preferences about AI ultimately being better than humanity, it seems really very convergently insane to make that succession happen now in an uncontrolled way. If you want humanity to be replaced by AI systems, first put yourself in a position where you can steer that transition.
Ah, yeah, I think we shouldn't show the spotlight item summary on hover. Seems confusing, and speaking about the article and author in the third person feels jarring.
No such thing exists! So my guess is you must have gotten confused somewhere.
At least for me, given the way the whole website and call were framed, I kept reading and reading and kept being like "ok, cool, red lines, I don't really know what you mean by that, but presumably you are going to say one right here? No wait, still no. Maybe now? Ok, I give up. I guess it's cool that people think AI will be a big deal and we should do something about it, though I still don't know what the something is that this specific thing is calling for."
Like, in the absence of specific red lines, or at the very least a specific definition of what a red line is, this thing felt like this:
And like, sure. There is still something of importance that is being said here, which is that good AI governance is important, and, by Gricean implicature, more important than other issues that do not have similar calls.
But like, man, the above does feel kind of vacuous. Of course we would like to have good governance! Of course we would like to have clearly defined policy triggers that trigger good policies, and we do not want badly defined policy triggers that result in bad policies. But that's hardly any kind of interesting statement.
Like, your definition of "red line" is this:
And like, I don't really buy the "agreed upon internationally" part. Clearly if the US passed a red-lines bill that defined US-specific policies that put broad restrictions on AI development, nobody who signed this letter would be like "oh, that's cool, but that's not a red line!".
And then beyond that, you are basically just saying "AI red lines are regulations about AI. They are things that we say AI is not allowed to do. Also known as laws about AI".
And yeah, cool, I agree that we want AI regulation. Lots of people want AI regulation. But having a big call that's like "we want AI regulation!" does kind of fail to say anything. Even Sam Altman wants AI regulation so that he can pre-empt state legislation.
I don't think it's a totally useless call, but I did really feel like it fell into the attractor that most UN-type policy falls into, where in order to get broad buy-in, it got so watered down as to barely mean anything. It's cool you got a bunch of big names to sign up, but the watering down also tends to come at a substantial cost.