Zach Stein-Perlman

AI strategy & governance. ailabwatch.org. Looking for new projects.

As of 1 June 2024, I've recently been focused on blogging, but I expect to soon shift to exploring a version of ailabwatch.org that could get more attention. I'm most excited to receive offers to help with projects like ailabwatch.org; I'm also happy to be pitched blogposts/projects.

Sequences

Slowing AI


Comments

I'm still confused about Article IV(D)(5)(a) (p. 18) of the CoI. See footnote 3.

New (perfunctory) page: AI companies' corporate documents. I'm not sure it's worthwhile, but maybe a better version of it will be. Suggestions/additions welcome.

New page on AI companies' policy advocacy: https://ailabwatch.org/resources/company-advocacy/.

This page is the best collection on the topic (I'm not really aware of others), but I decided it's low-priority, so it's unpolished. If a better version would be helpful for you, let me know and I'll prioritize it.

Securing model weights is underrated for AI safety. (Even though it's very highly rated.) If the leading lab can't stop critical models from leaking to actors that won't use great deployment safety practices, approximately nothing else matters. Safety techniques would need to be based on properties that those actors are unlikely to reverse (alignment, maybe unlearning) rather than properties that would be undone or that require a particular method of deployment (control techniques, RLHF harmlessness, deployment-time mitigations).

However hard the "make a critical model you can safely deploy" problem is, the "make a critical model that can safely be stolen" problem is... much harder.

I mostly agree. Anthropic is kind of an exception (see footnotes 3, 4, 5, 11, 12, and 14). Anthropic's stance on model escape is basically "we'll handle that in ASL-4." That's reasonable, and better than not noticing the threat, but they've said ~nothing specific about escape/control yet.


I worry that quotes like

red-teaming should confirm that models can't cause harm quickly enough to evade detection.

are misleading in the context of escape (from all labs, not just Anthropic). When defining ASL-4, Anthropic will need to clarify whether this includes scheming/escape and how safety could be "confirm[ed]"; I think demonstrating safety will require high-effort control evals, not just red-teaming.

New: Anthropic lists the board members and LTBT members on a major page (rather than in a blogpost with a specific date), which presumably means they'll keep it up to date, so we'll quickly learn of changes. Hooray:

Anthropic Board of Directors
Dario Amodei, Daniela Amodei, Yasmin Razavi, and Jay Kreps.

LTBT Trustees
Neil Buddy Shah, Kanika Bahl, and Zach Robinson.

(We already knew these are the humans.)

Also new:

In December 2023, Jason Matheny stepped down from the Trust to preempt any potential conflicts of interest that might arise with RAND Corporation's policy-related initiatives. Paul Christiano stepped down in April 2024 to take a new role as the Head of AI Safety at the U.S. AI Safety Institute. Their replacements will be elected by the Trustees in due course.

Again, this is not news (although it hasn't become well-known, I think), but I appreciate this public note.

Labs should give deeper model access to independent safety researchers (to boost their research)

Sharing deeper access helps safety researchers who work with frontier models, obviously.

Some kinds of deep model access:

  1. Helpful-only version
  2. Fine-tuning permission
  3. Activations and logits access (see the code sketch after this list)
  4. [speculative] Interpretability researchers send code to the lab; the lab runs the code on the model; the lab sends back the results

See Shevlane 2022 and Bucknall and Trager 2023.

A lab is disincentivized from sharing deep model access because it doesn't want headlines about how researchers got its model to do scary things.

It has been suggested that labs are also disincentivized from sharing because they want safety researchers to want to work at the labs, and giving independent researchers model access makes working at a lab less necessary. I'm skeptical that this effect is real/nontrivial.

Labs should limit some kinds of access to avoid effectively leaking model weights. But sharing limited access with a moderate number of safety researchers seems very consistent with keeping control of the model.

This post is not about sharing with independent auditors to assess risks from a particular model.

@Buck suggested I write about this, but I don't have much to say about it. If you have takes (on the object level, or on what to say in a blogpost on this topic), please let me know.
