Read the constitution. Previously: 'soul document' discussion here; the new constitution contains almost all of the 'soul document' content, but is >2x longer with a lot of new additions. (Zac and Drake work at Anthropic but are just sharing the linkpost and weren't heavily involved in writing this document.) We're...
Today we are publishing a significant update to our Responsible Scaling Policy (RSP), the risk governance framework we use to mitigate potential catastrophic risks from frontier AI systems. This update introduces a more flexible and nuanced approach to assessing and managing AI risks while maintaining our commitment not to train...
Last September we published our first Responsible Scaling Policy (RSP) [LW discussion], which focuses on addressing catastrophic safety failures and misuse of frontier models. In adopting this policy, our primary goal is to help turn high-level safety concepts into practical guidelines for fast-moving technical organizations and demonstrate their viability as...
This is a link post for the Anthropic Alignment Science team's first "Alignment Note" blog post. We expect to use this format more often in the future to showcase early-stage research and work-in-progress updates. Twitter thread here. Top-level summary: > In this post we present "defection probes": linear classifiers that use...
(nb: this post is written for anyone interested, not specifically aimed at this forum) We believe that the AI sector needs effective third-party testing for frontier AI systems. Developing a testing regime and associated policy interventions based on the insights of industry, government, and academia is the best way to...
I hope Dario's remarks to the Summit can shed some light on how we think about RSPs in general and Anthropic's RSP in particular, both of which have been discussed extensively since I shared our RSP announcement. The full text of Dario's remarks follows: Before I get into Anthropic’s Responsible...
The text of this post is based on our blog post and serves as a linkpost for the full paper, which is considerably longer and more detailed. Neural networks are trained on data, not programmed to follow rules. We understand the math of the trained network exactly – each neuron in a neural network performs...