TL;DR We experimentally test the mathematical framework for circuits in superposition by hand-coding the weights of an MLP to implement many conditional[1] rotations in superposition on two-dimensional input features. The code can be found here. This work was supported by Coefficient Giving and Goodfire AI. 1 Introduction Figure 1: The...
See below the project outputs for AI Safety Camp's 10th edition, which took place from January to April 2025. You can also find them on our website. This year's edition featured a wide range of approaches to reducing AI risk. Projects resulted in published papers, produced helpful community resources, grants...
Summary & Motivation This post is a continuation and clarification of Circuits in Superposition: Compressing many small neural networks into one. That post presented a sketch of a general mathematical framework for compressing different circuits into a network in superposition. On closer inspection, some of it turned out to be...
Using the notation from here: A Mathematical Framework for Transformer Circuits The attention pattern for a single attention head is determined by $A = \mathrm{softmax}(x^T W_Q^T W_K x)$, where softmax is computed for each row of $x^T W_Q^T W_K x$. Each row of $A$ gives the attention pattern for the current token. Are these rows (post softmax) typically...
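For concreteness, here is a minimal NumPy sketch of the formula above. It assumes that the columns of x are token vectors (shape d_model × n_tokens) and that W_Q and W_K have shape d_head × d_model. The function name `attention_pattern`, the random weights, and the optional `causal` flag are illustrative assumptions, not taken from the linked framework, and the sketch leaves out any scaling of the scores.

```python
import numpy as np

def attention_pattern(x, W_Q, W_K, causal=False):
    """Return A = softmax(x^T W_Q^T W_K x), with softmax applied to each row.

    x:   (d_model, n_tokens), one column per token (assumed layout)
    W_Q: (d_head, d_model)
    W_K: (d_head, d_model)
    """
    queries = W_Q @ x            # (d_head, n_tokens)
    keys = W_K @ x               # (d_head, n_tokens)
    scores = queries.T @ keys    # (n_tokens, n_tokens); entry (i, j) is query i . key j
    if causal:
        # Optional: stop token i from attending to later tokens j > i.
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 8, 4, 5
x = rng.normal(size=(d_model, n_tokens))
W_Q = rng.normal(size=(d_head, d_model))
W_K = rng.normal(size=(d_head, d_model))

A = attention_pattern(x, W_Q, W_K)
A_causal = attention_pattern(x, W_Q, W_K, causal=True)
```

Row i of `A` is a probability distribution over the tokens that token i attends to, so each row sums to 1. With `causal=True`, every entry above the diagonal is zero.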
In a private discussion, related to our fundraiser, it was pointed out that AISC hasn't made clear enough what our theory of change is. Therefore this post. Some caveats/context: * This is my personal viewpoint. Other organisers might disagree about what is central or not. * I’ve co-organised AISC1, AISC8,...
We still need more funding to run another edition. Our fundraiser has raised $6k so far and will end on February 1st if it hasn't reached the $15k minimum. We need proactive donors. If we don't get funded this time, there is a good chance we...
This is a linkpost to our funding case on Manifund. Project summary AI Safety Camp has a seven-year track record of enabling participants to try their fit, find careers and start new orgs in AI Safety. We host up-and-coming researchers outside the Bay Area and London hubs. If this fundraiser...