Iliad is proud to announce that applications are now open for the Iliad Intensive and the Iliad Fellowship! These programs, taken together, are our evolution of the PIBBSS × Iliad Research Residency pilot. The Iliad Intensive will consist of taught coursework, serving as a broadly comprehensive introduction to the field of...
Iliad is now opening applications to attend Agent Foundations 2026 at CMU! Agent Foundations 2026 will be a 5-day conference (of ~35 attendees) on fundamental, mathematical research into agency. It will take place March 2–6, 2026 at Carnegie Mellon University in Pittsburgh, Pennsylvania, and will be the third conference...
Repo: https://github.com/DavidUdell/sparse_circuit_discovery TL;DR: A SPAR project from a while back. A replication of an unsupervised circuit discovery algorithm in GPT-2-small, with a negative result. Thanks to Justis Mills for draft feedback and to Neuronpedia for interpretability data. Introduction I (David) first heard about sparse autoencoders at a Bay Area party....
> When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something...
Thanks to the many people I've chatted about this with over the past several months. And special thanks to Cunningham et al., Marks et al., Joseph Bloom, Trenton Bricken, Adrià Garriga-Alonso, and Johnny Lin, for crucial research artefacts and/or feedback. Codebase: sparse_circuit_discovery TL;DR: The residual stream in GPT-2-small, expanded...
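The preview above is cut off, but for readers unfamiliar with the setup, here is a minimal sketch of what "expanding" residual-stream activations with a sparse autoencoder typically looks like. All dimensions, names, and the L1-penalty training objective are illustrative assumptions, not the post's actual configuration:

```python
# Minimal sparse-autoencoder sketch (illustrative, not the post's actual code):
# project residual-stream activations into a larger, sparser feature basis,
# then reconstruct them.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=768 * 8):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)   # expand up to feature basis
        self.decoder = nn.Linear(d_hidden, d_model)   # reconstruct residual stream

    def forward(self, resid):
        features = torch.relu(self.encoder(resid))    # sparse feature activations
        recon = self.decoder(features)
        return recon, features


def loss_fn(resid, recon, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    mse = ((recon - resid) ** 2).mean()
    sparsity = features.abs().sum(dim=-1).mean()
    return mse + l1_coeff * sparsity


sae = SparseAutoencoder()
resid = torch.randn(32, 768)   # stand-in for GPT-2-small residual activations
recon, feats = sae(resid)
print(loss_fn(resid, recon, feats))
```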
Especial thanks to Logan Riggs and Monte MacDiarmid, for pointing me towards this whole research direction and for code discussion, respectively. Thanks to Alex Turner for project feedback and for orienting me towards scaling activation engineering up to larger models. Thanks to Adrià Garriga-Alonso, Daniel Kokotajlo, Hoagy Cunningham, Nina Rimsky,...
We wrote up the GPT-2 steering vector work as a full paper, adding a few systematic tests. Recap: We've been looking into activation engineering: modifying the activations of a language model at inference time to predictably alter its behavior. Our method works by adding a bias to the forward pass,...
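For concreteness, here is a minimal sketch of the "add a bias to the forward pass" idea: a fixed steering vector added to one GPT-2 block's residual stream at inference time via a forward hook. The layer index, coefficient, contrast prompts, and use of HuggingFace `transformers` are illustrative assumptions; this is not the paper's exact implementation.

```python
# Minimal activation-addition sketch (illustrative, not the paper's code).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, COEFF = 6, 4.0  # hypothetical layer and injection coefficient


def resid_after_block(text, layer):
    # hidden_states[i + 1] is the residual stream after block i.
    ids = tok(text, return_tensors="pt").input_ids
    hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer + 1][0, -1, :]  # last-token activation


with torch.no_grad():
    # Steering vector = scaled difference of activations for two contrast prompts.
    steer = COEFF * (resid_after_block(" Love", LAYER) - resid_after_block(" Hate", LAYER))


def add_bias(module, inputs, output):
    # Add the steering vector to the block's output hidden states at every position.
    if isinstance(output, tuple):
        return (output[0] + steer,) + output[1:]
    return output + steer


handle = model.transformer.h[LAYER].register_forward_hook(add_bias)
try:
    ids = tok("I think dogs are", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=20, do_sample=False)
    print(tok.decode(out[0]))
finally:
    handle.remove()  # restore the unmodified forward pass
```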