Developmental Interpretability

Developmental interpretability is an AI alignment research agenda studying how structure forms in neural networks.

2023 SLT & alignment summit

The first SLT & alignment summit ("Singularities against the singularity") took place in June 2023. In the first week, we recorded more than 20 hours of lectures on the necessary background, all of which you can find here. In the second week, we started research collaborations on a dozen open problems.

A second summit is planned for November 2023. Stay tuned for more details.

Community

Join our Discord to discuss, ask questions, find collaborators, and stay up to date on the latest developments.
