Louis Jaburi: Compact Proofs of Model Performance via Mechanistic Interpretability

Speaker: Louis Jaburi (Independent researcher)

Talk: Compact Proofs of Model Performance via Mechanistic Interpretability

Part of the Guaranteed Safe AI Seminars, a monthly online series on AI systems with quantitative safety guarantees.

Recording: https://youtu.be/m_2JnJglx9g

Readings: arXiv:2406.11779, arXiv:2410.07476

YouTube playlist: https://www.youtube.com/playlist?list=PLOutnjp2BEJeQM2J49_KvdpuZlaQXPboy