LESSWRONG

Kamal Maher — LessWrong

44

4mo

Cross-Layer Transcoders are incentivized to learn Unfaithful Circuits

by Georg Lange, RGRGRG, Kat Dearstyne, and Kamal Maher

Many thanks to Michael Hanna and Joshua Batson for useful feedback and discussion. Kat Dearstyne and Kamal Maher conducted experiments during the SPAR Fall 2025 Cohort.

TL;DR: Cross-layer transcoders (CLTs) enable circuit tracing that can extract high-level mechanistic explanations for arbitrary prompts and are emerging as general-purpose infrastructure for mechanistic...