Mechanisms of Introspective Awareness
Uzay Macar and Li Yang are co-first authors. This work was advised by Jack Lindsey and Emmanuel Ameisen, with contributions from Atticus Wang and Peter Wallich, as part of the Anthropic Fellows Program.

Paper: https://arxiv.org/abs/2603.21396
Code: https://github.com/safety-research/introspection-mechanisms

TL;DR

* We investigate the mechanisms underlying "introspective awareness" (as shown in Lindsey...