Introspective Interpretability: a Definition, Motivation, and Open Problems
(Originally posted on my blog: https://belindal.github.io/introspection/)

1. Introduction

In 2022, ChatGPT turned language models (LMs) from a tool used almost exclusively by AI researchers into the fastest-growing consumer software application in history, spawning a $40 billion generative AI market and a boom that continues to reshape markets today. While the...
Feb 9