Proof-of-Concept Debugger for a Small LLM
We aim to show that an LLM “debugger” can be built from sparse autoencoder (SAE) features, and we have developed a prototype that automates circuit visualizations for arbitrary prompts. With a few improvements to existing techniques (notably “cluster resampling”, a form of activation patching), we are able to produce...
Mar 17, 2025
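The summary describes “cluster resampling” as a form of activation patching. The sketch below shows only plain single-unit activation patching on a made-up two-layer toy model, so the underlying idea is concrete. It is not the authors' cluster resampling, which this excerpt does not describe. The weights and the names `forward` and `patching_effects` are hypothetical. The model runs once on a “clean” input and once on a “corrupted” input. Each hidden unit's clean activation is then copied into the corrupted run, and the fraction of the clean/corrupted output gap that this restores is measured. In an SAE-based debugger, the patched units would presumably be SAE feature activations rather than raw neurons.

```python
# Hypothetical toy model for illustration only:
#   hidden = relu(W1 @ x), output = W2 . hidden
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]]
W2 = [1.0, -2.0, 0.5]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(x, patch=None):
    """Run the toy model, optionally overwriting hidden units.

    patch: dict mapping hidden-unit index -> value to force in.
    Returns (output, hidden activations after any patching).
    """
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])
    if patch:
        for idx, val in patch.items():
            hidden[idx] = val
    out = sum(w * h for w, h in zip(W2, hidden))
    return out, hidden


def patching_effects(clean_x, corrupt_x):
    """Patch each hidden unit's clean activation into the corrupted run.

    Returns, per unit, the fraction of the clean-minus-corrupted output
    gap that the patch restores (1.0 = fully restored).
    """
    clean_out, clean_hidden = forward(clean_x)
    corrupt_out, _ = forward(corrupt_x)
    gap = clean_out - corrupt_out
    effects = []
    for i, val in enumerate(clean_hidden):
        patched_out, _ = forward(corrupt_x, patch={i: val})
        effects.append((patched_out - corrupt_out) / gap if gap else 0.0)
    return effects


effects = patching_effects([1.0, 0.0], [0.0, 1.0])
print(effects)
```

For these toy inputs, patching hidden unit 1 restores most of the gap (about 0.86), so a debugger would highlight that unit as the main contributor. The per-unit effects sum to 1 here only because the toy readout is linear in the hidden layer. Real models with nonlinear downstream computation give no such guarantee.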