Cross Layer Transcoders for the Qwen3 LLM Family
Digging Into Interpretable Features

Sparse autoencoders (SAEs) and cross-layer transcoders (CLTs) have recently been used to decode the activation vectors in large language models into more interpretable features. Analyses have been performed by Goodfire, Anthropic, DeepMind, and OpenAI. BluelightAI has constructed CLT features for the Qwen3 family, specifically Qwen3-0.6B-Base and Qwen3-1.7B-Base, which are made available for exploration and discovery here. In addition to constructing the features themselves, we enable the use of topological data analysis (TDA) methods for improved interaction with, and analysis of, the constructed features. For readers unfamiliar with the transcoder setup, a minimal code sketch appears at the end of this post.

Anecdotally, we have found it easier to identify clear and conceptually abstract features among the CLT features we construct than in the other analyses we have observed. Here are a couple of examples from Qwen3-1.7B-Base:

Layer 20, feature 847: Meta-level judgment of conceptual or interpretive phrases, often with strong evaluative language. It fires on text that evaluates how something is classified, framed, or interpreted, especially when the text asserts that a commonly used label or interpretation is wrong.

* You might be tempted to paraphrase Churchill and say it was the end of the beginning, but it wasn’t that either.
* This is peculiar objection to imprisonment – rather like complaining that your TV is not working because it does not defrost chickens
* Well, yeah, that’s like saying that you owe money on your mortgage because you borrowed it. The real question is “why do we have to keep running such large deficits?”

Layer 20, feature 179: Fires on phrases about criteria or conditions that must be fulfilled; the feature is multilingual.

* Also, strong skin pigmentation or tattoo at the measurement location was regarded as exclusion criterion as it might interfere with the green light-based PPG.
* Protect doctrine should conditions be favorable and calling for unilateral limited military efforts to establish safe-zones in Feb
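As promised above, here is a minimal PyTorch sketch of the cross-layer transcoder idea: the residual-stream activation at one layer is encoded into a sparse, overcomplete feature vector, and per-layer linear decoders map those features to predicted MLP outputs at the current and later layers. This is an assumption-laden illustration, not BluelightAI's implementation; the dimensions are made up, the plain ReLU encoder is one common choice among several, and training details (the sparsity penalty, the reconstruction targets) are omitted.

```python
import torch
import torch.nn as nn


class CrossLayerTranscoder(nn.Module):
    """Illustrative cross-layer transcoder (a sketch, not BluelightAI's code).

    The encoder reads the residual-stream activation at one layer and
    produces a sparse, overcomplete feature vector; a separate decoder
    per target layer maps those features to predicted MLP outputs.
    """

    def __init__(self, d_model: int, n_features: int, n_target_layers: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        # One linear decoder per target layer (the current layer and later ones).
        self.decoders = nn.ModuleList(
            [nn.Linear(n_features, d_model, bias=False) for _ in range(n_target_layers)]
        )

    def forward(self, resid: torch.Tensor):
        # ReLU keeps features non-negative; sparsity is typically also
        # encouraged during training with an L1-style penalty (omitted here).
        features = torch.relu(self.encoder(resid))
        predicted_mlp_outputs = [dec(features) for dec in self.decoders]
        return features, predicted_mlp_outputs


# Toy usage with made-up dimensions (not Qwen3's actual sizes).
clt = CrossLayerTranscoder(d_model=1024, n_features=8192, n_target_layers=4)
resid = torch.randn(1, 1024)
features, preds = clt(resid)
print(features.shape, len(preds))  # torch.Size([1, 8192]) 4
```

In this framing, an interpretable unit such as "Layer 20, feature 847" corresponds to a single coordinate of the `features` vector, and "fires" means that coordinate takes a large value on the quoted text.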