Sparse autoencoders (SAEs) and cross-layer transcoders (CLTs) have recently been used to decode the activation vectors of large language models into more interpretable features. Analyses have been performed by Goodfire, Anthropic, DeepMind, and OpenAI. BluelightAI has constructed CLT features for the Qwen3 family, specifically Qwen3-0.6B Base and Qwen3-1.7B Base, which are available for exploration and discovery here. In addition to constructing the features themselves, we enable the use of topological data analysis (TDA) methods for improved interaction with and analysis of the constructed features.
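For readers less familiar with these models, here is a minimal sketch of how an SAE-style encoder turns a dense activation vector into sparse feature activations. The shapes, weights, and function names below are illustrative assumptions, not our trained CLTs.

```python
# Illustrative sketch only: random weights stand in for a trained SAE/transcoder.
import numpy as np

d_model, n_features = 1024, 4096            # hypothetical dimensions
rng = np.random.default_rng(0)

W_enc = rng.standard_normal((d_model, n_features)) * 0.01   # encoder weights
b_enc = np.zeros(n_features)
W_dec = rng.standard_normal((n_features, d_model)) * 0.01   # decoder weights
b_dec = np.zeros(d_model)

def encode(activation: np.ndarray) -> np.ndarray:
    """Sparse feature activations: ReLU of an affine map of the activation vector."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)

def decode(features: np.ndarray) -> np.ndarray:
    """Reconstruct the activation (for a transcoder, predict a later layer's activation)."""
    return features @ W_dec + b_dec

x = rng.standard_normal(d_model)   # one token's activation vector
f = encode(x)                      # after training, mostly zeros; each index is a "feature"
x_hat = decode(f)
```

In a trained model, each nonzero entry of `f` corresponds to a feature like the examples below.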
We have found anecdotally that it is easier to find clear, conceptually abstract features among the CLT features we construct than in other analyses we have seen. Here are a couple of examples from Qwen3-1.7B-Base:
Layer 20, feature 847: Meta-level judgment of conceptual or interpretive phrases, often with strong evaluative language. It fires on text that evaluates how something is classified, framed, or interpreted, especially when it says that a commonly used label or interpretation is wrong.
Layer 20, feature 179: Fires on phrases about criteria or conditions that must be fulfilled, and is multilingual.
In addition, a number of the CLT features are preferentially active on stop words and punctuation, with high activations concentrated specifically on those tokens, as was observed in this analysis.
Topological data analysis methods are used to enable the identification and analysis of groups of features. Even though the CLT features we construct are often meaningful on their own, ideas and concepts are generally identified more precisely by groups of features. TDA lets us find groups of features that are close under a similarity measure and explore them through a visual interface.
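As a rough sketch of the underlying idea (our actual pipeline uses TDA constructions rather than the plain hierarchical clustering shown here), one can group features whose decoder directions are close in cosine similarity. The decoder matrix and threshold below are illustrative assumptions.

```python
# Illustrative sketch: group features by cosine similarity of their decoder directions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
W_dec = rng.standard_normal((2048, 512))   # one decoder direction per feature (hypothetical)

# Average-linkage hierarchical clustering on cosine distance between feature directions.
Z = linkage(W_dec, method="average", metric="cosine")
groups = fcluster(Z, t=0.4, criterion="distance")   # features below the distance threshold share a group

# Each group collects features pointing in similar directions and can be inspected
# together, loosely analogous to a node (or group of nodes) in the TDA graph.
for gid in np.unique(groups)[:5]:
    members = np.flatnonzero(groups == gid)
    print(f"group {gid}: {len(members)} features, e.g. {members[:8]}")
```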
Here is an illustration of the interface. Each node in the graph corresponds to a group of features, so a group of nodes also corresponds to a group of features. The circled group is at least partially explained by the phrases shown on the right.
We also believe that TDA can be an effective tool for circuit tracing in LLMs. Circuit tracing is currently a largely manual procedure: one selects individual features and inspects the individual features in subsequent layers that they connect to. Connections between groups of features are something one would very much like to analyze, and we will return to that in a future post. One simple way such group-level connections could be quantified is sketched below.
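The sketch below assumes feature-to-feature connection strengths are approximated by decoder-to-encoder "virtual weights"; this is an illustrative assumption for the sake of the example, not the circuit-tracing procedure described above, and all matrices and groups are hypothetical.

```python
# Illustrative sketch: aggregate feature-level connection strengths into group-level scores.
import numpy as np

rng = np.random.default_rng(0)
d_model = 512
W_dec_up = rng.standard_normal((2048, d_model))    # upstream features' decoder directions (hypothetical)
W_enc_down = rng.standard_normal((d_model, 2048))  # downstream layer's encoder weights (hypothetical)

# Feature-to-feature strengths: how much writing an upstream feature's direction
# into the residual stream would excite each downstream feature.
virtual_weights = W_dec_up @ W_enc_down            # shape: (n_up_features, n_down_features)

def group_connection(up_group, down_group):
    """Aggregate feature-level strengths into a single group-to-group score."""
    block = virtual_weights[np.ix_(up_group, down_group)]
    return block.mean()

up_group = [3, 17, 254]        # hypothetical groups, e.g. nodes selected in the TDA graph
down_group = [9, 1021, 1800]
print(group_connection(up_group, down_group))
```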