Cross Layer Transcoders for the Qwen3 LLM Family
Digging Into Interpretable Features

Sparse autoencoders (SAEs) and cross-layer transcoders (CLTs) have recently been used to decode the activation vectors in large language models into more interpretable features. Analyses have been performed by Goodfire, Anthropic, DeepMind, and OpenAI. BluelightAI has constructed CLT features for the Qwen3 family, specifically Qwen3-0.6B-Base and Qwen3-1.7B-Base, which are made available for exploration and discovery here. In addition to constructing the features themselves, we enable the use of topological data analysis (TDA) methods for improved interaction with, and analysis of, the constructed features. For readers unfamiliar with the transcoder setup, a minimal code sketch appears at the end of this post.

Anecdotally, we have found it easier to identify clear and conceptually abstract features among the CLT features we construct than in the other analyses we have observed. Here are a couple of examples from Qwen3-1.7B-Base:

Layer 20, feature 847: Meta-level judgment of conceptual or interpretive phrases, often with strong evaluative language. It fires on text that evaluates how something is classified, framed, or interpreted, especially when the text asserts that a commonly used label or interpretation is wrong.

* You might be tempted to paraphrase Churchill and say it was the end of the beginning, but it wasn’t that either.
* This is peculiar objection to imprisonment – rather like complaining that your TV is not working because it does not defrost chickens
* Well, yeah, that’s like saying that you owe money on your mortgage because you borrowed it. The real question is “why do we have to keep running such large deficits?”

Layer 20, feature 179: Fires on phrases about criteria or conditions that must be fulfilled; the feature is multilingual.

* Also, strong skin pigmentation or tattoo at the measurement location was regarded as exclusion criterion as it might interfere with the green light-based PPG.
* Protect doctrine should conditions be favorable and calling for unilateral limited military efforts to establish safe-zones in Feb
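As promised above, here is a minimal PyTorch sketch of the cross-layer transcoder idea: the residual-stream activation at one layer is encoded into a sparse, overcomplete feature vector, and per-layer linear decoders map those features to predicted MLP outputs at the current and later layers. This is an assumption-laden illustration, not BluelightAI's implementation; the dimensions are made up, the plain ReLU encoder is one common choice among several, and training details (the sparsity penalty, the reconstruction targets) are omitted.

```python
import torch
import torch.nn as nn


class CrossLayerTranscoder(nn.Module):
    """Illustrative cross-layer transcoder (a sketch, not BluelightAI's code).

    The encoder reads the residual-stream activation at one layer and
    produces a sparse, overcomplete feature vector; a separate decoder
    per target layer maps those features to predicted MLP outputs.
    """

    def __init__(self, d_model: int, n_features: int, n_target_layers: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        # One linear decoder per target layer (the current layer and later ones).
        self.decoders = nn.ModuleList(
            [nn.Linear(n_features, d_model, bias=False) for _ in range(n_target_layers)]
        )

    def forward(self, resid: torch.Tensor):
        # ReLU keeps features non-negative; sparsity is typically also
        # encouraged during training with an L1-style penalty (omitted here).
        features = torch.relu(self.encoder(resid))
        predicted_mlp_outputs = [dec(features) for dec in self.decoders]
        return features, predicted_mlp_outputs


# Toy usage with made-up dimensions (not Qwen3's actual sizes).
clt = CrossLayerTranscoder(d_model=1024, n_features=8192, n_target_layers=4)
resid = torch.randn(1, 1024)
features, preds = clt(resid)
print(features.shape, len(preds))  # torch.Size([1, 8192]) 4
```

In this framing, an interpretable unit such as "Layer 20, feature 847" corresponds to a single coordinate of the `features` vector, and "fires" means that coordinate takes a large value on the quoted text.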