phenomanon

From Thermodynamics to Sora: A Comprehensive Introduction to Denoising Diffusion for Video Generation

3mo

Video diffusion models have recently seen a sharp uptick in interest, both academic and popular. Despite this growing impact, they have received far less attention from an interpretability perspective than LLMs and other popular architectures. To boost awareness and lay a framework for model understanding, we offer a conceptual explanation of video diffusion models, along with a mathematical framework in parlance similar to that used for LLMs, with an emphasis on autoregressive approaches. This blog post assumes basic familiarity with neural networks and machine learning.

Background and Intro

The goal of this work is to provide background on video diffusion models and intuitively discuss what we might...

Quantifying SAE Quality with Feature Steerability Metrics

phenomanon

10mo

Introducing the Steerability Metric

Steering CLIP's Vision Transformer with Sparse Autoencoders

The Steerability metric came about during the writing of Steering CLIP's Vision Transformer with Sparse Autoencoders, the general focus of which is leveraging CLIP's preexisting image-text capabilities to provide automated interpretability for vision models. Throughout training, as is the case for many who attempt to use SAEs, we lacked a clear way to judge the quality of the SAEs we trained. L0, explained variance, and MSE loss are useful from a convergence perspective, but they provide little insight into how well an SAE realizes its ideal utility, which is of course the degree to which it separates the...

Hypothesis on Composition Circuits in Vision Transformers

phenomanon

Idea

I just wanted to make a quick post to get this idea out there. I hope to do a more thorough explanation later, and experiments after that.

Basically, I hypothesize that Vision Transformers have mechanisms that I call composition circuits, which operate like induction circuits in text transformers. They begin with a mechanism akin to the previous token head in text transformers, which I call a modular attention head. It works like the previous token head, except that it attends to patch embeddings (a.k.a. "tokens") in multiple directions around the token of interest. We know heads like this exist in early ViT layers. We don't necessarily know they perform the...

Replying to Efficient Dictionary Learning with Switch Sparse Autoencoders

phenomanon

2y

Thank you very much for your reply. I appreciate the commentary and direction.

Replying to Efficient Dictionary Learning with Switch Sparse Autoencoders

phenomanon

2y

Hi Lee, if I may ask, when you say "geometric analysis" of the router, do you mean analysis of the parameters or of the activations? Are there any papers that perform the sort of analysis you'd like to see done? I'm asking from the perspective of someone who understands neural networks thoroughly but is new to mech interp.

Replying to Efficient Dictionary Learning with Switch Sparse Autoencoders

phenomanon

2y

Thank you for the answer, that makes more sense.

Replying to Efficient Dictionary Learning with Switch Sparse Autoencoders

phenomanon

2y

For a batch with $T$ activations, we first compute vectors $f \in \mathbb{R}^{N}$ and $P \in \mathbb{R}^{N}$. $f$ represents what proportion of activations are sent to each expert.

Hi, I'm not exactly sure where f fits in here. In Figure 1/section 2.2, it seems like x is fed into the router layer, which produces a distribution over the N experts, from which the "best expert" is chosen. I'm not sure where the "proportion of activations" is in that process. To me that sounds like it's describing something that would be multiplied by x before it's fed into an expert, but I don't see that reflected in the diagram or described in section 2.2.
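One possible reading, sketched here under the assumption that $f$ follows the load-balancing convention from the Switch Transformer (the quoted passage does not confirm this): $f$ is a per-batch statistic computed after routing, counting what fraction of the $T$ activations chose each of the $N$ experts, and it is never multiplied into x. The definition of $P$ as the mean router probability per expert is likewise an assumption, and `softmax`, `routing_stats`, and the example logits are hypothetical names and values used only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_stats(router_logits):
    """Compute batch-level routing statistics (assumed definitions).

    router_logits: T rows, each holding N router scores for one activation.
    Returns (f, P), where
      f[i] = fraction of the T activations whose top-1 expert is i
      P[i] = mean router probability assigned to expert i over the batch
    """
    T = len(router_logits)
    N = len(router_logits[0])
    f = [0.0] * N
    P = [0.0] * N
    for row in router_logits:
        probs = softmax(row)
        best = max(range(N), key=lambda i: probs[i])  # top-1 expert for this activation
        f[best] += 1.0 / T
        for i in range(N):
            P[i] += probs[i] / T
    return f, P


# Hypothetical batch: T = 4 activations routed among N = 3 experts.
logits = [
    [2.0, 0.0, 0.0],
    [1.5, 0.5, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 1.0],
]
f, P = routing_stats(logits)
print("f =", f)
print("P =", P)
```

Under this reading, the router's per-input distribution in Figure 1 is what selects the expert for each x, while $f$ only summarizes those selections across the batch, which would explain why it does not appear in the forward-pass diagram.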

LESSWRONG