Transformer Circuits
• Applied to SAEs (usually) Transfer Between Base and Chat Models by Connor Kissane 8d ago
• Applied to Arrakis - A toolkit to conduct, track and visualize mechanistic interpretability experiments. by Yash Srivastava 10d ago
• Applied to An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 by Neel Nanda 19d ago
• Applied to Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability by ntt123 1mo ago
• Applied to "What the hell is a representation, anyway?" | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents by IwanWilliams 2mo ago
• Applied to Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian 2mo ago
• Applied to Can quantised autoencoders find and interpret circuits in language models? by charlieoneill 4mo ago
• Applied to Sparse Autoencoders Work on Attention Layer Outputs by robertzk 6mo ago
• Applied to Finding Sparse Linear Connections between Features in LLMs by Logan Riggs 8mo ago
• Applied to AISC project: TinyEvals by Jett 8mo ago
• Applied to Polysemantic Attention Head in a 4-Layer Transformer by Jett 9mo ago
• Applied to Graphical tensor notation for interpretability by Jordan Taylor 10mo ago
• Applied to Interpreting OpenAI's Whisper by Neel Nanda 10mo ago
• Applied to Automatically finding feature vectors in the OV circuits of Transformers without using probing by Jacob Dunefsky 10mo ago
• Applied to An adversarial example for Direct Logit Attribution: memory management in gelu-4l by Can 11mo ago
• Applied to Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy by Neel Nanda 11mo ago
• Applied to An Interpretability Illusion for Activation Patching of Arbitrary Subspaces by Alex Makelov 11mo ago
• Applied to Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla by Neel Nanda 1y ago