LESSWRONG
MATS Program
• Applied to Language Models Model Us by eggsyntax 3d ago
• Applied to MATS Winter 2023-24 Retrospective by Rocket 10d ago
• Applied to Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers by Ryan Kidd 16d ago
• Applied to Mechanistically Eliciting Latent Behaviors in Language Models by TurnTrout 20d ago
• Applied to Transcoders enable fine-grained interpretable circuit analysis for language models by Jacob Dunefsky 20d ago
• Applied to End-to-end hacking with language models by tchauvin 1mo ago
• Applied to Ophiology (or, how the Mamba architecture works) by Danielle Ensign 2mo ago
• Applied to My MATS Summer 2023 experience by James Chua 2mo ago
• Applied to Understanding SAE Features with the Logit Lens by Joseph Bloom 2mo ago
• Applied to MATS AI Safety Strategy Curriculum by Ryan Kidd 2mo ago
• Applied to We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To by robertzk 3mo ago
• Applied to Implementing activation steering by Annah 3mo ago
• Applied to Attention SAEs Scale to GPT-2 Small by robertzk 4mo ago
• Applied to How important is AI hacking as LLMs advance? by Artyom Karpov 4mo ago
• Applied to Uncertainty in all its flavours by Cleo Nardo 4mo ago
• Applied to Sparse Autoencoders Work on Attention Layer Outputs by robertzk 4mo ago
• Applied to Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization by Jacob Dunefsky 5mo ago
• Applied to Steering Llama-2 with contrastive activation additions by TurnTrout 5mo ago
• Applied to Interview: Applications w/ Alice Rigg by jacobhaimes 5mo ago