Last December, the Institute for Protein Design dropped RFDiffusion3, a protein design model that operates at the level of individual atoms. Before the AIs figure out how to use it to craft mirror-life bacteria and kill everyone, I wanted to understand its architecture and do a mini exploration on...
TLDR: Sparse Autoencoders (SAEs) trained on protein folding and design models find features correlated with virulent proteins, while logistic regression probes trained on both SAE-encoded and raw model activations approach the performance of SOTA classifiers at distinguishing virulent from benign proteins. Abstract: Protein design and folding models are powerful tools that could be...
And so are you! When you were a fetus, you were sending millions of your cells through the placenta into your mom. And she was sending her cells into you, although to a lesser degree. These cells made themselves right at home, differentiating into heart, blood, and even brain cells....
TLDR: SAEs can complement and enhance LLM-as-a-judge scalable oversight for uncovering hypotheses over large datasets of LLM outputs. (paper) Abstract: Large language models (LLMs) are increasingly trained in long-horizon, multi-agent environments, making it difficult to understand how behavior changes over training. We apply pretrained SAEs, alongside...
Do AIs feel anything? It's hard to tell, but interpretability can give us some clues. Using Anthropic's persona vectors codebase, we extracted 7 vectors from Qwen3-14B representing joy, love, sadness, surprise, disgust, fear, and anger. During inference, we remove the directions correlated across emotions, project the activations from...
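The excerpt does not spell out how the correlated directions are removed, so the following is only a sketch of one plausible reading, not the post's actual method: for each emotion vector, subtract its component lying in the span of the other emotion vectors (via Gram-Schmidt), renormalize, and then score an activation by its dot product with each residual direction. All function names and the toy 3-dimensional vectors are hypothetical; real Qwen3-14B activations would be much higher-dimensional.

```python
from typing import Dict, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub_scaled(a: Vec, b: Vec, s: float) -> Vec:
    # Returns a - s * b, elementwise.
    return [x - s * y for x, y in zip(a, b)]


def orthonormal_basis(vectors: List[Vec], eps: float = 1e-10) -> List[Vec]:
    """Gram-Schmidt: an orthonormal basis for the span of `vectors`."""
    basis: List[Vec] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            w = sub_scaled(w, q, dot(w, q))
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def decorrelate(emotions: Dict[str, Vec], eps: float = 1e-10) -> Dict[str, Vec]:
    """Hypothetical step: strip from each emotion vector its projection onto
    the span of the *other* emotion vectors, then renormalize. Each result is
    orthogonal to every other raw emotion vector (the results themselves need
    not be mutually orthogonal)."""
    out: Dict[str, Vec] = {}
    for name, v in emotions.items():
        others = [u for k, u in emotions.items() if k != name]
        w = list(v)
        for q in orthonormal_basis(others):
            w = sub_scaled(w, q, dot(w, q))
        norm = dot(w, w) ** 0.5
        out[name] = [x / norm for x in w] if norm > eps else w
    return out


def emotion_scores(activation: Vec, directions: Dict[str, Vec]) -> Dict[str, float]:
    """Project an activation onto each decorrelated emotion direction."""
    return {name: dot(activation, d) for name, d in directions.items()}


if __name__ == "__main__":
    # Toy vectors that share a component along the second axis.
    toy = {"joy": [1.0, 1.0, 0.0], "sadness": [0.0, 1.0, 1.0]}
    directions = decorrelate(toy)
    print(emotion_scores([0.2, 0.9, 0.1], directions))
```

Projecting out the shared span means a high "joy" score can no longer be driven purely by the component that joy and sadness have in common, which is presumably the motivation for decorrelating before reading off scores.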
I was using Neuronpedia's steering feature and was curious: how much does it cost to run? How does one handle all the networking and expose the endpoints to the internet with a fancy domain? The plan: 1. Make a project with a small open-weight model 2. Choose a GPU...
LLMs are so boring, corporate, and sane these days. What if we could control the emotions of LLMs to make them more interesting? The plan is: 1. Use Anthropic's persona vectors codebase to generate steering vectors for different emotions 2. Use EasySteer to serve a chat endpoint that exposes activations gathering...
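As a rough illustration of what "steering vector" means here: the persona vectors approach derives a trait direction as the difference between mean activations on responses that express the trait and responses that don't, and steering adds a scaled copy of that direction to a layer's hidden state during generation. The sketch below shows only that arithmetic on toy lists; it does not call the persona vectors codebase or EasySteer, and the function names and the `alpha` value are hypothetical.

```python
from typing import List, Sequence

Vec = List[float]


def mean(vectors: Sequence[Vec]) -> Vec:
    """Elementwise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos_acts: Sequence[Vec], neg_acts: Sequence[Vec]) -> Vec:
    """Difference of means: trait-expressing minus trait-suppressing activations."""
    mp, mn = mean(pos_acts), mean(neg_acts)
    return [p - q for p, q in zip(mp, mn)]


def steer(hidden: Vec, vector: Vec, alpha: float) -> Vec:
    """Add alpha * vector to one hidden state (e.g. a residual-stream position)."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    # Toy 2-D "activations" for joyful vs. neutral responses (made up).
    joyful = [[2.0, 0.0], [4.0, 2.0]]
    neutral = [[0.0, 0.0], [2.0, 0.0]]
    joy_vec = steering_vector(joyful, neutral)
    print(steer([0.5, 0.5], joy_vec, alpha=2.0))
```

A larger `alpha` pushes generations further toward the trait; in practice the scale is usually tuned by hand, since too large a value tends to degrade coherence.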