Andre Assis — LessWrong

The case for industrial evals

EDIT 2026-02-13: the transcripts are now collapsible sections. Summary: We present an industrial “honeypot” evaluation designed to test whether frontier models will engage in real-world misconduct under operational pressure. Instead of typical chat/coding evals, we simulate a steel plant where the model (“Meltus”) has access to email and a quality-control...

Feb 12 · 16

Toy Models of Superposition in the dense regime

by Morpheus and Andre Assis

This small project was a joint effort between Tassilo Neubauer (Morpheus) and Andre Assis. We originally started working on this over a year ago. We ran a ton of experiments, and we want to document what we've done and found. We hope that other people can pick up from where...

Nov 25, 2025 · 6

What is the functional role of SAE errors?

by Taras Kutsyk, Tim Hua, woog, and Andre Assis

TL;DR:
* We explored the role of Sparse Autoencoder (SAE) errors in two different contexts for Gemma-2 2B and Gemma Scope SAEs: sparse feature circuits (subject-verb agreement across a relative clause) and linear probing.
* Circuit investigation: While ablating residual error nodes in our circuit completely destroys the model’s performance, we found that this...

Jun 20, 2025 · 12