[day 5/7 - epistemic status: longtermism apology form, having a moment] Voyager 1 was launched on September 5, 1977. Its mission was to study the very edges of the solar system, and then go gentle into that good night. As it was drifting away into the vast unknown, Sagan begged...
[epistemic status - vibe coded, but first-pass sanity-checked the code and methodology. Messy project, take results with a grain of salt. See limitations/footnotes] Done as a mini-project for Neel Nanda's MATS exploration stream. GitHub repo - Gemma 2-9B-it model organism - Gemma 3-27B-it model organism. Abstract: Neural chameleons showed that models...
Crosspost from Substack. So, if you haven’t heard the news, America deposed Maduro, and a fresh Polymarket account bet on it happening and made ~$320k in profit. Seems suspicious, right? And yeah! The media, expectedly, went hog wild. And look, I agree, I smell the same stench all of you do,...
This doubles as my Neel stream MATS application; I figured I would crosspost it to LW because the results are interesting. EDIT: Got accepted! :) Executive summary. What problem am I trying to solve?/TLDR: Activation oracles (iterating on LatentQA) are an interpretability technique, capable of generating natural language explanations about model...
Crosspost from my Substack. [Epistemic status - speculative, but sort of grounded, might be wrong - don’t take too seriously] Alternative title: why Gemini 3 Pro gets to be so big. Bit more technical than usual, but I want to try out writing technical articles. Thank you to @bycloud on Twitter...
Cross-post from my Substack (read "I am the Open Source Woman" and "Writers block" for full context for this one) (maybe not quite LessWrong quality, but I thought it was an interesting enough experiment to post regardless). Introduction: I wrote down every single thought I had for 2 days...
You’ve been there before. You start talking to a friend, and before you even consciously realize it, you have steered the conversation towards the vegan question. You’ve asked them to name the trait: “what is true of animals that is not true of humans that makes it okay...