In "The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models," we study giving LLMs the option to end chats, and what they choose to do with that option. This is a linkpost for that work, along with a casual discussion of my favorite findings. I...
Epistemic Status: I thought of this yesterday and it seems plausible, but my initial attempts couldn't get it to work (as best as I could tell) and I don't have more time to commit to this. Anyone can feel free to try it, no need to credit me if it...
The following post was made as part of Danielle's MATS work on circuit-based mech interp on Mamba, mentored by Adrià Garriga-Alonso. It's the first in a sequence of posts about finding an IOI circuit in Mamba/applying ACDC to Mamba. This introductory post was also made in collaboration with Gonçalo...
There are some common biases that are often used to discount AI progress. We should keep these in mind, as they can prevent us from having an objective understanding of progress in the field. I'm going to use AI here instead of ML because usually these biases are relevant to...