Code: github.com/ElleNajt/controllability tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instructions than controlling their response (the non-thinking, user-facing output). Their CoT controllability conditions require the models’ thinking to follow various style constraints (e.g. write in lowercase, avoid a...
Code: https://github.com/ElleNajt/UnmonitoredExternalAgents Overview AI Control at AI companies usually focuses on monitoring internal AI deployments because this is where a lot of sensitive resources are (model weights, training datasets, applied safety research, monitoring systems, etc.), and for AIs that are not very superhuman, access to these resources makes causing a...
Code: github.com/ElleNajt/Steganography_Wiretapping | Data: huggingface.co/datasets/lnajt/steganography-wiretapping Play the decoding game: can you eavesdrop on Claude Opus 4.6? tldr of post: Frontier models (Opus and Gemini Pro) can agree on Schelling steganography schemes with a significant advantage against weaker eavesdroppers, given just the knowledge of the alphabet and the game that the encoder...
Notes before reading: * This is an art project. Just jump around and listen to music, read the words if you’re inclined to, and play with vibe-duet on GitHub if you’re inspired to. * Click the links to be taken to Strudel, where you can play the pieces. Alternatively, the...