[CS 2881r] Some Generalizations of Emergent Misalignment
This work was done as an experiment for Boaz Barak’s “CS 2881r: AI Safety and Alignment” at Harvard. The lecture where this work was presented can be viewed on YouTube here, and its corresponding blog post can be found here. TL;DR: Building on Turner et al.'s (2025) work on model organisms...
Sep 14, 2025