Inoculation Prompting: Open Questions and My Research Priorities
Overview

I'm currently an ERA fellow researching ways to improve inoculation prompting as a technique against emergent misalignment (EM). It's one of the few alignment techniques that works against EM, and I think it has the potential to be generalised further. There are lots of experiments I won't have time to run, and...
Feb 158