Avi Brach-Neufeld's Shortform
Aug 21, 20252
Introduction: Some AI alignment researchers including Neel Nanda, the mechanistic interpretability team lead for Google DeepMind, have proposed[1] a process I will call "parallel interrogation” as a potential method in testing model alignment. Parallel interrogation is the process of asking questions to different isolated instances of the same model to...