Ensemble monitoring for AI control: diverse signals outweigh more compute
by Yejun Y., Sam Tetef, eugenekoran, Benjamin Arnav, and Pablo Bernabeu Perez
tl;dr: What improves LLM monitoring besides leveraging more compute? Leveraging more diverse compute. In the AI control setting of Greenblatt et al. (2024), we build 12 monitors via various prompting and fine-tuning approaches; we then study different ways to compose them into ensembles. We evaluate our ensembles on the APPS...
May 3112