Iterative Finetuning is Mostly Idempotent
This is a summary of a paper that we and our collaborators at the University of Chicago recently posted to arXiv. tl;dr: We seed models with some property (e.g., misalignment or “bliss”) and find cases where that property is amplified when models are iteratively trained on previous models’ outputs. However, this phenomenon is...