I'm not actually sure what the difference is between this tag and Mesa-Optimizers.
I'm guessing the distinction was intended to be:
Mesa-Optimizers: Under what conditions do mesa-optimizers arise, and how can we detect or prevent them (if we want to, and if that's possible)?
Inner Alignment: How do you cause mesa-optimizers to have the same goal as the base optimizer? (Or maybe, more generally, how do you cause mesa-optimizers to have good, desired properties?)
Or is 'Inner Alignment' meant to be a subcategory of 'Mesa-Optimizers'?