Inner Alignment

Discuss the wiki-tag on this page. Here is the place to ask questions and propose changes.
New Comment
2 comments, sorted by

I'm not actually sure about the difference here between this tag and Mesaoptimizers

I'm guessing the distinction was intended to be:

  • Mesa-Optimizers: Under what condition do mesa-optimizers arise, and how can we detect or prevent them (if we want to, and if that's possible)?
  • Inner Alignment: How do you cause mesa-optimizers to have the same goal as the base optimizer? (Or maybe, more generally, how do you cause mesa-optimizers to have good desired properties?)

Or 'Inner Alignment' is meant to be a subcategory of 'Mesa-Optimizers'?