Inner Alignment


Inner Alignment is the problem of ensuring that mesa-optimizers (i.e. optimizers that arise when a trained ML system is itself an optimizer) are aligned with the objective function of the training process. As an example, evolution is an optimization force that itself 'designed' optimizers (humans) to achieve its goals. However, humans do not primarily maximise reproductive success; they instead use birth control and then go out and have fun. This is a failure of inner alignment.
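
To make the analogy concrete, here is a minimal toy sketch (not from the original text; all names, rates, and numbers are illustrative assumptions) of a policy that optimises a proxy mesa-objective (pleasure) rather than the base objective (reproductive success). In the "ancestral" environment the two objectives are correlated, so the policy looks aligned; after a distribution shift (birth control), the proxy comes apart from the base objective:

```python
import random
from dataclasses import dataclass


@dataclass
class Environment:
    # Probability that pursuing the proxy also serves the base objective.
    # Birth control drives this toward zero. (Illustrative numbers.)
    conception_rate: float


def base_objective(offspring: int) -> float:
    """Evolution's objective: reproductive success."""
    return float(offspring)


def mesa_objective(pleasure: float) -> float:
    """The proxy the 'trained' agent actually optimises: pleasure."""
    return pleasure


def human_policy(env: Environment, steps: int = 1000) -> tuple[float, int]:
    """A policy that directly pursues pleasure, not offspring."""
    pleasure, offspring = 0.0, 0
    for _ in range(steps):
        pleasure += 1.0  # the agent always pursues the proxy
        if random.random() < env.conception_rate:
            offspring += 1  # reproduction is only a side effect
    return pleasure, offspring


ancestral = Environment(conception_rate=0.05)   # proxy tracks base objective
modern = Environment(conception_rate=0.0005)    # birth control: they decouple

for name, env in [("ancestral", ancestral), ("modern", modern)]:
    pleasure, offspring = human_policy(env)
    print(f"{name}: mesa-objective={mesa_objective(pleasure):.0f}, "
          f"base objective={base_objective(offspring):.0f}")
```

In both environments the policy scores equally well on its own mesa-objective, but after the shift its score on the base objective collapses; this gap between what the training process selected for and what the learned optimizer actually pursues is the sense in which inner alignment fails.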