|jimv||v1.1.0 Nov 1st 2020||(+10/-8) Corrected typo in "designed" and added scare quotes around it.|
|Ben Pace||v1.0.0 Jul 17th 2020||(+826)|
Inner Alignment is the problem of ensuring that mesa-optimizers (i.e. trained ML systems that are themselves optimizers) are aligned with the objective function of the training process. As an example, evolution is an optimization force that itself "designed" optimizers (humans) to achieve its goals. However, humans do not primarily maximise reproductive success; they instead use birth control and then go out and have fun. This is a failure of inner alignment.
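The evolution example can be made concrete with a toy sketch. It assumes a hypothetical inner optimizer that picks whichever action scores highest on its own objective ("fun"), while the outer process scores actions by offspring. The action names and numbers are invented purely for illustration and are not drawn from any real model or dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    fun: float        # what the inner optimizer values (assumed mesa-objective)
    offspring: float  # what the outer process selects for (assumed base objective)


def mesa_objective(action: Action) -> float:
    """Objective the learned optimizer actually pursues."""
    return action.fun


def base_objective(action: Action) -> float:
    """Objective the outer optimization process was selecting for."""
    return action.offspring


def choose(actions):
    """The inner optimizer maximises its own objective, not the base one."""
    return max(actions, key=mesa_objective)


# Illustrative "training" environment: fun and offspring point the same way.
ANCESTRAL = [
    Action("forage alone", fun=1.0, offspring=0.0),
    Action("mate", fun=3.0, offspring=1.0),
]

# Illustrative "deployment" environment: a new action decouples the two.
MODERN = ANCESTRAL + [
    Action("mate with birth control", fun=3.5, offspring=0.0),
]

ancestral_choice = choose(ANCESTRAL)
modern_choice = choose(MODERN)

print(ancestral_choice.name, base_objective(ancestral_choice))
print(modern_choice.name, base_objective(modern_choice))
```

In the first environment the inner optimizer's choice also happens to be the best one under the base objective, so the mismatch is invisible. Once the available actions change, the same decision rule scores zero on the base objective, which is the kind of failure the paragraph above describes.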