Mesa-Optimizers and Over-optimization Failure (Optimizing and Goodhart Effects, Clarifying Thoughts - Part 4) — LessWrong