Optimization, loss set at variance in RL — LessWrong