Some motivations to gradient hack — LessWrong