Thoughts on reward engineering — LessWrong