Reinforcement learning with imperceptible rewards — LessWrong