Reinforcement learning with imperceptible rewards — LessWrong