How is reinforcement learning possible in non-sentient agents? — LessWrong