Learning biases and rewards simultaneously — LessWrong