Vector-Valued Reinforcement Learning — LessWrong