Vector-Valued Reinforcement Learning — LessWrong