Aspiration-based Q-Learning — LessWrong