(Non-)Interruptibility of Sarsa(λ) and Q-Learning — LessWrong