The anti-psychotic Q-learner trick — LessWrong