Reinforcement Learning: A Non-Standard Introduction (Part 2) — LessWrong