[CS2881r] Optimizing Prompts with Reinforcement Learning — LessWrong