Policy Entropy, Learning, and Alignment (Or Maybe Your LLM Needs Therapy) — LessWrong