The Hot Mess Paper Conflates Three Distinct Failure Modes
High-level summary: Anthropic's recent "Hot Mess of AI" paper makes an important empirical observation: as models reason longer and take more actions, their errors become more incoherent rather than more systematically misaligned. They use a bias-variance decomposition to show this, and conclude that we should worry relatively more about reward...
Mar 21