Values Weren't Complex, Once.
Oversight of Unsafe Systems via Dynamic Safety Envelopes
Collaboration-by-Design versus Emergent Collaboration
Multi-Agent Overoptimization, and Embedded Agent World ModelsΩ
Policy Beats Morality
(Some?) Possible Multi-Agent Goodhart Interactions
Lotuses and Loot Boxes
Non-Adversarial Goodhart and AI Risks
[Link]Evidence as Rhetoric — Normative or Positive?
[Link]A Short Explanation of Blame and Causation