X-risks are tragedies of the commons
My use of the phrase "Super-Human Feedback"
Thoughts on Ben Garfinkel's "How sure are we about this AI stuff?"
The role of epistemic vs. aleatory uncertainty in quantifying AI-Xrisk
Imitation learning considered unsafe?
Conceptual Analysis for AI Alignment
Disambiguating "alignment" and related notions
Problems with learning values from observation
Risks from Approximate Value Learning