Catastrophe Mitigation Using DRL (Appendices) — LessWrong