Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases! — LessWrong