This is a linkpost to our working paper “Towards AI Standards Addressing AI Catastrophic Risks: Actionable-Guidance and Roadmap Recommendations for the NIST AI Risk Management Framework”, which we co-authored with our UC Berkeley colleagues Jessica Newman and Brandie Nonnecke. Here are links to Google Doc and pdf options for accessing our working paper:
- Google Doc (56 pp, last updated 16 May 2022)
- pdf on Google Drive (56 pp, last updated 16 May 2022)
- pdf on arXiv (not available yet, planned for a later version)
We seek feedback from readers considering catastrophic risks as part of their work on AI safety and governance. It would be very helpful if you email feedback to Tony Barrett, or share a marked-up copy of the Google Doc with Tony, at firstname.lastname@example.org.
If you are providing feedback on the draft guidance in this document, it would be particularly helpful if, in addition to any comments via email or Google Docs, you answer the questions in Appendix 2 of this document or in the following Google Form:
Feedback by May 31, 2022 would be most helpful, though we will also appreciate feedback after that!
We may update the links or content in this post to reflect the latest version of the document.
Background on the NIST AI RMF
The National Institute of Standards and Technology (NIST) is currently developing the NIST Artificial Intelligence Risk Management Framework, or AI RMF. NIST intends the AI RMF as voluntary guidance on AI risk assessment and other AI risk management processes for AI developers, users, deployers, and evaluators. NIST plans to release Version 1.0 of the AI RMF in early 2023.
Because the AI RMF is voluntary guidance, NIST would not impose “hard law” mandatory requirements for AI developers or deployers to use it. However, AI RMF guidance would form part of “soft law” norms and best practices, which AI developers and deployers would have incentives to follow as appropriate. For example, insurers or courts may expect AI developers and deployers to show reasonable usage of relevant NIST AI RMF guidance as part of due care when developing or deploying AI systems in high-stakes contexts, in much the same way that NIST Cybersecurity Framework guidance can be used as part of demonstrating due care for cybersecurity. In addition, elements of soft-law guidance are sometimes adapted into hard-law regulations, e.g., by mandating that particular industry sectors comply with specific standards.
Summary of our Working Paper
In this document, we provide draft elements of actionable guidance, focused primarily on identifying and managing risks of events with very high or catastrophic consequences, that are intended to be easy for NIST to incorporate into the AI RMF. We also describe the methodology we used to develop our recommendations.
We provide actionable-guidance recommendations for AI RMF 1.0 on:
- Identifying risks from unintended uses and misuses of AI systems
- Including potential catastrophic-risk factors within the scope and time frame of risk assessments and impact assessments
- Identifying and mitigating human rights risks
- Reporting information on AI risk factors including catastrophic-risk factors
We also provide recommendations on additional issues for NIST to address as part of the roadmap for later versions of the AI RMF or supplementary publications, because they are critical topics for which appropriate guidance would take additional time to develop. Our recommendations for the AI RMF roadmap include:
- Creating an AI RMF Profile providing supplementary guidance for cutting-edge increasingly general-purpose AI. For development of such AI, examples of actionable guidance could include: only increase compute for AI model training incrementally, and use red-teaming or other testing methods to identify emergent properties of AI models after each incremental increase in training compute.
- A comprehensive set of governance mechanisms or controls to help organizations mitigate identified risks
- Methods for characterization and measurement of the following AI system characteristics:
- Objectives specification (i.e., alignment of system behavior with designer goals)
- Generality (i.e., breadth of AI applicability/adaptability)
- Recursive improvement potential
- Other measurement/assessment tools for technical specialists testing key aspects of AI safety, reliability, robustness, etc.
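The incremental compute-scaling guidance in the first roadmap bullet above can be sketched as a simple gated loop: train with a modest compute budget, run red-teaming or other evaluations, and only proceed to the next increment if no concerning emergent properties are found. This is a minimal illustrative sketch, not the paper's procedure; every function and threshold here (`train_model`, `passes_red_team_evals`, the budget numbers) is a hypothetical placeholder.

```python
def train_model(compute_budget: float) -> dict:
    """Placeholder: train (or continue training) a model with the given compute budget."""
    return {"compute": compute_budget}

def passes_red_team_evals(model: dict) -> bool:
    """Placeholder: run red-teaming / capability evaluations on the model.
    Returns True if no concerning emergent properties are identified.
    The threshold below is purely illustrative."""
    return model["compute"] < 8.0

def incremental_scaling(start: float = 1.0, factor: float = 2.0, max_steps: int = 10):
    """Increase training compute only incrementally, gating each increase
    on evaluation results, per the guidance sketched above."""
    compute = start
    approved_model = None
    for _ in range(max_steps):
        candidate = train_model(compute)
        if not passes_red_team_evals(candidate):
            # Concerning emergent behavior found: pause scaling and
            # escalate for review rather than increasing compute further.
            return approved_model, compute
        approved_model = candidate
        compute *= factor
    return approved_model, compute

final_model, paused_budget = incremental_scaling()
```

The point of the structure is that each compute increase is conditional on the previous increment passing evaluation, so scaling halts at the first budget where red-teaming surfaces a concern.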
Key Sections of our Working Paper
Readers considering catastrophic risks as part of their work on AI safety and governance may be most interested in the following sections:
- The “Map” guidance section on Scope and Time Frame of Risk Assessments and Impact Assessments (2 pp), regarding risk identification
- The “Measure” guidance section on Scope and Time Frame of Risk Assessments and Impact Assessments (3 pp), especially the discussion of “Factors that could lead to severe or catastrophic consequences for society”
- Section 4.1 An AI RMF Profile for Cutting-Edge, Increasingly Multi-Purpose or General-Purpose AI (6 pp) for examples of supplementary guidance for developers of large-scale machine learning models and proto-AGI systems, especially Section 4.1.2 on factors to consider in risk analysis, and Section 4.1.3 on risk mitigation steps to consider in risk management
As mentioned above, feedback to Tony Barrett (email@example.com) by May 31, 2022 would be most helpful (and we will also appreciate feedback after that). We will consider feedback as we prepare revised versions, which will inform our recommendations to NIST on how best to address catastrophic risks and related issues in the NIST AI RMF, as well as our follow-on work for standards-development and AI governance forums.