Update: we are not taking more applications for now.

I’m looking for concise, informative case studies on social-welfare-based standards1 for companies and products (including standards imposed by regulation).

I think case studies could help a lot with making AI safety standards work.

This post outlines:

  • Some quick background on the state of (and hopes for) AI safety standards.
  • Why I think case studies can be useful.
  • Some specific standards that seem especially relevant, and guidance on the case studies I’m seeking.
  • How to apply for funding to do case studies. I’m running the request for proposals, with funding from Open Philanthropy.

Some quick background on AI safety standards

The basic idea of AI safety standards would be:

  • Someone publishes an AI safety standard: a set of guidelines that, if followed by AI companies, could reduce risks of harm from AI. “Someone” could be a government agency (like NIST), an independent nonprofit, an industry association, etc. The standard might have conditions along the lines of: “Test to see whether your AI system can do dangerous thing __, using method __. If so, don’t deploy it until safety measures __ and __ (for example, alignment techniques or security measures) have been taken.” (Alternatively, it might only have conditions around assessment and disclosure of risks.)
  • If the standard is compelling - e.g., providing both protectiveness (following the standards would significantly reduce important risks) and practicality (an AI company could follow the standards without needing to compromise its business more than clearly necessary to reduce the risks) - then AI companies might choose to announce their intention to abide by it, for a number of reasons. These reasons could include:
    • They think the guidelines are worth following on the merits and intend to do so, and they want this to be clear to everyone (customers, investors, employees, etc.)
    • They want to get good PR, avoid bad PR, and/or strengthen their defenses against potential lawsuits.
    • Important employees, customers, investors, etc. prefer that they follow the standard.
    • They want to create momentum for the standard in the hopes that their competitors will also follow it, causing them to face fewer tough tradeoffs in terms of “Deploying system X might be risky to society, but if we don’t do it, our competitors might.”
    • They view regulation as inevitable, and hope it will be based on protective+practical standards.
  • Once AI companies have announced their intention to abide by standards, choosing not to do so could become more costly (e.g., bad PR).
  • Over time, standards could be revised to remain protective and practical (or become more so), while also becoming more rigorously enforced - laying the groundwork for national and even international regulation (a bit more here).

If something like this happened, it could (a) make dangerous AI deployments more costly and less likely; (b) reduce “race dynamics” in which companies have to choose between releasing dangerous models and fearing that their competitors will do so; (c) increase incentives for alignment research and other danger-reducing measures (since these things, if done well, might allow companies to release powerful systems while staying in compliance with standards).

One of the things that appeals to me about this general model is that there is plenty of precedent for similar models in other industries. It’s common for companies to voluntarily follow social-welfare-oriented standards established by third parties, aiming - through compliance with standards - to increase confidence in the social responsibility of their work. Sometimes these standards are quite detailed and take a lot of work and/or expense both to create and follow. And there’s also precedent for initially voluntary standards to end up codified in regulation.

Some relevant examples include farm animal welfare standards (governing how animals are treated on farms), environmental standards (governing companies’ environmental impacts), security standards (governing e.g. how customer data is protected), safety standards (for airplanes, wetlabs and more), and financial standards aimed at e.g. preventing a bank collapse. More below.

Some ways case studies can be useful

There’s a lot of interest in AI safety standards right now, and I’m encountering a lot of differences of opinion on questions like:

  • Who should be in charge of drafting and revising standards? Industry associations? Independent nonprofits?
  • What sorts of people should and shouldn’t be looped in heavily for input?
  • Generally, how complex, onerous and/or expensive can a standard be and still command wide adoption?
  • Should standards require outside evaluations to check particular claims made by companies (for example, “Our model doesn’t have dangerous capability X”)?
    • If so, what sorts of organizations can do these evaluations, and what measures can be taken to make it more likely that these organizations (a) are truly neutral and arm’s-length; (b) are able to understand what’s going on at the companies well enough to do accurate evaluations?
  • What are major factors in whether standards become widely adopted or not?
  • Is there much hope (much precedent?) for voluntary industry-adopted standards having an impact on later regulatory frameworks?
  • For those hoping to create standards, what are things they should be making sure to do or not do? What aren’t they thinking of by default, in terms of potential challenges and potential solutions?

I think that studying cases of existing widely-adopted standards can shed a lot of light on how these questions have been answered in other cases (both successful and unsuccessful).

They can thus inform the strategies taken by people looking to write or help shape safety standards that are both highly protective and widely adopted.

So far I’ve done one mini-case-study: a case study on farm animal welfare standards based on a conversation with Lewis Bollard. I’ve picked up a number of things from this that may be useful to people working on standards, such as:

  • There are many competing standards, though the first draft of each standard may be disproportionately important.
    • There are a number of different animal welfare standards. Some are essentially about codifying what’s already common in industry, and have near-universal adoption; others are maintained by independent animal-welfare-oriented nonprofits, have a higher bar, and have lower participation.
    • Each standard is arguably somewhat “anchored” in terms of how high a bar it’s setting and how much participation it’s seeking. (That is, when the standard is being revised, the people doing the revisions are likely aiming to maintain roughly the level of participation they already have.) With this in mind, the first version of a standard - and decisions about how high to set the bar - could be crucially important. (But it’s not the case that once one standard exists, it’s too late to create a competing one.)
  • Activism and advocacy can be important for standards adoption. Lewis believes pressure from activists - including “outside game” activists that hold protests and don’t participate in standards creation - has been important in a large wave of companies agreeing to higher-welfare standards over the last decade.
  • There are a number of pressures toward compliance once a company has signed on - even if there are no formal audits or external checks. For example, if a company says it’s complying with a standard, but isn’t, this might be revealed by a whistleblower or undercover investigation, and could constitute consumer fraud and/or securities fraud.
  • Lewis suggested holding “listening meetings” with companies and other potentially affected parties before standards development gets too far along, so that (a) they feel included in the process and (b) their concerns can be considered from the beginning. I’ve passed this suggestion on to people working on standards.

I think case studies can also help us a lot with the general problem that we don’t know what we don’t know.

  • As a general matter, I think it’s very hard to design something like standards from first principles alone. I expect there will be lots of difficult-to-anticipate challenges.
  • If we study standards from other industries, we get the opportunity to learn about challenges we might not have thought of - and solutions that might have taken decades to iron out.
  • We’ll need to apply judgment to using this information, since no other case is a perfect analogy for AI safety, but I think having the information would be a big plus.

My impression is that standards often take a very long time to take shape and gain wide adoption. If we want to “speedrun” this process due to the possibility of transformative AI being developed soon, learning as quickly and thoroughly as possible how things have worked elsewhere seems important.

Narrowing down standards to learn about

There are an enormous number of standards out there (ISO alone maintains almost 25,000). I’m especially interested in cases that share some key properties with potential AI safety standards. In particular:

I’m interested in intense standards for high-stakes applications. Some standards are relatively lightweight (e.g., international food standards); higher-stakes standards tend to be more intense, and I think the latter will be most appropriate for potentially transformative AI systems.

Example high-stakes standards: biosafety standards (see BMBL as well as the Federal Select Agent Program); nuclear safety standards (e.g., IAEA’s); safety standards for chemical producers; airline safety standards; and standards and regulations that the FDA imposes on drugs.

I’m interested in standards that involve complex, sometimes creative risk assessment and/or intense, even adversarial auditing. Some standards seem straightforward to observe and verify (farm animal welfare standards are an example); I don’t think we can count on this being the case for AI, where it can take a lot of knowledge and creativity to answer questions like “What dangerous activities is this AI system really capable of?”

I think financial regulation and financial standards (e.g., FINRA) are a promising place to look for this sort of thing, since financial risks are often hard to understand and assess. (I’m told that in some cases, regulators are embedded within a financial company, going to work every day in the company’s office; also see this interesting Twitter thread arguing that the bank supervision model is promising for AI.)

Some other promising categories:

  • Cybersecurity standards such as NERC CIP, SOC2 and ISO 27034.
  • Some environmental standards have some of this quality. For example, down sourcing standards such as Downpass sometimes emphasize the thoroughness and intensity of audits; SA8000 instructs auditors to “conduct off-site interviews with trade union organisations, NGOs and dismissed workers to assess worker treatment.”
  • The Fair Labor Association’s Workplace Code of Conduct has been cited to me as an example of a standard with intense monitoring (more here).

I’m interested in standards that are more complex than just “checklists.” Most standards are something like: “You meet the standard if and only if the following things are all true of your company/product.” But I think AI safety standards might have to involve more complex conditions, like: “If an AI strongly demonstrates dangerous property X, then mitigation measures ___ are required; if it only weakly demonstrates dangerous property X, then lesser mitigation measures ____ are required.”

Here again financial regulations might be useful, for example the Large Financial Institution Rating System.

Institutional Review Boards might be useful as well, and have some other parallels to AI safety standards (e.g., their approval is required before research is performed).
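To make the contrast between a “checklist” standard and a more conditional standard concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than part of any real standard: the `Evidence` levels, the mitigation names, and both function names are hypothetical placeholders standing in for the blanks in the example above.

```python
from enum import Enum


class Evidence(Enum):
    """How strongly a system demonstrates some dangerous property X (hypothetical scale)."""
    NONE = 0
    WEAK = 1
    STRONG = 2


# Hypothetical mapping from evidence level to required mitigations.
# The mitigation names are placeholders, not drawn from any real standard.
REQUIRED_MITIGATIONS = {
    Evidence.NONE: set(),
    Evidence.WEAK: {"lesser_mitigation"},
    Evidence.STRONG: {"full_mitigation"},
}


def meets_checklist(items: dict[str, bool]) -> bool:
    """Checklist-style standard: compliant if and only if every item is true."""
    return all(items.values())


def meets_conditional_standard(evidence: Evidence, in_place: set[str]) -> bool:
    """Conditional standard: what is required depends on the evidence level."""
    return REQUIRED_MITIGATIONS[evidence] <= in_place


# A checklist fails as soon as one item is false.
print(meets_checklist({"item_a": True, "item_b": False}))

# The same mitigation can be sufficient at one evidence level and not at another.
print(meets_conditional_standard(Evidence.WEAK, {"lesser_mitigation"}))
print(meets_conditional_standard(Evidence.STRONG, {"lesser_mitigation"}))
```

The point of the sketch is only that the second kind of standard needs an extra input (an assessment of how dangerous the system is) before it can say what is required, which is where much of the judgment and auditing effort would presumably go.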

I’m interested in standards that are motivated by non-monetized social welfare.

  • Many standards are about quality assurance (e.g., helping customers know what they’re getting) or interoperability (e.g., making sure that different products are compatible with each other). There’s a straightforward profit incentive to work on such standards.
  • I don’t think those kinds of motives will be enough to drive AI safety standards that protect against global catastrophic risks. I’m particularly interested in social-welfare-based standards that companies adopt (sometimes under pressure) in order to show social responsibility.
  • Examples include farm animal welfare standards (see my case study); the SA8000 social certification program; Fair Trade; the Eco-Management and Audit Scheme; and many other environmental standards.

All else equal, standards for things that are more similar to AI are better. (E.g., software is probably better to examine than food, although other factors here could outweigh this.)

I’m interested in failure stories, not just success stories. A good example might be bond credit ratings: third-party certifiers of creditworthiness came to play an important role in the economy, but they failed to correctly assess creditworthiness (when accounting for e.g. systemic risk), leading some institutions that were supposed to be conservative to take on too much risk (more).

I’m especially interested in private/voluntary standards, and even more especially in cases where private/voluntary standards helped shape later regulation, though I’m not exclusively interested in these (some of the examples above are regulation-backed standards).

What I’m looking for in case studies

I’m looking for case studies that:

  • Explore a standard, or other case of regulation or self-regulation, that is interestingly analogous to AI safety standards. Most of the standards I linked to in the previous section are standards I’d probably be interested in case studies of (though there are probably lots of interesting ones I didn’t list!)
  • Start with a very clear description of exactly how the standard works today (or worked in its heyday), with links to detailed documents laying out what the standard is and how it’s enforced.
  • Answer questions such as these (though it’s not necessary to cover all of these comprehensively):
    • What’s the history of the standard? How did it get started?
    • How is the standard implemented today? Who writes it and revises it, and what does that process look like?
    • How did we get from the beginnings to where we are today?
    • If a standard aims to reduce risks, to what extent did the standard get out ahead of/prevent risks, as opposed to being developed after relevant problems had already happened?
    • How involved are/were activists/advocates/people who are explicitly focused on public benefit rather than profits in setting standards? How involved are companies? How involved are people with reputations for neutrality?
    • Are there audits required to meet a standard?
      • If so, who does the audits, and how do they avoid being gamed?
      • How much access do they get to the companies they’re auditing?
      • How good are the audits? How do we know?
      • What other measures are taken to avoid standards being “gamed” and ensure that whatever risks they’re meant to protect against are in fact protected against?
    • What sorts of companies (and how many/what percentage of relevant companies) comply with what standards, and what are the major reasons they do so?
    • How costly and difficult is it to comply with the standards?
    • What happens if a company stops complying?
    • Does the standard currently seem to achieve its intended purpose? To the extent it seeks to reduce risks, is there a case that it’s done so?
    • Was there any influence of early voluntary standards on later government regulation?
  • Are very strong on reasoning transparency, providing citations and key quotes for all key claims.
  • Are very clear and easy to navigate. It should be possible to pick up the key takeaways from a case study in 1-3 pages, easily find more detail on any particular key takeaway, and easily find answers to the key questions above. I expect good case studies to be 10-50 pages in general; for longer case studies it’s especially important to meet this criterion.

Other projects I might be interested in

I’m also interested in writeups that look for patterns across a large number of standards. Example topics include:

  • What would be some interesting standards to do case studies on (that weren’t already listed in this post), and why would they be interesting?
  • How widespread are safety standards generally?
  • Are there any generalizations we can make about the answers to the questions above for a wide range of standards?
  • What are the most interesting points from a particular extensive book or other writeup on standards? (Examples in footnote.2)

In general, feel free to use the form below to pitch me on any analysis you think could be useful, although I expect to be most likely to support analysis that is heavily about learning from past/existing cases (rather than about making abstract arguments).

Who can do case studies, and how can they find the relevant information?

I don’t think you need to be a subject-matter expert to do a good case study. You just need to be able to find the relevant information about how a standard works, how the process for maintaining it works, etc. This could be by:

  • Googling around, reading books and papers, etc. (I am unsure of whether all the relevant questions can be answered in this way.)
  • Interviewing knowledgeable people. My case study on farm animal welfare is based on a conversation with Lewis Bollard; I personally knew very little about the topic before we spoke about it.
  • I’ve found large language models quite helpful for this work as well. This may be because there’s often a lot of information about standards somewhere online, but it’s not always easy to find. I’d suggest trying them out as research tools, though I’d prefer that any citations are to more reliable sources. (Sometimes, when directly prompted to do so, language models can produce good citations for their claims; sometimes they can’t, and sometimes their claims appear incorrect.3)

How to participate

Please use this form to (a) let me know about your interest in doing a case study or other writeup; (b) apply for funding to support the work. My basic default is to offer funding for up to 50 hours per case study, with room for negotiation in special circumstances. The rate of pay will be at least $75/hour for all approved cases, and could be higher (the form submission asks for information on this).

In any cases where I favor providing funding, I’ll make the recommendation to Open Philanthropy to do so.

If I get multiple proposals to study the same thing, I will probably do something to avoid redundancy (e.g., email the parties in question so they’re aware of overlapping efforts). This is a reason to use the form even if you’re not seeking funding.

I may also occasionally update this post to note whether some topics seem likely to already be well-covered.

Got ideas for more case studies?

Please share them in the comments! I’ve found that a lot of people happen to know of standards that are interestingly analogous to AI safety standards. Some guidance on how to look for such analogies is above.

For this post I talked to a number of people to get ideas on what good case studies might be, and on how some particular standards work. I’m grateful to Daniela Amodei, Sam Bell, Alexander Berger, Lewis Bollard, Alexis Carlier, Rocco Casagrande, Ben Garfinkel, Jonathan Gleklen, Mindy James, Richard Korzekwa, Jade Leung and Piers Millett for help and/or suggesting good example standards to learn about. These folks shouldn’t be seen as responsible for the content of the post.

Notes

9 comments

Excited to see this! I'd be most excited about case studies of standards in fields where people didn't already have clear ideas about how to verify safety.

In some areas, it's pretty clear what you're supposed to do to verify safety. Everyone (more-or-less) agrees on what counts as safe.

One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.

Are there examples of standards in other industries where people were quite confused about what "safety" would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn't be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?

"One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient."

While overcoming expert disagreement is a challenge, it is not as big a challenge as you might think. TL;DR: Deciding not to agree is always an option.

To expand on this: the fallback option in a safety standards creation process, for standards that aim to define a certain level of safe-enough, is as follows. If the experts involved cannot agree on any evidence-based method for verifying that a system X is safe enough according to the level of safety required by the standard, then the standard being created will simply, and usually implicitly, declare that there is no route by which system X can comply with the safety standard. If you are required by law, say by EU law, to comply with the safety standard before shipping a system into the EU market, then your only legal option will be to never ship that system X into the EU market.

For AI systems you interact with over the Internet, this 'never ship' translates to 'never allow it to interact over the Internet with EU residents'.

I am currently on the JTC21 committee, which is running the above standards creation process to write the AI safety standards in support of the EU AI Act, the Act that will regulate certain parts of the AI industry if they want to ship legally into the EU market. (Legal detail: if you cannot comply with the standards, the Act will give you several other options that may still allow you to ship legally, but I won't get into explaining all those here. These other options will not give you a loophole to evade all expert scrutiny.)

Back to the mechanics of a standards committee: if a certain AI technology, when applied in a system X, is well known to make that system radioactively unpredictable, it will not usually take long for the technical experts in a standards committee to come to an agreement that there is no way that they can define any method in the standard for verifying that X will be safe according to the standard. The radioactively unsafe cases are the easiest cases to handle.

That being said, in all but the most trivial of safety engineering fields, there are complicated epistemics involved in deciding when something is safe enough to ship, and this is complicated whether you use standards or not. I have written about this topic, in the context of AGI, in section 14 of this paper.

I agree that, at least for the more serious risks, there doesn't seem to be consensus on what the mitigations should be.

For example, I'd be interested to know what proportion of alignment researchers would consider an AGI that's a value learner (and of course has some initial model of human values created by humans to start that value learning process from) to have better outer-alignment safety properties than an AGI with a fixed utility function created by humans.

For me it is very clear that the former is better, as it incentivizes the AGI to converge from its initial model of human values towards true human values, allowing it to fix problems when the initial model, say, goes out-of-distribution or doesn't have sufficient detail. But I have no idea how much consensus there is on this, and I see a lot of alignment researchers working on approaches that don't appear to assume that the AI system is a value learner.

My suspicion is that the most instructive case to look at (modern AI really is too new a field to have much to go on in terms of mature safety standards) is how the regulation of nuclear and radiation safety has evolved over time. Early research suggested some serious X-risks that thankfully didn't pan out, for either scientific reasons (igniting the atmosphere) or logistical/political reasons (cobalt bombs, Tsar Bomba-scale H-bombs), but some risks arising more out of the political domain (having a big gnarly nuclear war anyway) still exist and could certainly make this a less fun planet to live on. I suspect the successes and failures of the nuclear treaty system could be instructive here, given the push to integrate big AI into military hierarchies: regulating nukes is something almost everyone agrees is a very good idea, but it has had a less than stellar history of compliance.

They are likely out of scope for whatever your goal is here, but I do think they need serious study because, without it, our attempts at regulation will just push unsafe AI to less savory jurisdictions.

This seems great!

One additional example I know of, which I do not have personal experience with but know that a lot of people do have experience with, is compliance with PCI DSS (for credit card processing), which does deal with safety in an adversarial setting where the threat model isn't super clear.

(my interactions with it look like "yeah that looks like a lot and we can outsource the risky bits to another company to deal with? great!")

A high-level theme that would be interesting to explore here is rules-based vs. principles-based regulation. For example, the UK financial regulators are more principles-based (broad principles of good conduct, flexible and open to interpretation). In contrast, the US is more rules-based (detailed and specific instructions). 
https://www.cfauk.org/pi-listing/rules-versus-principles-based-regulation

I submitted a proposal but did not receive a confirmation that it was received. Perhaps I should submit again?

We got it! You should get an update within a week.

[Edit - on further investigation this seems to be a more UK-specific point; US regulations are much less ambiguous as they take a rules-based approach unlike the UK's principles-based approach]

It's interesting to note that financial regulations sometimes possess a degree of ambiguity and are subject to varying interpretations. It's frequently the case that whichever institution interprets them most stringently or conservatively effectively establishes the benchmark for how the regulation is understood. Regulators often use these stringent interpretations as a basis for future clarifications or refinements. This phenomenon is especially observable in newly introduced regulations pertaining to emerging forms of fraud or novel technologies.