I gave a talk this year at LessOnline about how to make AI governance research more relevant to policymakers (videos available here), and the subtopic from that talk that got the most audience engagement was how to turn a vague moral sentiment into a concrete policy proposal. During the talk, I helped people refine their proposals by asking them a series of questions about what they wanted to do and why. This works better in a conversational setting, but for those who couldn’t attend, I wanted to write up a list of some of the most common and/or important questions, so that everyone can try to make their policy proposals more concrete.
This is a skill that can (and should) be used in all policy areas, not just in AI, so after I explain the questions, I’ll include a variety of examples from different fields.
The four questions you need to answer to make most policy proposals reasonably concrete are:

1. Who is required to act?
2. What exactly are they required to do?
3. Who decides whether they’ve done it, and how?
4. What happens to them if they don’t?
The answer to the first question needs to be framed in such a way that it clears up most of the edge cases and will create common knowledge about who is and isn’t subject to your policy. For example, if you want to make eggs cage-free, a really crisp answer to “who is required to act?” is something like “anyone who sells more than 100 chicken eggs per year.” An answer that’s somewhat looser but still acceptable is “large commercial chicken farmers.” If your answer is instead a very broad concept like “the meat industry,” then that’s rhetorically satisfying, but you can and will spark an unwanted debate about who qualifies as “small family farmers” and whether they should have to follow your cage-free policy. The more time you spend arguing about the exact borders of your policy, the less time you can spend explaining why your core idea is a good idea.
Being specific here is surprisingly helpful, especially if you’re willing to give some ground. If you just say you want to regulate “the meat industry” and refuse to clarify any further, then small family farmers will never trust you. On the other hand, if you start with “100 chicken eggs per year,” and the local farmers mock you for setting such a low threshold, then you can offer to add a few zeroes, or ask the farmers what they think the threshold should be. This makes the discussion usefully concrete. If the family farmers would stop opposing you in exchange for raising the threshold to 100,000 chicken eggs per year, then you’ve found a useful political compromise. If there’s no halfway-sane number they would agree to, then their true concerns probably aren’t really about family farms – at least you know you’re not missing out on a potential deal, and if you can find even a couple of local farmers to endorse your higher threshold, then you can undermine your opponents’ credibility.
Similarly, if you want frontier AI developers to publish model cards, it’s really helpful if you can crisply define who counts as a frontier AI developer – perhaps by using a FLOP threshold (e.g. more than 10^26 FLOPs used in training), or with reference to a performance benchmark, or just by trying to count and name the top 5 or top 10 developers. Otherwise, you can and will waste a ton of energy on type I and type II errors – you’ll get people who you never wanted to be subject to your policy coming out and complaining that model cards don’t make sense for them, and then you’ll get people who the policy *should* have covered who quietly ignore the policy and then later claim they were confused and didn’t realize the policy applied to them.
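To see how bright-line a compute threshold can be, here’s a minimal sketch (mine, not anything from the talk or from any actual statute) of the kind of test a “more than 10^26 training FLOPs” rule implies. The 6 × parameters × tokens estimate is just a common rule of thumb for training compute, and the example numbers are made up:

```python
# Rough sketch of a FLOP-threshold coverage test. The 6 * N * D estimate is a
# common heuristic for training compute, not an official regulatory formula,
# and the threshold and example numbers here are illustrative only.

FLOP_THRESHOLD = 1e26  # "more than 10^26 FLOPs used in training"


def estimated_training_flops(parameters: float, training_tokens: float) -> float:
    """Approximate total training compute via the ~6 * N * D rule of thumb."""
    return 6 * parameters * training_tokens


def is_covered_developer(parameters: float, training_tokens: float) -> bool:
    """Would this training run fall under the hypothetical policy?"""
    return estimated_training_flops(parameters, training_tokens) > FLOP_THRESHOLD


# A 400-billion-parameter model trained on 15 trillion tokens comes out to
# roughly 3.6e25 FLOPs, which is below the 1e26 threshold.
print(is_covered_developer(4e11, 1.5e13))  # False
```

The point isn’t that this particular formula is the right one – it’s that a rule this crisp lets everyone compute for themselves whether they’re covered.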
Defining exactly who has to follow your rules is challenging and you’re likely to mis-classify at least a few people, but failing to offer a definition is worse, because then your proposal isn’t clear enough for people to understand what they’d be agreeing to, and you will lose political momentum while people argue and complain about that.
The answer to the second question needs to include at least one active verb. When you say what people are required to do, you can require people to “test,” “publish,” “vaccinate,” “buy,” or “install” something. You can’t require people to “be safe” or “be kind” or “avoid suffering” – or, at least, you can’t do that using any of the legal and policy tools common in English-speaking countries.
After the active verb, you usually need a direct object, ideally one that comes with numbers or at least a very concrete description of who or what will be acted upon. “Grocery stores must stock at least two brands of cage-free eggs” is a concrete policy; “Grocery stores should prioritize lower-cruelty brands” is merely a vague expression of desire. “Parents must vaccinate their children for measles by age 2” is a firm policy; “children should be free from preventable disease” is merely a goal.
If your direct object isn’t something with a meaning that everyone will agree on, then you need a crisp definition for your direct object, too. For example, what exactly is an AI “model card”? If you want to require AI developers to publish model cards, you might need to say what information is required to be in a model card, where this information will come from, what level of accuracy is required, or how often the model card needs to be updated. Differing levels of detail are appropriate for different contexts; you would not, for example, put all of that information on a poster that you’re waving at a protest. However, if you yourself don’t *have* a satisfying answer to those questions that you can share on request, then you don’t yet have a concrete policy.
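Purely as an illustration of what a crisp definition of the direct object might look like, here’s a sketch in Python; every field name below is hypothetical, not drawn from any existing model card standard:

```python
# Hypothetical sketch: one way to pin down what a "model card" must contain.
# Every field name here is illustrative, not taken from any real standard.

REQUIRED_MODEL_CARD_FIELDS = {
    "developer_name",
    "model_version",
    "training_compute_estimate",
    "evaluation_results",
    "known_limitations",
    "last_updated",
}


def missing_fields(model_card: dict) -> set:
    """Return the required fields that a submitted model card leaves out."""
    return REQUIRED_MODEL_CARD_FIELDS - set(model_card)


# A draft card that omits its evaluation results and update date fails the check:
draft_card = {
    "developer_name": "Example AI",
    "model_version": "1.0",
    "training_compute_estimate": "about 3.6e25 FLOPs",
    "known_limitations": "Not evaluated on non-English text.",
}
print(missing_fields(draft_card))  # {'evaluation_results', 'last_updated'}
```

You don’t need to publish anything this formal in most settings, but being *able* to produce it on request is what separates a concrete policy from a vague wish.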
The answer to the third question typically needs to specify some kind of institution that will make the decision about whether a policy requirement has been satisfied, as well as what criteria they’ll use to make that decision. The institution could be a judicial or administrative court, especially if the policy is a legal requirement, but it doesn’t have to be – often the institution making the decision is some kind of industry association, trade group, or public interest group. For example, the Marine Stewardship Council makes decisions about whether its policies on responsible seafood are being followed by each supplier. Your local school or university might make decisions about whether a child has received enough vaccinations to attend classes. The Open Source Initiative claims the authority to decide whether a given piece of software has complied with its policy to allow everyone to freely use and modify that software, although my understanding is that this authority is somewhat contested.
Notably, I can’t think of an example of a decision-making body that rules on AI safety policy requirements, because there is a huge deficit in the AI safety field’s ability to answer this third question. Zach Stein-Perlman over at AI Lab Watch has done heroic work in collecting data about what AI developers are and aren’t doing, but that work doesn’t seem to be tied to anything as crisp as a decision-making process; there’s no single “yes or no” question that his evaluations answer, nor would AI developers expect to suffer any particular consequences if AI Lab Watch decided that the answer to one of its questions was “no.”
This brings me to the fourth question: who cares? Or, more literally, what is the consequence that the people regulated by your policy can expect to suffer if they fail to comply with your required actions? Policies can’t and don’t magically compel people to follow them; instead, they offer a conditional consequence: if you take action X, then we will ensure that consequence Y follows. Depending on how tempting action X is and how credible the threat of consequence Y is, some people might change their behavior in response to this threat.
In the examples we’ve discussed above, a school’s consequence for not vaccinating your children might be that those children are turned away at the door and not allowed to attend classes. The Marine Stewardship Council’s consequence for fishing endangered species with drag nets is that they won’t let you put their fancy blue label on your product, and if you try to put the label there anyway, they’ll sue you for fraud. If your policy involves some kind of grant funding or subsidy, then the consequence of not following the policy could be as simple as not receiving the money.
The state government of California’s consequence if you put chickens in cages and then sell their eggs to grocery stores is that they will (theoretically) send government inspectors to your farm, notice that your farm has cages, order you to remove those cages, and then fine you, shut down your business, or put you in jail if you disobey that order.
It’s worth asking yourself how many of the steps in this process you really expect to occur – what matters is not just what’s written down on paper about policies, but also how those policies are likely to be enforced (or not) in practice. In America, driving a car over the speed limit is technically a criminal infraction that can carry jail time, but in practice the number of people who go to jail solely for speeding is so low as to be statistically zero; if you’re not drunk and you don’t injure others, then it’s incredibly unlikely that an American judge will sentence you to jail time purely for speeding.
If you want to make sure that violating your policy will come with meaningful consequences that could actually motivate people to make large or unpleasant changes to their behavior, then you may need to go back to step 3 and define the evaluation process in more detail. It’s usually not enough to say “the government will decide whether you’ve published a model card” – you may need, instead, to say something like “the office of AI Model Review, which reports directly to the Secretary of Commerce, will evaluate each AI company that has at least $100 million in assets at least once each calendar year and fill out this checklist of criteria about whether their model cards are adequate.”
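Again, purely as a sketch – the checklist items and the consequence below are invented for illustration, just like the hypothetical office of AI Model Review – the difference between “the government will decide” and a defined evaluation process looks something like this:

```python
# Hypothetical sketch of a concrete yes/no decision procedure plus a consequence.
# The checklist items and the penalty are invented for illustration.

CHECKLIST = [
    "model card published within 90 days of model release",
    "results reported for every required benchmark",
    "known-limitations section is non-empty",
]


def model_card_is_adequate(review: dict) -> bool:
    """The single yes-or-no question: did the developer satisfy every item?"""
    return all(review.get(item, False) for item in CHECKLIST)


annual_review = {
    "model card published within 90 days of model release": True,
    "results reported for every required benchmark": True,
    "known-limitations section is non-empty": False,
}

if not model_card_is_adequate(annual_review):
    # Question 4: the consequence has to be spelled out somewhere, too.
    print("Finding: non-compliant. Refer for fines and public notice of violation.")
```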
Specifics are the lifeblood of policy: the more concrete you can be about who will do what to whom, about when, where, and how they will do it, and about what will happen if they don’t, the more likely you are to change other people’s behavior.