Design policy to be testable

Yair Halberstadt

My father-in-law is an expert on legislation. He's travelled around the world advising countries on how to craft laws and constitutions.

One of the projects he's working on at the moment is advising on a bill to promote integrated education of different cultural and religious groups in Northern Ireland.

He was discussing the policy with me the other day, and was saying that he thinks it will be highly effective: "It's impossible for a preacher to stand up and say that all Catholics/Protestants are evil, when you've grown up in the same class as them and simply know they're not."

I agreed the reasoning made sense, but was somewhat skeptical it would necessarily have the desired effect. I can easily think of just-so stories where the opposite happens.

For example, kids naturally form cliques. Protestant kids are more likely to form cliques with other Protestant kids, and Catholic kids with Catholic kids. So both groups might grow up 'knowing' that the other group was very cliquey and not at all interested in being friends.

So I asked, how do they plan to test whether it worked?

He looked at me for a few seconds, and then said that he supposed you could look at statistics such as rates of recruitment into the IRA.

"Wait a minute," I replied, "do you mean to say that they haven't designed up-front criteria for how to measure how effective this policy is? Then how on earth will they know if it's successful?"

Unfortunately, but not surprisingly, the idea of crafting policy so that it can be easily tested just doesn't seem to exist!

I know that we're probably never going to have futarchies, or even deregulate healthcare. But I think the idea of designing policy to be testable is something that should be pretty uncontroversial (at least if you're not a politician).

I would love to live in a world where any bill required the following 5 items:

A description of how they're going to measure the impact of the bill.
A prediction distribution for the impact of the bill.
A level at which they'd consider the bill a success, and a level at which they'd consider it a failure.
A description of how they're going to measure how bad things would be conditional on the bill not passing.
A prediction distribution for this value.

I'm not even asking for these values to be used explicitly in any way (e.g. I'm not suggesting we automatically fire lawmakers who propose bills which end up underachieving).
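As a rough illustration of what those five items could look like when written down, here is a minimal sketch. Everything in it is an assumption made for the example: the class and field names are hypothetical, the metric and all the numbers are made up, the forecasts are modelled as normal distributions purely for convenience, and a higher value of the metric is assumed to be better.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class BillImpactStatement:
    """Hypothetical record of the five items; all names are illustrative."""
    metric: str                  # 1. how the impact will be measured
    with_bill: NormalDist        # 2. predicted distribution if the bill passes
    success_level: float         # 3. at or above this counts as a success
    failure_level: float         # 3. at or below this counts as a failure
    counterfactual_metric: str   # 4. how the no-bill baseline is measured
    without_bill: NormalDist     # 5. predicted distribution if it does not pass

    def p_success(self) -> float:
        # Probability the forecast assigns to reaching the success level.
        return 1.0 - self.with_bill.cdf(self.success_level)

    def p_failure(self) -> float:
        # Probability the forecast assigns to ending at or below the failure level.
        return self.with_bill.cdf(self.failure_level)

    def expected_gain(self) -> float:
        # Expected improvement over the no-bill baseline.
        return self.with_bill.mean - self.without_bill.mean

    def verdict(self, observed: float) -> str:
        # Compare the measured outcome against the thresholds fixed up front.
        if observed >= self.success_level:
            return "success"
        if observed <= self.failure_level:
            return "failure"
        return "inconclusive"


# Made-up numbers for a made-up survey metric.
statement = BillImpactStatement(
    metric="% of pupils reporting a close friend from the other community",
    with_bill=NormalDist(mu=30, sigma=5),
    success_level=30,
    failure_level=20,
    counterfactual_metric="same survey in comparable non-integrated schools",
    without_bill=NormalDist(mu=20, sigma=5),
)

print(statement.p_success(), statement.p_failure(), statement.expected_gain())
print(statement.verdict(25))
```

The point of the sketch is only that the thresholds and both distributions are committed to before any data comes in, so the later verdict cannot be adjusted to fit the result.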

I think this will have a number of benefits:

Firstly, it's impossible to evaluate a policy without at least implicitly considering the expected impact of the policy versus the impact of not passing it. Since this is so critical for judging policy, it should be made explicit, so that disagreements can be far more productive. Instead of just saying it's a bad policy, you could distinguish between:

I don't think the thing the policy is trying to achieve is worthwhile.
I disagree with your predictions.
I don't think your expected impact is worth the cost.

Which can lead to far more fruitful discussion.

The second benefit is that when bills are required to describe how their impact will be measured, that will hopefully encourage lawmakers to design each bill so that this can be done easily and accurately. Ideally we'd have a nonpartisan group of scientists whose sole purpose is to evaluate whether the proposed measure is a sufficiently unbiased instrument for measuring the impact.

This will be enormously valuable in helping design future policy when you can accurately see which past policies worked well and which flopped, both in the same country and others.

Finally, this gives us a way to evaluate lawmakers. I would expect institutions to spring up which rate lawmakers (and governments) on how effective both their policies and their predictions were, and this will hopefully aid voters in making accurate decisions rather than just going off gut feelings.
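One way such a rating of predictions could work, sketched under loud assumptions: suppose each bill came with a stated probability of reaching its success level, and we later record whether it did. The Brier score (mean squared error between stated probability and outcome, lower is better) is one standard scoring rule for this. The function name, the lawmakers, and all the numbers below are hypothetical.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes.

    `forecasts` is a list of (probability_of_success, succeeded) pairs.
    0.0 is a perfect score; always saying 50% scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one forecast to score")
    total = 0.0
    for probability, succeeded in forecasts:
        outcome = 1.0 if succeeded else 0.0
        total += (probability - outcome) ** 2
    return total / len(forecasts)


# Made-up track records: (stated probability of success, did it succeed?)
lawmaker_a = [(0.9, True), (0.8, True), (0.7, False)]
lawmaker_b = [(0.9, False), (0.9, False), (0.9, True)]

print(brier_score(lawmaker_a))
print(brier_score(lawmaker_b))
```

Here the second lawmaker is penalised for being confidently wrong twice. A real rating would also have to account for how hard each prediction was, which this sketch ignores.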

Seems cut off. But the theme seems to be a lack of Hansonian understanding that policy debate isn't about finding the best policies. It's about power and appearing to help those who give you the power.

In small groups and businesses, testable policies and practices rule: they care about the results, not (as much) about the public reaction. In public and large non-hierarchical groups, testable policy is a mistake, because you'll be blamed if the test shows that it's wrong. The best policies are those that are hard to argue against, and that show you care about your constituents.

Completed and reposted.

Cool. Now even more clear that this is a wish for different humans, or at least different kinds of negotiation and compromise in government.

I would love to live in a world where any bill required the following 5 items:

Don't use passive voice here. Bills don't require things, voters require things of their legislators. That level of clarity in policy and public discussion would make it impossible to sneak in the special-interest and coalition-supporting features of bills. Which would, in turn, make it impossible for most voters to fool themselves into thinking their government has their interests first.

This post seems to have cut off in the middle?

And not fully-complete in the parts that exist. "He was discussing the policy with me the other day". Which policy?

But, I like what I can read so far!

Yeah I meant to save as draft and pressed the wrong button

Finished and reposted.

Level 1: “There’s a lion across the river.” = There’s a lion across the river.

Level 2: “There’s a lion across the river.” = I don’t want to go (or have other people go) across the river

Level 3: “There’s a lion across the river.” = I’m with the popular kids who are too cool to go across the river.

Level 4: “There’s a lion across the river.” = A firm stance against trans-river expansionism focus grouped well with undecided voters in my constituency.

Source: https://www.lesswrong.com/posts/qDmnyEMtJkE9Wrpau/simulacra-levels-and-their-interactions

The optimization you are describing is strongly rooted in level 1. There's a clear image of a before/after state along with a feedback loop for correction. I presume this speaks to many people in the LW community, along with good project managers, engineers, etc.: really anyone who wants to change something in the world, whether it's big or small.

However, I think politicians and the bureaucrats that work for them (or vice versa?) are on levels 3 and 4. Having metrics and using them to score policies would only get in the way of signaling group affiliation or building consensus. This is just another way of saying that often, the real impact a decision has is of less importance than how the people making the decision look and act.

That said, I've found a similar technique mentioned in productivity circles. Perhaps it was in a book by Peter Drucker. The gist was: when making a decision, make sure to include criteria which tell you whether it's working or not.

All very interesting. There are things called impact assessments, which are used to predict and justify the impact of legislation. The Westminster Parliament does not insist on impact assessments for Bills, though statutory instruments normally have them, as does legislation in the devolved legislatures. They vary from almost the kind of specific modelling you are talking about to something more generic and less helpful from your perspective.

One of the key goals of politicians is to avoid blame. If you create policy with up-front criteria for how to measure how effective the policy is, that opens up room for the policymaker to be blamed.

It would be great to have more information about which policies work but the key problem is to think through the incentives of how you could create a system where that works.

Interesting point about blame - but of course they also want credit ... Impact assessments carry risk for the reason you give, but also possibility of credit.

In my personal conversations, it seemed that avoiding blame is a stronger force than seeking credit. Yes Minister, which was based on a lot of conversations with insiders, also suggests that back then in the UK avoiding blame was much more central than seeking credit.

This seems consistent with Zvi's concept of Asymmetric Justice.

I think this will have a number of benefits:

Since you're proposing a policy here, I feel like someone needs to ask: how would you measure and test these benefits?

They seem plausible to me, but I could tell just-so stories where this goes wrong. Like:

Some amount of leaving-things-unsaid turns out to be necessary for humans to cooperate.
Policy is just too complicated. Prediction markets rarely have any idea whether a bill will succeed at its stated impact measures.
A non partisan group of scientists isn't enough for us to break Goodhart's law. We can improve the things we measure, but the things we can't or don't measure turn to shit.

That's a fair point.

However, I think my thoughts here are firmly in the idea generation camp, rather than the concrete proposal camp. If this ever does make it to policy, I'd very much hope it will stick to its own standard.

There's a bunch of criticisms here that politicians/politics don't work that way, you haven't considered incentives, and that may be true, but...

"Everybody Knows" feels relevant here, plus that thing Scott's described where (IIRC, something along the lines of) he goes from "those people seem crazy" over a few steps to "oh, they were right, but they're being unsophisticated about how they say it".

Like, maybe this is unworkable because politicians aren't actually trying to cause good outcomes. But even if that's the case, they still generally pretend they're trying to cause good outcomes. Making that pretense marginally less believable seems marginally good. And so it seems good to talk about "here's some ways politicians could more reliably cause good outcomes".

I like the general idea: policy that is drafted with clear statements about what it does, why it's doing things the way it's designed, and what the desired outcomes are. It seems somewhat similar to cases where the courts have thrown laws back to the lawmakers because they are not clear enough for legal enforcement by the courts. However, that is a really poor initial check, as something first has to get to a court and then go in front of a judge who is willing to call BS on some legislation or regulation.

I think there is a lot of room for improving the legislative processes -- certainly in the USA where I am. But I also agree with those noting that in many ways such structures or process requirements are not really aligned with how politics and policy setting actually work. But it seems like that might be one of the shackles we might want Leviathan to have to wear.
