Review

Partly in response to calls for a pause in AGI development, Robin Hanson suggests liability rules for risks related to AGI rapidly becoming powerful.

My intuitive reaction was to classify foom liability as equivalent to a near-total ban on AGI.

Now that I've found time to think more carefully about it, I want to advocate foom liability as a modest improvement over any likely pause or ban on AGI research. In particular, I want the most ambitious AI labs worldwide to be required to have insurance against something like $10 billion to $100 billion worth of damages.

Liability has the obvious advantage that whoever bears the financial responsibility (AI labs? insurance companies?) will have better incentives to study the risks carefully than would politicians, bureaucrats, or standards committees composed of industry representatives.

A more subtle advantage: limited risk that it will be used to protect incumbents or permanently stifle good technologies.

I see somewhat less risk of liability being hijacked by culture warriors.

Liability wouldn't produce ideal incentives, due to scenarios where AGI accidents cause more damage than any company can compensate for (e.g. human extinction). I'll guess that liability would provide one tenth of the optimal incentive.
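
To make that one-tenth guess concrete, here is a toy expected-value sketch in Python. Every number in it (the accident probabilities, harm sizes, and insurable cap) is made up for illustration, chosen only so the ratio lands near one tenth; the point is just that a liability cap internalizes only the payable slice of expected harm.

```python
# Toy sketch: capped liability internalizes only part of expected harm.
# All numbers are hypothetical illustrations, not estimates from the post.

scenarios = [
    # (probability of accident, harm in $billions)
    (0.05, 20),       # medium-sized accident, fully compensable
    (0.001, 10_000),  # catastrophe far beyond what any insurer could pay
]
insurable_cap = 100   # $billions: the most a policy realistically pays out

# Optimal incentive: internalize the full expected harm.
optimal = sum(p * harm for p, harm in scenarios)                      # 11.0

# Incentive under capped liability: expected payout per accident.
actual = sum(p * min(harm, insurable_cap) for p, harm in scenarios)   # 1.1

print(round(actual / optimal, 2))  # 0.1 -- about one tenth of optimal
```

With a heavier catastrophe tail the internalized fraction shrinks further, and a higher cap raises it, which is the tradeoff the "How Much Liability" section below wrestles with.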

But remember that there's not too much reason to worry about improving the financial incentives to avoid extinction. I'm not too concerned with the risk that DeepMind will train a system that they think has a 5% chance of killing us all. I'm more concerned about a less responsible lab training a system that their insurance company would think had a 5% chance of starting a minor war, and which I think has a 5% chance of killing us all. I.e. my main concern is with AI labs that are overconfident about safety, but who are rational enough to respond to projected penalties for medium-sized accidents.

For the risks that worry me most, there would need to be an international agreement on such liability / insurance. That creates difficulties beyond those that would confront a direct ban. Some countries would allow fly-by-night insurance companies to provide "insurance" that would not, in practice, cover much of the liability.

How Much Liability

Key variables are: who needs to buy liability insurance (or provide the equivalent evidence of self-insurance), and how much harm does the insurance policy need to cover?

It feels pretty arbitrary to assign a specific number to many of the AI accidents that I can imagine.

Suppose an AI makes a serious attempt at world conquest: It hacks and blackmails its way into apparent control of several nuclear arsenals. It releases scandalous evidence about influential people who warned about AIs gaining power. Yet it gets shut down without any clear harm to innocent people.

I'm guessing that penalties in the vicinity of $10 billion, give or take a factor of 10, would be in the right ballpark.

Much more than that would mean that only the wealthiest organizations could afford to work on AGI. I was tempted to claim that such organizations will on average be more responsible than smaller organizations. Then I remembered Elon Musk, and decided it was unclear.

It feels harder to adequately describe what entities would need to be insured. E.g. open source clones of leading LLMs are unlikely to be smarter than the original LLMs (compute-intensive training seems likely to remain important), but might be more dangerous because they attract more reckless tinkerers who make them more agenty. That is a problem for any proposal to regulate or slow AI development. I'm focusing in this post on how foom liability compares to other options. I'm not claiming to know whether any of these proposals are feasible.

Scenarios

I'll now imagine several scenarios as to how foom liability would play out. I'm giving probabilities to indicate my vague intuitions about which scenarios are most likely. All probabilities are conditional on some sort of international agreement to require strict liability for foom-like accidents.

Scenario 1: Permanent Ban on AGI

I am reminded of what liability rules have done to innovation in small aircraft, and of what regulation has done to nuclear power. From Where is my Flying Car?:

One of the more ironic regulatory pathologies that has shaped the world of general aviation is that most of the planes we fly are either 40 years old or homemade - and that we were forced into that position in the name of safety.

In this scenario, I imagine an airtight requirement that key types of AI research get an insurance company to write a policy covering that research. I also imagine that insurance companies are reluctant to write those policies. The existence of somewhat special rules for foom liability reinforces widespread concern over the risks, in ways that cause a perpetual upward creep in expected damage awards from arguably dysfunctional courts.

This requires suppression of unauthorized research that is a good deal more effective than what I know how to implement. Yet if most professors at leading universities decided that AGI would make their jobs obsolete (while GPT-4 won't), then I wouldn't want to bet against their ability to devise an airtight ban.

I'll give this scenario a 5% chance.

Scenario 2: Leaky Ban on AGI

More likely, it will be hard to fully enforce any insurance requirement. Most of the problem is likely to come from the difficulty of identifying what kind of research is risky enough to require insurance.

An analogy would be rules that outlaw unlicensed taxis and hotels. Uber and Airbnb created businesses that compete with incumbents in those industries, without meeting the formal definition of taxis and hotels, defeating the goal of protecting those incumbents from competition.

I imagine that it's hard to draft an insurance requirement that reliably distinguishes between safe software and software that might foom. Any attempt to strike a reasonable balance between allowing good software to remain unregulated, and restricting all risky software, will leave opportunities for clever startups to create unlicensed AGIs that will foom to whatever extent foom is possible.

I'm imagining in this scenario that AI labs mostly keep roughly the same mix of focus on capability and safety research that they have now. They postpone risky training of large systems. That slows down capability advances, and slows down some safety research that depended on the availability of more powerful AIs.

I expect this scenario would buy AI safety researchers a year or five. That would come at the cost of an increased risk that AGIs will be designed by more reckless developers. It's unclear what net effect this scenario would have on our safety.

I'll give this scenario a 55% chance.

Scenario 3: Goldilocks

In my hoped-for scenario, foom liability is effective at delaying AGI development for a year or two, and at spurring increased safety research.

Insurance companies initially indicate that they would charge more than $10 billion for a policy covering the most reputable AI labs. A year later, they sell one policy for $5 billion. Six months after that, multiple policies are sold for $3 billion.

Many people in the industry end up agreeing that the insurance requirement caused the industry to reduce some important risks, at a fairly acceptable cost in delaying benefits.

I'll give this scenario a 10% chance.

Scenario 4: Oligopoly

The main effect might be to slow development, by prohibiting small organizations from doing any important AI development.

I'm imagining here that a handful of companies are able to buy insurance for maybe $10 billion each. They were already doing most of what they were able to do to minimize risks. The insurance was a nuisance due to the need to articulate safety measures, most of which required expert knowledge to understand. The insurance companies didn't learn enough to provide any useful ideas about safety beyond whatever was the default path. Everyone ends up agreeing that there are important risks, but we find no way to reach any consensus on how to handle the risks.

My best guess is that this makes us slightly safer, via stopping a few reckless companies from competing, and via slowing down competitive races between leading AI labs.

That safety comes at a cost of increased concentration of power in a few big companies.

I'll give this scenario a 10% chance.

Scenario 5: Full Speed Ahead

Foom liability might be ineffective. The benefits of AI could persuade many companies to pursue powerful AGI regardless of the insurance costs.

This is a clear possibility if companies are allowed to self-insure, or if small companies are able to compete.

I'll give this scenario a 20% chance.

Scenario 6: China versus West

I'm unclear whether this scenario is affected by the difference between liability versus a temporary ban on development, so maybe it doesn't belong in this post at all. China seems slower than the US to treat smarter-than-human AI as a near-mode issue. China seems at least as willing in general to ban new technologies, and somewhat more likely to enforce those bans effectively. Most likely Chinese concern over smarter-than-human AI will follow US concern with a delay of a year or so.

I'd be pretty optimistic about an international agreement if China were mainly concerned with a commercial balance of power. But I see a strange interplay between AGI risk and conflict over Taiwan.

My biggest concern is that China will see restrictions on AGI (including foom liability) as partly an attempt to keep China from participating in the AI revolution. That would increase the already important pressure on China to at least blockade Taiwan. The resulting GPU shortage would delay AI progress by a year or so. That would be a high-risk way of buying time for safety research.

There will likely be political pressure in the US for advocates of restrictions on AI to ally with forces that want to cripple China.

I'll treat this as a subset of the Leaky Ban scenario, and not give it a separate probability.

Conclusion

There are still many details that would need to be clarified. Imagine that an AGI manipulates South Korea into liberating North Korea, causing 100k immediate deaths, but the AGI forecasts that doing so will save lives in the longer run. How do we decide whether to penalize the AGI's creators? I suspect we get decent incentives whichever way we decide such questions, as long as we have relatively clear rules for deciding on those penalties.

A foom insurance requirement looks hard to implement well, but only a little bit harder than a more direct pause or ban on AGI development.

I'll guess that a foom insurance requirement has a 5% chance of producing an important safety benefit. Given how precarious our position looks, that seems like a great deal.

Comments

Keep in mind that liability is for HARMS, not RISKS.  I know Robin understands this, but chooses to ignore it for reasons that don't make sense to me (likely related to needing to publish wild ideas).

There can be no liability claims for extinction, because everyone with standing is dead (or at least prevented from suing).  There CAN be liability claims for lesser harms, but I haven't seen anyone even start to lay out what a specific actionable harm that's less than extinction would look like.

Hanson does not ignore this; he is very clear about it:

it seems plausible that for every extreme scenario like [extinction by foom] there are many more “near miss” scenarios which are similar, but which don’t reach such extreme ends. For example, where the AI tries but fails to hide its plans or actions, where it tries but fails to wrest control or prevent opposition, or where it does these things yet its abilities are not broad enough for it to cause existential damage. So if we gave sufficient liability incentives to AI owners to avoid near-miss scenarios, with the liability higher for a closer miss, those incentives would also induce substantial efforts to avoid the worst-case scenarios.

The purpose of this kind of liability is to provide an incentive gradient pushing actors away from the preconditions of harm. Many of those preconditions are applicable to harms at differing scales. For example, if an actor allowed AI systems to send emails in an unconstrained and unmonitored way, that negligence is an enabler both for automated spear-phishing scams (a "lesser harm") and for AI-engineered global pandemics.

Do you (or Robin) have any examples in other domains where "near-miss" outcomes that don't actually result in actionable individual harm are treated as liabilities, especially insurable ones?  I can only think of cases (tobacco, environmental regulation) where it's aggregated by legislation into regulatory fines, completely separate from a liability framework.

I know it's fun and edgy to pretend that we can make up some just-so stories about how laws and human legal systems work, and then extend that theory to make it fit our preferences.  But it would seem simpler and more direct to just say "government should directly prevent X risky behaviors". 

Can you re-state that? I find the phrasing of your question confusing.

(Are you saying there is no harm in the near-miss scenarios, so liability doesn't help? If so I disagree.)

Yes, in a near-miss scenario, there's no actual harm.  There's nothing to base the liability on - the perpetrator didn't actually damage the claimant.

I see. The liability proposal isn't aimed at near-miss scenarios with no actual harm. It is aimed at scenarios with actual harm, but where that actual harm falls short of extinction + the conditions contributing to the harm were of the sort that might otherwise contribute to extinction.

You said no one had named "a specific actionable harm that's less than extinction" and I offered one (the first that came to mind) that seemed plausible, specific, and actionable under Hanson's "negligent owner monitoring" condition.

To be clear, though, if I thought that governments could just prevent negligent owner monitoring (& likewise with some of the other conditions) as you suggested, I would be in favor of that!

EDIT: Someone asked Hanson to clarify what he meant by "near-miss" such that it'd be an actionable threshold for liability, and he responded:

Any event where A causes a hurt to B that A had a duty to avoid, the hurt is mediated by an AI, and one of those eight factors I list was present.

Oh, I may be fully off-base here.  But I'm confused about how existing liability mechanisms don't apply in cases where A causes hurt to B that A had a duty to avoid, regardless of AI involvement.  I don't think anyone is claiming that AI somehow shields a company from liability.

Ah, re-reading with that lens, it seems the proposal is to add "extra liability" to AI-involved harms, not to create any new liabilities for near-misses.  My reaction against this is a lot weaker - I'm on-board with a mix of punitive and restorative damages for many legal claims of liability. 

I think we're more or less on the same page now. I am also confused about the applicability of existing mechanisms. My lay impression is that there isn't much clarity right now.

For example, this uncertainty about who's liable for harms from AI systems came up multiple times during the recent AI hearings before the US Senate, in the context of Section 230's shielding of computer service providers from certain liabilities and the extent to which it and other laws apply here. In response to Senator Graham asking about this, Sam Altman straight up said "We’re claiming we need to work together to find a totally new approach. I don’t think Section 230 is even the right framework."

Robin Hanson wants AIs to replace humans. He thinks that is very good, more good than anything else. Every argument he makes regarding AI should be presumed to be his autistic attempt at convincing people to do whatever he thinks will make AIs replace humans faster.

Here's an example of that.

Would you like to make a case for why you believe that, say, DeepMind would not produce an AI that poses an x-risk, but a smaller lab would? It's not intuitive to me why this would be the default case. Is it because we expect smaller labs to have fewer or no guardrails in place?