Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Paul Christiano presented some low-key AI catastrophe scenarios; in response, Robin Hanson argued that Paul's scenarios were not consistent with the "large (mostly economic) literature on agency failures".

He concluded with:

For concreteness, imagine a twelve year old rich kid, perhaps a king or queen, seeking agents to help manage their wealth or kingdom. It is far from obvious that this child is on average worse off when they choose a smarter more capable agent, or when the overall pool of agents from which they can choose becomes smarter and more capable. And its even less obvious that the kid becomes maximally worse off as their agents get maximally smart and capable. In fact, I suspect the opposite.

Thinking on that example, my mind went to Edward V of England (one of the "Princes in the Tower"), deposed then likely killed by his "protector" Richard III; to the Guangxu Emperor of China, put under house arrest by the Regent Empress Dowager Cixi; and to the ten-year-old Athitayawong, king of Ayutthaya, deposed by his main administrator after only 36 days of reign. More examples can be dug out from some of Wikipedia's list of rulers deposed as children.

We have no reason to restrict to child-monarchs - so many Emperors, Kings, and Tsars have been deposed by their advisers or "agents". So yes, there are many cases where agency fails catastrophically for the principal and where having a smarter or more rational agent was a disastrous move.

By restricting attention to agency problems in economics, rather than in politics, Robin restricts attention to situations where institutions are strong and behaviour is punished if it gets too egregious. Though even today, there is plenty of betrayal by "agents" in politics, even if the results are less lethal than in times gone by. In economics, too, we have fraudulent investors, some of whom escape punishment. Agents betray their principals to the utmost - when they can get away with it.

So Robin's argument is entirely dependent on the assumption that institutions or rivals will prevent AIs from being able to abuse their agency power. Absent that assumption, most of the "large (mostly economic) literature on agency failures" becomes irrelevant.

So, would institutions be able to detect and punish abuses by future powerful AI agents? I'd argue we can't count on it, but it's a question that needs its own exploration, and is very different from what Robin's economic point seemed to be.

7 comments

While I have no reason to suspect Hanson's summary of the agency literature is inaccurate, I feel like he really focused on the question of "should we expect AI agents on average to be dangerous" and concluded the answer was no, based on human and business agents.

This doesn't seem to address Christiano's true concern, which I would phrase more like "what is the likelihood that at least one powerful AI turns dangerous because of principal-agent problems?"

One way to square this might be to take some of Hanson's own suggestions to imagine a comparison case. For example, if we look at the way real businesses have failed as agents in different cases, and then assume the business is made of Ems instead, does that make our problem worse or better?

My expectation is that it would mostly just make the whole category higher-variance; the successes will be more successful, but the failures will do more damage. If everything else about the system stays the same, this seems like a straight increase in catastrophic risk.
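To make the variance point concrete, here is a minimal sketch. It assumes outcomes are normally distributed and that a "catastrophe" is any outcome below a fixed threshold; the mean, spreads, and threshold are made-up illustrative numbers, not estimates of anything. Under those assumptions, holding the mean fixed and widening the spread raises the probability of landing below the threshold, provided the threshold sits below the mean.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """P(X <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def catastrophe_probability(mean: float, std: float, threshold: float) -> float:
    """Probability that an outcome falls at or below the catastrophe threshold."""
    return normal_cdf(threshold, mean, std)


# Hypothetical, purely illustrative parameters (not drawn from any data).
MEAN_OUTCOME = 1.0            # same average outcome in both cases
CATASTROPHE_THRESHOLD = -3.0  # outcomes at or below this count as catastrophic
LOW_VARIANCE_STD = 1.0        # stand-in for today's human-run businesses
HIGH_VARIANCE_STD = 2.0       # stand-in for the same businesses run by Ems

low_risk = catastrophe_probability(MEAN_OUTCOME, LOW_VARIANCE_STD, CATASTROPHE_THRESHOLD)
high_risk = catastrophe_probability(MEAN_OUTCOME, HIGH_VARIANCE_STD, CATASTROPHE_THRESHOLD)

print(f"low-variance catastrophe probability:  {low_risk:.6f}")
print(f"high-variance catastrophe probability: {high_risk:.6f}")
```

The normal distribution is only a convenience here; the same qualitative conclusion should hold for any family where widening the spread pushes more mass into the lower tail.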

While I have no reason to suspect Hanson's summary of the agency literature is inaccurate

I'm not sure the implicit message in his summary is accurate. He says "But this literature has not found that smarter agents are more problematic, all else equal". This is perfectly compatible with "nobody has ever modelled this problem at all"; if someone had modelled it and said that smarter agents don't misbehave, then that should have been cited. He says that the problem is generally modelled with the agent (and the principal) being unboundedly rational. This means that smartness and rationality cannot be modelled within the model at all (and I suspect that these are the usual "unboundedly-rational-with-extremely-limited-action-sets" models, which fail at realistically modelling either bounded or unbounded rationality).

That could be. I had assumed that when referring to the literature he was including some number of real-world examples against which those models are measured, like the number of lawsuits over breach of contract versus the estimated number of total contracts, or something. Reviewing the piece, I realize he didn't specify that. Still, I would be surprised if the literature didn't include anything of the sort, and it would be unusual for him to neglect current real examples.

I think that is a great insight. If we are going to rely on good institutions to tame behaviors, create the incentives that promote the desired behaviors, then we need the time for those institutions to develop to the point they can serve the purpose.

Perhaps rather than the current lit on P-A problems and how they are resolved today, one needs to look at the old histories of how such institutions arose, and at the various paths (and, I suspect, the backtracking) that history shows.

If I understand one of the big concerns about AI in this regard, we also need to keep in mind that we might not have the same luxury of long times for institutional development and evolution in response to poor structures and the many unknowns that are discovered.

More examples can be dug out from some of Wikipedia's list of rulers deposed as children.

This suffers from an obvious huge selection bias, right? Also:

We have no reason to restrict to child-monarchs - so many Emperors, Kings, and Tsars have been deposed by their advisers or "agents".

I guess this is a general argument that we shouldn't delegate things to others. But the interesting argument to worry about AI is that delegating to smarter artificial agents is much worse than the usual delegation scenario that we're used to, because of how much smarter they are than us. So if child-monarchs are deposed by advisers at the same rate as emperors, that's some evidence that the gap between the wisdom of the principal and the agent doesn't make things worse (although it could be that the absolute level of the wisdom of the agent modulates the effect of the gap, since this is held approximately constant in the sample).

I mainly mentioned child-rulers because Robin was using that example; and I used "getting deposed" as an example of agency problems that weren't often (ever?) listed in the economics literature.

TBC I agree that child-rulers are a relevant case, I just think that the frequency with which they are deposed relative to adult-rulers is what matters evidentially. I think I agree that getting deposed doesn't fit well into the agency literature though.
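The rate comparison discussed above can be sketched as follows. This is only an illustration: the counts are hypothetical placeholders, not historical data, and the two-proportion z statistic is one simple choice of comparison. Note that it needs denominators (all child-rulers and all adult-rulers, not just the deposed ones), which is exactly what a list of deposed rulers alone cannot supply.

```python
import math
from dataclasses import dataclass


@dataclass
class RulerSample:
    """Counts for one group of rulers (hypothetical placeholders only)."""
    deposed: int  # rulers in the group deposed by their own advisers/agents
    total: int    # all rulers in the group, deposed or not

    @property
    def rate(self) -> float:
        return self.deposed / self.total


def two_proportion_z(a: RulerSample, b: RulerSample) -> float:
    """z statistic for the difference in deposition rates (pooled standard error)."""
    pooled = (a.deposed + b.deposed) / (a.total + b.total)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / a.total + 1.0 / b.total))
    return (a.rate - b.rate) / std_err


# Made-up numbers, chosen only to show the mechanics of the comparison.
child_rulers = RulerSample(deposed=30, total=100)
adult_rulers = RulerSample(deposed=20, total=100)

z = two_proportion_z(child_rulers, adult_rulers)
print(f"child rate: {child_rulers.rate:.2f}, adult rate: {adult_rulers.rate:.2f}, z = {z:.2f}")
```

A z value near zero would be the "same rate" case, which is some evidence that the gap between principal and agent doesn't make things worse; a large positive value would point the other way. Either reading still depends on the two samples being comparable in other respects.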