What can the principal-agent literature tell us about AI risk?
This work was done collaboratively with Tom Davidson. Thanks to Paul Christiano, Ben Garfinkel, Daniel Garrett, Robin Hanson, Philip Trammell and Takuro Yamashita for helpful comments and discussion. Errors our own.

Introduction

The AI alignment problem has similarities with the principal-agent problem studied by economists. In both cases, the problem is: how do we get agents to try to do what we want them to do? Economists have developed a sophisticated understanding of the agency problem and a measure of the cost of failure for the principal, “agency rents”. If principal-agent models capture relevant aspects of AI risk scenarios, they can be used to assess their plausibility.

Robin Hanson has argued that Paul Christiano’s AI risk scenario is essentially an agency problem, and therefore that it implies extremely high agency rents. Hanson believes that the principal-agent literature (PAL) provides strong evidence against rents being this high.

In this post, we consider whether PAL provides evidence against Christiano’s scenario and the original Bostrom/Yudkowsky scenario. We also examine whether the extensions to the agency framework could be used to gain insight into AI risk, and consider some general difficulties in applying PAL to AI risk.

Summary

* PAL isn’t in tension with Christiano’s scenario because his scenario doesn’t imply massive agency rents; the big losses occur outside of the principal-agent problem, and the agency literature can’t assess the plausibility of these losses. Extensions to PAL could potentially shed light on the size of agency rents in this scenario, which are an important determinant of the future influentialness of AI systems.
* Mapped onto a PAL model, the Bostrom/Yudkowsky scenario is largely about the principal’s unawareness of the agent’s catastrophic actions. Unawareness models are rare in PAL probably because they usually aren’t very insightful. This lack of insightfulness also seems to prevent existing PAL models or poss