Alexis Carlier


Literature Review on Goal-Directedness

Glad to see this published—nice work!

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI

Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. an unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that no regulation would be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful, e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it than a less-informed government agency. I’m also not sure what you mean by legible regulation.

2019 AI Alignment Literature Review and Charity Comparison

Great post.

Re Regulatory Markets for AI Safety: I’d be interested in hearing more about why you think that they might not be useful in an AGI scenario, because “the goals and ex post measurement for private regulators are likely to become outdated and irrelevant”.

Why do you think the goals and ex post measurement are likely to become irrelevant? Furthermore, isn’t this also an argument against any kind of regulatory action on AI? After all, under status-quo regulation the goals and ex post measurement of regulatory outcomes are also handled by a government agency (though I’m not sure that I correctly understand what you mean by “ex-post measurement for the private regulators”).

Cortés, Pizarro, and Afonso as Precedents for Takeover

Is this a fair description of your disagreement re the 90% argument?

Daniel thinks that a 90% reduction in the population of a civilization corresponds to a ~90% reduction in their power/influentialness. Because the Americans so greatly outnumbered the Spanish, this ten-fold reduction in power/influentialness doesn’t much alter the conclusion.

Matthew thinks that a 90% reduction in the population of a civilization means that “you don’t really have a civilization”, which I interpret to mean something like a ~99.9%+ reduction in the power/influentialness of a civilization, which occurs mainly through a reduction in their ability to coordinate (e.g. “chain of command in ruins”). This is significant enough to undermine the main conclusion.

If this is accurate, would a historical survey of the power/influentialness of civilizations after they lose 90% of their population (inasmuch as such cases exist) resolve the disagreement?

What can the principal-agent literature tell us about AI risk?

Thanks for clarifying. That's interesting and seems right if you think we won't draft legal contracts with AI. Could you elaborate on why you think that?

What can the principal-agent literature tell us about AI risk?

I think it's worth distinguishing between a legal contract and setting the AI's motivational system, even though the latter is a contract in some sense. My reading of Stuart's post was that it was intended literally, not as a metaphor. Regardless, both are relevant; in PAL, you'd model the motivational system via the agent's utility function, and contract enforceability via the background assumptions.

But I agree that contract enforceability isn't a knock-down, and indeed won't be an issue by default. I think we should have framed this more clearly in the post. Here's the most important part of what we said:

"But it is plausible for when AIs are similarly smart to humans, and in scenarios where powerful AIs are used to enforce contracts. Furthermore, if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs; so we should expect contracts to be enforceable conditional upon AIs being used."

What can the principal-agent literature tell us about AI risk?

I agree that this seems like a promising research direction! I think this would be done best while also thinking about concrete traits of AI systems, as discussed in this footnote. One potentially beneficial outcome would be to understand which kinds of systems earn rents and which don't; I wouldn't be surprised if the distinction between rent-earning agents and others mapped pretty cleanly onto a Bostromian utility maximiser vs CAIS distinction, but maybe it won't.

In any case, the alternative perspective offered by the agency rents framing compared to typical AI alignment discussion could help generate interesting new insights.

What can the principal-agent literature tell us about AI risk?

Thanks! Yeah, we probably should have included a definition. The Wikipedia page is good.

What can the principal-agent literature tell us about AI risk?

Thank you! :)

I wouldn't characterise the conclusion as "nope, doesn't pan out". Maybe more like: we can't infer too much from existing PAL, but AI agency rents are an important consideration, and for a wide range of future scenarios new agency models could tell us about the degree of rent extraction.

What can the principal-agent literature tell us about AI risk?

"The claim that this couldn't work because such models are limited seems just arbitrary and wrong to me."

The economists I spoke to seemed to think that in agency unawareness models, the conclusions follow pretty immediately from the assumptions, so the models don't teach you much. It's not that they can't model real agency problems, just that you don't learn much from the model. Perhaps if we'd spoken to more economists there would have been more disagreement on this point.
