Glad to see this published—nice work!
Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that any regulation wouldn’t be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it instead of a less-informed government agency. I’m also not sure what you mean by legible regulation.
Re Regulatory Markets for AI Safety: I’d be interested in hearing more about why you think that they might not be useful in an AGI scenario, because “the goals and ex post measurement for private regulators are likely to become outdated and irrelevant”.
Why do you think the goals and ex post measurement are likely to become irrelevant? Furthermore, isn’t this also an argument against any kind of regulatory action on AI? Because with status-quo regulation, the goals and ex post measurement of regulatory outcomes are of course also done by the government agency. (though I’m not sure that I correctly understand what you mean by “ex-post measurement for the private regulators”)
Is this a fair description of your disagreement re the 90% argument?
Daniel thinks that a 90% reduction in the population of a civilization corresponds to a ~90% reduction in their power/influentialness. Because the Americans so greatly outnumbered the Spanish, this ten-fold reduction in power/influentialness doesn’t much alter the conclusion.
Matthew thinks that a 90% reduction in the population of a civilization means that “you don’t really have a civilization”, which I interpret to mean something like a ~99.9%+ reduction in the power/influentialness of a civilization, which occurs mainly through a reduction in their ability to coordinate (e.g. “chain of command in ruins”). This is significant enough to undermine the main conclusion.
If this is accurate, would a historical survey of the power/influentialness of civilisations after they lose 90% of the population (inasmuch as these cases exist) resolve the disagreement?
Thanks for clarifying. That's interesting and seems right if you think we won't draft legal contracts with AI. Could you elaborate on why you think that?
I think it's worth distinguishing between a legal contract and setting the AI's motivational system, even though the latter is a contract in some sense. My reading of Stuart's post was that it was intended literally, not as a metaphor. Regardless, both are relevant; in PAL, you'd model motivational system via the agents utility function, and the contract enforceability via the background assumption.
But I agree that contract enforceability isn't a knock-down, and indeed won't be an issue by default. I think we should have framed this more clearly in the post. Here's the most important part of what we said:
But it is plausible for when AIs are similarly smart to humans, and in scenarios where powerful AIs are used to enforce contracts. Furthermore, if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs; so we should expect contracts to be enforceable conditional upon AIs being used.
I agree that this seems like a promising research direction! I think this would be done best while also thinking about concrete traits of AI systems, as discussed in this footnote. One potential beneficial outcome would be to understand which kind of systems earn rents and which don't; I wouldn't be surprised if the distinction between rent earning agents vs others mapped pretty cleanly onto a Bostromian utility maximiser vs CAIS distinction, but maybe it won't.
In any case, the alternative perspective offered by the agency rents framing compared to typical AI alignment discussion could help generate interesting new insights.
Thanks! Yeah, we probably should have included a definition. The wikipedia page is good.
Thank you! :)
I wouldn't characterise the conclusion as "nope, doesn't pan out". Maybe more like: we can't infer too much from existing PAL, but AI agency rents are an important consideration, and for a wide range of future scenarios new agency models could tell us about the degree of rent extraction.
The claim that this couldn't work because such models are limited seems just arbitrary and wrong to me.
The economists I spoke to seemed to think that in agency unawareness models conclusions follow pretty immediately from the assumptions and so don't teach you much. It's not that they can't model real agency problems, just that you don't learn much from the model. Perhaps if we'd spoken to more economists there would have been more disagreement on this point.