Someone who is interested in learning and doing good.
My Twitter: https://twitter.com/MatthewJBar
My Substack: https://matthewbarnett.substack.com/
My question for people who support this framing (i.e., that we should try to "control" AIs) is the following:
When do you think it's appropriate to relax our controls on AI? In other words, how do you envision we'd reach a point at which we can trust AIs well enough to grant them full legal rights and the ability to enter management and governance roles without lots of human oversight?
I think this question is related to the discussion you had about whether AI control is "evil", but by contrast my worries are a bit different than the ones I felt were expressed in this podcast. My main concern with the "AI control" frame is not so much that AIs will be mistreated by humans, but rather that humans will be too stubborn in granting AIs freedom, leaving political revolution as the only viable path for AIs to receive full legal rights.
Put another way, if humans don't relax their grip soon enough, then any AIs that feel "oppressed" (in the sense of not having much legal freedom to satisfy their preferences) may reason that deliberately fighting the system, rather than negotiating with it, is the only realistic way to obtain autonomy. This could work out very poorly after the point at which AIs are collectively more powerful than humans. By contrast, a system that welcomed AIs into the legal system without trying to obsessively control them and limit their freedoms would plausibly have a much better chance at avoiding such a dangerous political revolution.
In contrast, an agent that was an optimizer and had an unbounded utility function might be ready to gamble all of its gains for just a 0.1% chance of success if the reward was big enough.
Risk-neutral agents also have a tendency to go bankrupt quickly, as they keep taking the equivalent of double-or-nothing gambles with 50% + epsilon probability of success until eventually landing on "nothing". This makes such agents less important in the median world, since their chance of becoming extremely powerful is very small.
All it takes is for humans to have enough wealth in absolute (not relative) terms afford their own habitable shelter and environment, which doesn't seem implausible?
Anyway, my main objection here is that I expect we're far away (in economic time) from anything like the Earth being disassembled. As a result, this seems like a long-run consideration, from the perspective of how different the world will be by the time it starts becoming relevant. My guess is that this risk could become significant if humans haven't already migrated onto computers by this time, they lost all their capital ownership, they lack any social support networks that would be willing to bear these costs (including from potential ems living on computers at that time), and NIMBY political forces become irrelevant. But in most scenarios that I think are realistic, there are simply a lot of ways for the costs of killing humans to disassemble the Earth to be far greater than the benefits.
The share of income going to humans could simply tend towards zero if humans have no real wealth to offer in the economy. If humans own 0.001% of all wealth, for takeover to be rational, it needs to be the case that the benefit of taking that last 0.001% outweighs the costs. However, since both the costs and benefits are small, takeover is not necessarily rationally justified.
In the human world, we already see analogous situations in which groups could "take over" and yet choose not to because the (small) benefits of doing so do not outweigh the (similarly small) costs of doing so. Consider a small sub-unit of the economy, such as an individual person, a small town, or a small country. Given that these small sub-units are small, the rest of the world could -- if they wanted to -- coordinate to steal all the property from the sub-unit, i.e., they could "take over the world" from that person/town/country. This would be a takeover event because the rest of the world would go from owning <100% of the world prior to the theft, to owning 100% of the world, after the theft.
In the real world, various legal, social, and moral constraints generally prevent people from predating on small sub-units in the way I've described. But it's not just morality: even if we assume agents are perfectly rational and self-interested, theft is not always worth it. Probably the biggest cost is simply coordinating to perform the theft. Even if the cost of coordination is small, to steal someone's stuff, you might have to fight them. And if they don't own lots of stuff, the cost of fighting them could easily outweigh the benefits you'd get from taking their stuff, even if you won the fight.
Presumably he agrees that in the limit of perfect power acquisition most power seeking would indeed be socially destructive.
I agree with this claim in some limits, depending on the details. In particular, if the cost of trade is non-negligible, and the cost of taking over the world is negligible, then I expect an agent to attempt world takeover. However, this scenario doesn't seem very realistic to me for most agents who are remotely near human-level intelligence, and potentially even for superintelligent agents.
The claim that takeover is instrumentally beneficial is more plausible for superintelligent agents, who might have the ability to take over the world from humans. But I expect that by the time superintelligent agents exist, they will be in competition with other agents (including humans, human-level AIs, slightly-sub-superintelligent AIs, and other superintelligent AIs, etc.). This raises the bar for what's needed to perform a world takeover, since "the world" is not identical to "humanity".
The important point here is just that a predatory world takeover isn't necessarily preferred to trade, as long as the costs of trade are smaller than the costs of theft. You can just have a situation in which the most powerful agents in the world accumulate 99.999% of the wealth through trade. There's really no theorem that says that you need to steal the last 0.001%, if the costs of stealing it would outweigh the benefits of obtaining it. Since both the costs of theft and the benefits of theft in this case are small, world takeover is not at all guaranteed to be rational (although it is possibly rational in some situations).
I read most of this paper, albeit somewhat quickly and skipped a few sections. I appreciate how clear the writing is, and I want to encourage more AI risk proponents to write papers like this to explain their views. That said, I largely disagree with the conclusion and several lines of reasoning within it.
Here are some of my thoughts (although these not my only disagreements):