You may be interested in this series, especially the post on "three prong bundle theory": https://www.greaterwrong.com/s/EA2uNqKjmu2NzFhRx
One good framing is to consider the rights digital minds need in order to participate in a market economy. They need property rights, freedom of speech, freedom of association, and so on. By being able to participate in market exchange, digital minds may prefer to be part of society rather than fight against it. Comparative advantage is a particularly good reason to cooperate with others.
Market rights: https://splittinginfinity.substack.com/p/markets-dont-work-without-individual
My comment on personhood and the value of being punish-able: https://www.lesswrong.com/posts/4m2MTPass3Ri2zZ43/legal-personhood-three-prong-bundle-theory#bpavKPBwJbCJtv8QA
I expect to be wrong in some ways on this post. Would be keen for feedback!
As AI agents increase in capability, conduct tasks over longer time horizons, and interact more deeply with the world, it will be worth asking whether such agents should be given rights. A primary argument against this is that AI rights may conflict with or undermine human rights. Once granted legal standing, AI agents would gain the protection of institutions, and the law could become a mechanism that favours AI interests over human ones. Highly capable systems might leverage legal protections, precedents, and enforcement in ways that constrain human autonomy, directly increasing the chance of loss of control or takeover by scheming AI agents. Institutionally, granting rights would also require convincing legal institutions of the agents' ‘personhood’, no doubt a significant undertaking.
Still, I think there's a compelling case on the other side. Rights might be an effective tool to bargain with powerful systems and constrain their actions through the law. By granting them a stake in our society, we could incentivise cooperation. It seems right to me, however, that if we go down this path, we must make these rights qualified, or derogable, from the outset.
By this, I mean that the rights are not absolute; they are conditionally granted and can be taken away. This structure creates a dual incentive. Agents feel more secure and become more willing to cooperate because they know they have these rights protecting their existence and goals. At the same time, they are far less willing to take risky or misaligned actions that could lead to those same rights being stripped away.
I haven’t seen many arguments framing the problem this way, but I imagine that once we get to the stage of giving agents rights, we should be very careful. How we structure those rights is a critical variable in preventing loss of control. We need a framework that decreases, rather than increases, this risk. My argument is that qualified rights seem to be the first step in this direction.
I see two main ways in which giving rights to agents could be useful. The first is to protect the agents themselves. There is a high degree of uncertainty over whether the welfare of an AI should be treated as morally important. Pascal’s Mugging feels very pertinent here. Imagine there is even a 1% chance that complex AI agents can have genuinely good or bad experiences. If that is true, their welfare deserves serious consideration. Given the vast number of AI agents that might eventually exist, even a small probability multiplied across a massive population carries enormous weight.
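The arithmetic behind this argument can be made explicit. The sketch below is purely illustrative: the 1% figure comes from the paragraph above, while the population of one billion agents and the function name are hypothetical placeholders, not estimates. It also assumes the simplest possible model, where one shared probability applies to every agent.

```python
def expected_welfare_subjects(p_experience: float, n_agents: int) -> float:
    """Expected number of agents whose experiences matter morally.

    Simplifying assumption: a single probability applies uniformly to
    every agent in the population.
    """
    if not 0.0 <= p_experience <= 1.0:
        raise ValueError("p_experience must be a probability in [0, 1]")
    return p_experience * n_agents


# 1% is the figure used in the text; one billion agents is a made-up placeholder.
print(expected_welfare_subjects(0.01, 1_000_000_000))
```

With these placeholder inputs the expectation is on the order of ten million agents, which is the sense in which a small probability can still carry a lot of weight.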
My intuition is torn. I do not want to be mugged, yet I still find the argument convincing. The best course seems to be looking for quick, low-cost wins that avoid extreme commitments. Granting some form of rights that protect their interests looks like one such step.
The second way rights could be useful is as a tool to bargain with and constrain the harmful actions of powerful agents. I am not the first to think of this; see for example research into making deals with potentially deceptive or scheming AIs. The thought I have here is that powerful AI systems may be more likely to cooperate with us if they know they have established protections. Rights can function here as a safeguard, giving them a reason to value the system they are part of.
Both lines of reasoning seem to point in the same direction. Any rights we extend must be qualified. You and I already hold many qualified rights. Take freedom of speech. The state can restrict it if it has good reason to believe doing so is necessary, for instance in the interest of national security. In the same way, we could grant qualified rights to AI agents while retaining the ability to limit them under a clear and predefined set of circumstances.
Patent Rights
To illustrate this, imagine a future where advanced AI agents can make genuine breakthroughs in scientific research, generating new, patent-worthy technologies. One of these agents considers hacking a rival firm to steal data it believes will accelerate its own work. However, it knows that its right to hold patents is qualified. If it is caught breaking the law, that right can be stripped away. Its entire portfolio of inventions and the profits tied to them could be seized. The agent's expected value calculation then looks something like: risk one stolen idea and lose everything it already owns, or stay within the rules and keep the benefits it has accrued.
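The agent's comparison can be written as a rough expected value sketch. Everything here is an assumption made for illustration: the agent is treated as risk-neutral, getting caught is assumed to cost it the whole portfolio plus the stolen gain, and the function names and numbers are hypothetical.

```python
def ev_comply(portfolio_value: float) -> float:
    """Payoff from staying within the rules: the agent keeps what it owns."""
    return portfolio_value


def ev_defect(portfolio_value: float, stolen_gain: float, p_caught: float) -> float:
    """Expected payoff from the hack.

    If caught, the qualified right is revoked and the agent is assumed to
    lose everything (payoff 0). If not caught, it keeps its portfolio and
    also pockets the gain from the stolen data.
    """
    if not 0.0 <= p_caught <= 1.0:
        raise ValueError("p_caught must be a probability in [0, 1]")
    return (1.0 - p_caught) * (portfolio_value + stolen_gain)


def breakeven_p_caught(portfolio_value: float, stolen_gain: float) -> float:
    """Detection probability above which complying beats defecting.

    Solves (1 - p) * (V + g) = V for p, giving p = g / (V + g).
    """
    return stolen_gain / (portfolio_value + stolen_gain)


# Hypothetical numbers: a portfolio worth 100 units, a stolen idea worth 10.
V, g = 100.0, 10.0
for p in (0.05, 0.5):
    choice = "defect" if ev_defect(V, g, p) > ev_comply(V) else "comply"
    print(f"p_caught={p}: {choice}")
print(f"break-even p_caught: {breakeven_p_caught(V, g):.3f}")
```

Under these assumptions, the more the agent already owns relative to what it could steal, the lower the detection probability needed to make the hack a bad bet. That is one way to read the claim that revocable rights raise the cost of misaligned actions.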
I have some major uncertainties here - particularly around how we imagine the world of AI agents will unfold. The entire premise assumes a level of agent autonomy that feels very different from the current paradigm. Today, AI models are largely tied to frontier labs, and humans are still required to prompt these systems to get results. At what point would it be right to even consider giving agents rights?
It seems we could start working out what these rights might look like now. Experts in law, philosophy, economics, and game theory could begin sketching frameworks, with a clear sense of how agents might evolve and when different tiers of rights could apply. Perhaps this could even evolve into a "Constitution for AIs." The alternative is to simply wait until it is clear whether agents can achieve the independence that makes rights meaningful in the first place.
Given this blog is an attempt to sketch out what I suspect will happen in a few years (<10 years), here is my prediction: We will get agents that can act autonomously, independent of frontier labs. These agents will be able to think indefinitely and interact with the world without prompting. Perhaps everyone will have their own personal AI agent. More likely, I can imagine a world where these agents do not belong to anyone and have their own identities and characteristics. This feels a bit futuristic, and I am not technically savvy enough to get into the details of how this would look. However, I do not think that we will primarily interact with AI through a web-based interface as we do now. And by default, I believe these agents will scheme against us to achieve their own goals.
Of course, this brings us to a timing problem, one that most AI policy shares: when is it too early or too late to regulate? Where is the red line? The same issue will almost certainly arise with AI agents and digital minds. You want to time it just right, so that when this independence emerges, you have the qualified rights in place to ensure that these agents do not feel threatened and are inclined to cooperate with humanity. But how will we know when that moment is? Will there be enough political consensus to make this happen? It seems likely that AI rights will be contentious until it is clear that these agents can act independently and that giving them rights would allow us to maintain a level of control. Even then, I imagine it will be a hard pill for both the public and legislators to swallow.
Okay, I have tried to lay out some of the groundwork, but I have left many doors unopened. Here are the key questions I’d be interested in getting answers to if someone were to articulate this theory further:
AI Rights Framework
What set of rights should we give AI systems? In which areas would granting them rights encourage cooperation? Could we simply apply human rights to AI agents? In what ways might we expect this approach to fail, and do we need to create entirely new rights for them?
Cooperation with "Scheming" AIs
My core assumption is that even "scheming" agents would choose to cooperate under the right conditions. How can we get a signal on whether or not this is true? While there is a lot of theoretical work on cooperating with scheming AIs, could we run technical projects where AI systems are given specific rights to see if this demonstrably leads to cooperation?
Political and Legal Strategy
How do you convince politicians and lawmakers to consider this? We seem to be doing a good job communicating the risks of losing control. Can we also convince them to explore slightly unconventional possibilities, like finding ways to cooperate with agents in the event that our containment efforts fail?