I view intent alignment as one step towards a broader goal of decoupling deliberation from competition.
- Deliberation. Thinking about what we want, learning about the world, talking and learning from each other, resolving our disagreements, figuring out better methodologies for making further progress…
- Competition. Making money and racing to build infrastructure, managing political campaigns and maneuvering within the political system, running ads to persuade people, fighting wars…
Competition pushes us to become the kind of people and communities who can win a fight, to delegate to whichever kind of AI is available first, and to adopt whatever ideologies are most memetically fit.
Deliberation pushes us to become the kind of people and communities who we want to be, to delegate only when we trust an AI’s judgment more than our own, and to adopt views that we really believe.
I think it’s likely that competition is going to accelerate and become more complex over the next 100 years, especially as AI systems begin to replace humans and compete on our behalf. I’m afraid that this may derail human deliberation and lead us to a place we don’t want to go.
I would like humans and humanity to have the time, space, and safety to grow and change in whatever way we decide — individually and collectively — that we want to.
You could try to achieve this by “pausing” competition. Alice and Bob could agree to stop fighting while they try to figure out what they want and work out their disagreements. But that’s a tall order — it requires halting not only military conflict, but any economic development that could put someone at an advantage later on. I don’t want to dismiss this kind of ambitious goal (related post), but I think it’s uncertain and long-term enough that you probably want a stop-gap solution.
An alternative approach is to “decouple” competition from deliberation. Alice and Bob keep competing, but they try to make sure that deliberation happens independently and the result isn’t affected by competition. (“Pausing” is the special case of decoupling where deliberation finishes before competition starts.)
In a world without AI, decoupling is possible to a limited extent. Alice and Bob can spend time competing while planning to deliberate later after the dust has settled (or have their descendants deliberate). But it’s inevitable that Alice and Bob will be different after competing with each other for many years, and so they are not completely decoupled.
Alignment and decoupling
Aligned AI may eventually make decoupling much easier. Instead of Alice and Bob competing directly, they may delegate to AI systems who will make money and fight wars and keep them safe. Once Alice and Bob have a clearer sense of what they want, they can direct their AI to use its influence appropriately. (This is closely related to the strategy stealing assumption.)
Eventually it doesn’t even matter if Alice and Bob participate in the competition themselves, since their personal contribution would be so small relative to their AIs. At that point it’s easy for Alice and Bob to spend their time deliberating instead of thinking about competition at all.
If their AI systems are competent enough to keep them safe and isolate them from the fallout from competition, then the outcome of their deliberation doesn’t depend much on the competition occurring in the background.
Misalignment and coupling
Misaligned AI could instead introduce a severe coupling. In the worst case, my best strategy to compete is to build and empower AI systems who want to compete, and my AI also ends up competing with me in the long run.
In the catastrophe scenario, we have relatively little control over how our society’s values evolve: we end up pursuing whatever kinds of goals the most competent AI systems typically pursue.
Discussions of alignment often drift to questions like “But what do we really want?” or “how do we handle humanity’s diverse and sometimes-conflicting desires?”
Those questions seem important and challenging, but I think it’s clear that the answers shouldn’t depend on whatever values are easiest to give AI. That is, we want to decouple the question “what should we do in light of uncertainty and disagreement?” from the question “what is the most effective AI design for making money?”
Appendix: a bunch of random thoughts
Persuasion and limits of decoupling
Persuasion often doesn’t fit cleanly into “deliberation” or “competition.”
On the one hand, talking to people is a critical part of deliberation:
- It’s a fundamental part of reconciling conflicting desires and deciding what we collectively want.
- Having contact with people, and being influenced by people around us, helps us become the people/communities we want to become (and to stay sane).
- Other people have experiences and knowledge we don’t, and may think in different ways that improve the quality of the group’s conclusions.
- Being exposed to good arguments for a view, discovered by people who take it seriously, can be a step in evaluating that view.
On the other hand, the exact same kinds of interaction give scope for competition and manipulation:
- If Alice and Bob are talking to each other as they deliberate, each has a motive to influence the other by carefully filtering what they say, making misleading statements, playing off of each other’s fears or biases, and so on.
- The possibility of manipulation gives Alice and Bob a motive to race ahead and become smarter faster in order to manipulate each other. This is in conflict with an individual desire to take it slow (for example it may push them to delegate to unaligned AI).
- In communities with more individuals there are even more opportunities for conflict, e.g. to exploit group norms or skirt enforcement, to get more access to more people’s attention, and so on. These can lead to similar deadweight loss or incentives to race.
Wei Dai has talked about many of these issues over the years on Less Wrong and this section is largely inspired by his comments or conversations with him.
I don’t think intent alignment addresses this problem, and I’m not sure there’s any clean answer. Some possible approaches:
- Alice and Bob can split up and deliberate separately, potentially for a very long time, before they are ready to reconvene. This may be compatible with Alice and Bob continuing to interact, but not with them genuinely learning from each other.
- Alice and Bob can try to agree to avoid racing ahead or engaging in some kinds of manipulation, and analogously a broader society could adopt such norms or divide into communities with internal agreements of this form. They may want to make such agreements relatively early if there is growing suspicion about someone manipulating the social contract to empower themselves.
- If Alice and Bob split up for a while it may be difficult for them to reconvene, since either of them may have decided to adopt an adversarial stance while they were separated (and if they do adopt an adversarial stance it may be very hard to negotiate with them in good faith until reaching technological maturity). They could take various exotic approaches to try overcoming this problem, e.g. sharing details of their history with each other or each embedding themselves in new communities (built for purpose with trusted provenance).
Overall I expect this to be messy. It’s a place where I don’t expect it to be possible to fully decouple competition and deliberation, and I wish we had a better story about how to deliberate well in light of that.
Politics of decoupling
Although I think “Decoupling deliberation and competition” is a broadly desirable goal, any implementation will likely benefit some people at others’ expense (like many other efforts to improve the world). So I don’t ever expect it to be a politically clean project.
- Without decoupling there may be mounting pressure to “pause” competition in various ways. Many pauses would result in big changes in the balance of power (e.g. by lowering the value of AI or of military capabilities). So you could easily end up with conflict between people who would prefer “pause” and those who prefer “decouple,” or between people who prefer different decoupling strategies.
- Failures of decoupling often push values in a predictable direction. For example, some people may simply want something to spread from Earth throughout the universe, and they benefit from coupling.
- There is tons of messiness around “persuasion.” Many decoupling approaches would reduce opportunities for some kinds of persuasion (e.g. buying ads or shouting at people), and that will inevitably disadvantage some people (e.g. those who have a lot of money to spend or those whose positions sound best in shouting matches). So people with those advantages may try to use them while possible in order to avoid decoupling.
A double-edged sword
I think that competition currently serves as an important sanity check on our deliberation, and getting rid of it is scary (even if I’m excited on balance).
In an idealized decoupling, the resources someone ends up with don’t depend at all on how they deliberate. This can result in dedicating massive resources to projects that no one really likes. For example:
- Alice may decide that she doesn’t care what happens with her resources and never wants to think about the question seriously. Normally she would get outcompeted by people who care more about future influence.
- Bob’s community may have deeply dysfunctional epistemic norms, leading them both to consistently make errors when thinking about empirical questions and to reach insane conclusions about what they should do with their resources. Normally they would get outcompeted by people with more accurate views.
- Charlie isn’t very careful or effective. Over the course of a long enough deliberative process they are inevitably going to build some misaligned AI or drive themselves insane or something. Normally their carelessness would lead to them gradually losing out relative to more effective agents.
I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.
Competition isn’t a robust safeguard, and it certainly isn’t optimal. A careful deliberator would take early steps to ensure that their deliberation had the same kind of robustness conferred by competition. For example, they would be on the lookout for any places where their choices would lead to them getting outcompeted “in the wild” and then think carefully about whether they endorse those choices anyway. But I’m afraid that most of us are below the bar where paternalistically forcing us to “keep ourselves honest” is valuable.
I don’t really have a settled view on these questions. Overall I still feel comfortable with decoupling, but I hope that we can collectively decide on some regime that captures some of the benefits of this kind of “competitive discipline” without the costs. For example, even in a mostly-decoupled world we could end up agreeing on different domains of “safe” competition (e.g. it feels much better for states to compete on “being a great place to live” than to fight wars), or imposing temporary paternalistic restrictions and relaxing them only once some reasonably high bar of competence is demonstrated.
The balance of power affects deliberation
Negotiation and compromise are an important part of deliberation, but they depend on the current state of competition.
Suppose that Alice and Bob start talking while they don’t know who is going to end up more influential. But they are talking slowly, in the way most comfortable to them, while competition continues to accelerate at the maximum possible rate. So before they can reach any agreement, it may be clear that one of them is vastly more influential.
Alice and Bob would have preferred to make an early agreement to treat each other with respect, back when they were both ignorant about who would end up with the power. But this opportunity is lost forever once they have seen the outcome.
Alice and Bob can try to avoid this by quickly reaching a compromise. But that seems hard, and having to make a precise agreement fast may take them far away from the deliberative process they would have preferred.
I don’t have any real story about coping with this problem, though I’m less worried about it than persuasion. Some possible (but pretty weird) approaches:
- If Alice and Bob need to reach an agreement in a hurry, they may be able to make some minimal agreement like “we’ll both isolate ourselves so we don’t see the result of the competition until we’ve reached a better agreement” or “long after the competition is over, we’ll allocate resources based on a high-fidelity prediction of what we would have agreed to if we had never seen the result of the competition.”
- After winning the competition and waiting for considerable technological progress, Alice can simulate what Bob would have done if he had won the competition. If this is done carefully, I think you can get to the situation where Alice really doesn’t know if she won the competition (or if she is just in a simulation run by Bob to figure out how nice to be to Alice). Then we can have a discussion between Alice and simulated-Bob (who thinks of this as a conversation between Bob and simulated-Alice) from that state of ignorance.
The singularity, the distant future, and the “long reflection”
In some ways my picture of the future is very aggressive/unusual. For example I think that we are likely to see explosive economic growth and approximate technological maturity within the next 50–100 years (and potentially much sooner).
But in other ways it feels like I have a much more “boring” picture of the future. I expect technology could radically transform the world on a timescale that would be disorienting to people, but for the most part that’s not how we want our lives to go in order to have the best chance of reaching the best conclusions about what to do in the long run. We do want some effects of technology — we would like to stop being so hungry and sick, to have a little bit less reason to be at each other’s throats, and so on — but we also want to be isolated from the incomprehensible, and to make some changes slowly and carefully.
So I expect there to be a very recognizable thread running through humanity’s story, where many of the humans alive today just continue being human and growing in a way that is familiar and comfortable, perhaps changing more quickly than we have in the past but never so quickly that we are at risk of losing our footing. The reason for this is not that it’s how to have the best life (which may well involve incomprehensible mind-alteration or hyper-optimized virtual reality or whatever). It’s because we still have a job to do.
The fact that you are able to modify a human to be much smarter does not mean that you need to, and indeed I think it’s important that you take that process slow. The kinds of moral change we are most familiar with and trust involve a bunch of people thinking and talking, gradually refining their norms and making small changes to their nature, raising new generations one after another.
During that time we have a lot to do to safeguard the process; to become more and more comfortable that it’s proceeding in a good direction even as we become wiser and wiser; to do lots of moral philosophy and political philosophy and psychology at every stage in case they provide clues about how to take the next step wisely. We can take the things that scare us or that we dislike about ourselves, and we can very gingerly remove or change them piece by piece. But I think it doesn’t have to be nearly as weird as people often imagine it.
Moreover, I think that the community of humans taking things slowly and living recognizable lives isn’t an irrelevant sideshow that anyone serious would ignore in favor of thinking about the crazy stuff AI is doing “out there” (or the hyper-optimized experiences some of our descendants may immerse themselves in). I think there’s a real sense in which it’s the main thread of the human story; it’s the thread that determines our future and gradually expands to fill the universe.
Put differently, I think people sometimes imagine abdicating responsibility to crazy AI systems that humans build. I think that will happen someday, but not when we can first build AI — indeed, it won’t happen until those AI systems no longer seem crazy.
In the weirdest cases, we decouple by building an AI that merely needs to think about what humans would want rather than deferring to any real flesh-and-blood humans. But even those cases are more like a change than an ending — we pack up our things from Earth and continue our story inside a homey simulation. And personally I don’t expect to do even that until everyone is good and ready for it, many years after it first becomes possible.