Decoupling deliberation from competition

paulfchristiano

I view intent alignment as one step towards a broader goal of decoupling deliberation from competition.

Deliberation. Thinking about what we want, learning about the world, talking and learning from each other, resolving our disagreements, figuring out better methodologies for making further progress…
Competition. Making money and racing to build infrastructure, managing political campaigns and maneuvering within the political system, running ads to persuade people, fighting wars…

Competition pushes us to become the kind of people and communities who can win a fight, to delegate to whichever kind of AI is available first, and to adopt whatever ideologies are most memetically fit.

Deliberation pushes us to become the kind of people and communities who we want to be, to delegate only when we trust an AI’s judgment more than our own, and to adopt views that we really believe.

I think it’s likely that competition is going to accelerate and become more complex over the next 100 years, especially as AI systems begin to replace humans and compete on our behalf. I’m afraid that this may derail human deliberation and lead us to a place we don’t want to go.

Decoupling

I would like humans and humanity to have the time, space, and safety to grow and change in whatever way we decide — individually and collectively — that we want to.

You could try to achieve this by “pausing” competition. Alice and Bob could agree to stop fighting while they try to figure out what they want and work out their disagreements. But that’s a tall order — it requires halting not only military conflict, but any economic development that could put someone at an advantage later on. I don’t want to dismiss this kind of ambitious goal (related post), but I think it’s uncertain and long-term enough that you probably want a stop-gap solution.

An alternative approach is to “decouple” competition from deliberation. Alice and Bob keep competing, but they try to make sure that deliberation happens independently and the result isn’t affected by competition. (“Pausing” is the special case of decoupling where deliberation finishes before competition starts.)

In a world without AI, decoupling is possible to a limited extent. Alice and Bob can spend time competing while planning to deliberate later after the dust has settled (or have their descendants deliberate). But it’s inevitable that Alice and Bob will be different after competing with each other for many years, and so they are not completely decoupled.

Alignment and decoupling

Aligned AI may eventually make decoupling much easier. Instead of Alice and Bob competing directly, they may delegate to AI systems who will make money and fight wars and keep them safe. Once Alice and Bob have a clearer sense of what they want, they can direct their AI to use its influence appropriately. (This is closely related to the strategy stealing assumption.)

Eventually it doesn’t even matter if Alice and Bob participate in the competition themselves, since their personal contribution would be so small relative to their AIs. At that point it’s easy for Alice and Bob to spend their time deliberating instead of thinking about competition at all.

If their AI systems are competent enough to keep them safe and isolate them from the fallout from competition, then the outcome of their deliberation doesn’t depend much on the competition occurring in the background.

Misalignment and coupling

Misaligned AI could instead introduce a severe coupling. In the worst case, my best strategy to compete is to build and empower AI systems who want to compete, and my AI also ends up competing with me in the long run.

In the catastrophe scenario, we have relatively little control over how our society’s values evolve: we end up pursuing whatever kinds of goals the most competent AI systems typically pursue.

Discussions of alignment often drift to questions like “But what do we really want?” or “how do we handle humanity’s diverse and sometimes-conflicting desires?”

Those questions seem important and challenging, but I think it’s clear that the answers shouldn’t depend on whatever values are easiest to give AI. That is, we want to decouple the question “what should we do in light of uncertainty and disagreement?” from the question “what is the most effective AI design for making money?”

Appendix: a bunch of random thoughts

Persuasion and limits of decoupling

Persuasion often doesn’t fit cleanly into “deliberation” or “competition.”

On the one hand, talking to people is a critical part of deliberation:

It’s a fundamental part of reconciling conflicting desires and deciding what we collectively want.
Having contact with people, and being influenced by people around us, helps us become the people/communities we want to become (and to stay sane).
Other people have experiences and knowledge we don’t, and may think in different ways that improve the quality of the group’s conclusions.
Being exposed to good arguments for a view, discovered by people who take it seriously, can be a step in evaluating that view.

On the other hand, the exact same kinds of interaction give scope for competition and manipulation:

If Alice and Bob are talking to each other as they deliberate, each has a motive to influence the other by carefully filtering what they say, making misleading statements, playing off of each other’s fears or biases, and so on.
The possibility of manipulation gives Alice and Bob a motive to race ahead and become smarter faster in order to manipulate each other. This is in conflict with an individual desire to take it slow (for example it may push them to delegate to unaligned AI).
In communities with more individuals there are even more opportunities for conflict, e.g. to exploit group norms or skirt enforcement, to get more access to more people’s attention, and so on. These can lead to similar deadweight loss or incentives to race.

Wei Dai has talked about many of these issues over the years on Less Wrong and this section is largely inspired by his comments or conversations with him.

I don’t think intent alignment addresses this problem, and I’m not sure there’s any clean answer. Some possible approaches:

Alice and Bob can split up and deliberate separately, potentially for a very long time, before they are ready to reconvene. This may be compatible with Alice and Bob continuing to interact, but not with them genuinely learning from each other.
Alice and Bob can try to have an agreement to avoid racing ahead or engaging in some kinds of manipulation, and analogously a broader society could adopt such norms or divide into communities with internal agreements of this form. They may want to make such agreements relatively early if there is growing suspicion about someone manipulating the social contract to empower themselves.
If Alice and Bob split up for a while it may be difficult for them to reconvene, since either of them may have decided to adopt an adversarial stance while they were separated (and if they do adopt an adversarial stance it may be very hard to negotiate with them in good faith until reaching technological maturity). They could take various exotic approaches to try to overcome this problem, e.g. sharing details of their history with each other or each embedding themselves in new communities (built for purpose with trusted provenance).

Overall I expect this to be messy. It’s a place where I don’t expect it to be possible to fully decouple competition and deliberation, and I wish we had a better story about how to deliberate well in light of that.

Politics of decoupling

Although I think “Decoupling deliberation and competition” is a broadly desirable goal, any implementation will likely benefit some people at others’ expense (like many other efforts to improve the world). So I don’t ever expect it to be a politically clean project.

For example:

Without decoupling there may be mounting pressure to “pause” competition in various ways. Many pauses would result in big changes in the balance of power (e.g. by lowering the value of AI or of military capabilities). So you could easily end up with conflict between people who would prefer “pause” and those who prefer “decouple,” or between people who prefer different decoupling strategies.
Failures of decoupling often push values in a predictable direction. For example, some people may simply want something to spread from Earth throughout the universe, and they benefit from coupling.
There is tons of messiness around “persuasion.” Many decoupling approaches would reduce opportunities for some kinds of persuasion (e.g. buying ads or shouting at people), and that will inevitably disadvantage some people (e.g. those who have a lot of money to spend or those whose positions sound best in shouting matches). So people with those advantages may try to use them while possible in order to avoid decoupling.

A double-edged sword

I think that competition currently serves as an important sanity check on our deliberation, and getting rid of it is scary (even if I’m excited on balance).

In an idealized decoupling, the resources someone ends up with don’t depend at all on how they deliberate. This can result in dedicating massive resources to projects that no one really likes. For example:

Alice may decide that she doesn’t care what happens with her resources and never wants to think about the question seriously. Normally she would get outcompeted by people who care more about future influence.
Bob’s community may have deeply dysfunctional epistemic norms, leading them both to consistently make errors when thinking about empirical questions and to reach insane conclusions about what they should do with their resources. Normally they would get outcompeted by people with more accurate views.
Charlie isn’t very careful or effective. Over the course of a long enough deliberative process they are inevitably going to build some misaligned AI or drive themselves insane or something. Normally their carelessness would lead to them gradually losing out relative to more effective agents.

I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.

Competition isn’t a robust safeguard, and it certainly isn’t optimal. A careful deliberator would take early steps to ensure that their deliberation had the same kind of robustness conferred by competition; for example, they would be on the lookout for any places where their choices would lead to them getting outcompeted “in the wild” and then think carefully about whether they endorse those choices anyway. But I’m afraid that most of us are below the bar where paternalistically forcing us to “keep ourselves honest” is valuable.

I don’t really have a settled view on these questions. Overall I still feel comfortable with decoupling, but I hope that we can collectively decide on some regime that captures some of the benefits of this kind of “competitive discipline” without the costs. For example, even in a mostly-decoupled world we could end up agreeing on different domains of “safe” competition (e.g. it feels much better for states to compete on “being a great place to live” than to fight wars), or imposing temporary paternalistic restrictions and relaxing them only once some reasonably high bar of competence is demonstrated.

The balance of power affects deliberation

Negotiation and compromise are an important part of deliberation, but they depend on the current state of competition.

Suppose that Alice and Bob start talking while they don’t know who is going to end up more influential. But they are talking slowly, in the way most comfortable to them, while competition continues to accelerate at the maximum possible rate. So before they can reach any agreement, it may be clear that one of them is vastly more influential.

Alice and Bob would have preferred to make an early agreement to treat each other with respect, back when they were both ignorant about who would end up with the power. But this opportunity is lost forever once they have seen the outcome.
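One way to see why both parties might prefer the early agreement is a toy expected-utility calculation. The sketch below is purely illustrative and rests on assumptions the post does not make: total resources are normalized to 1, each party believes they have an equal chance of winning, and utility is concave in resources (a square root here). The function names are hypothetical.

```python
import math


def utility(resources: float) -> float:
    """Hypothetical concave utility: diminishing returns to resources."""
    return math.sqrt(resources)


def expected_utility_winner_takes_all(p_win: float, total: float = 1.0) -> float:
    """Expected utility of gambling on the competition with no prior agreement."""
    return p_win * utility(total) + (1.0 - p_win) * utility(0.0)


def utility_of_agreed_split(share: float, total: float = 1.0) -> float:
    """Utility of a guaranteed share fixed before the outcome is known."""
    return utility(share * total)


# Behind the veil: neither Alice nor Bob knows who will end up influential.
gamble = expected_utility_winner_takes_all(p_win=0.5)
split = utility_of_agreed_split(share=0.5)
print(f"gamble: {gamble:.3f}, agreed split: {split:.3f}")

# After the outcome is revealed, the winner compares keeping everything
# with honoring a split, and the incentive to agree is gone.
winner_after_reveal = utility(1.0)
print(f"winner keeps everything: {winner_after_reveal:.3f}")
```

Under these assumptions the guaranteed split beats the gamble for both parties in advance, while the eventual winner no longer gains from it afterwards, which is the sense in which the opportunity disappears once the outcome is visible.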

Alice and Bob can try to avoid this by quickly reaching a compromise. But that seems hard, and having to make a precise agreement fast may take them far away from the deliberative process they would have preferred.

I don’t have any real story about coping with this problem, though I’m less worried about it than persuasion. Some possible (but pretty weird) approaches:

If Alice and Bob need to reach an agreement in a hurry, they may be able to make some minimal agreement like “we’ll both isolate ourselves so we don’t see the result of the competition until we’ve reached a better agreement” or “long after the competition is over, we’ll allocate resources based on a high-fidelity prediction of what we would have agreed to if we had never seen the result of the competition.”
After winning the competition and waiting for considerable technological progress, Alice can simulate what Bob would have done if he had won the competition. If this is done carefully, I think you can get to the situation where Alice really doesn’t know if she won the competition (or if she is just in a simulation run by Bob to figure out how nice to be to Alice). Then we can have a discussion between Alice and simulated-Bob (who thinks of this as a conversation between Bob and simulated-Alice) from that state of ignorance.

The singularity, the distant future, and the “long reflection”

In some ways my picture of the future is very aggressive/unusual. For example I think that we are likely to see explosive economic growth and approximate technological maturity within the next 50–100 years (and potentially much sooner).

But in other ways it feels like I have a much more “boring” picture of the future. I expect technology could radically transform the world on a timescale that would be disorienting to people, but for the most part that’s not how we want our lives to go in order to have the best chance of reaching the best conclusions about what to do in the long run. We do want some effects of technology — we would like to stop being so hungry and sick, to have a little bit less reason to be at each other’s throats, and so on — but we also want to be isolated from the incomprehensible, and to make some changes slowly and carefully.

So I expect there to be a very recognizable thread running through humanity’s story, where many of the humans alive today just continue being human and growing in a way that is familiar and comfortable, perhaps changing more quickly than we have in the past but never so quickly that we are at risk of losing our footing. This is not because that’s how to have the best life (which may well involve incomprehensible mind-alteration or hyper-optimized virtual reality or whatever). It’s because we still have a job to do.

The fact that you are able to modify a human to be much smarter does not mean that you need to, and indeed I think it’s important that you take that process slow. The kinds of moral change we are most familiar with and trust involve a bunch of people thinking and talking, gradually refining their norms and making small changes to their nature, raising new generations one after another.

During that time we have a lot to do to safeguard the process; to become more and more comfortable that it’s proceeding in a good direction even as we become wiser and wiser; to do lots of moral philosophy and political philosophy and psychology at every stage in case they provide clues about how to take the next step wisely. We can take the things that scare us or that we dislike about ourselves, and we can very gingerly remove or change them piece by piece. But I think it doesn’t have to be nearly as weird as people often imagine it.

Moreover, I think that the community of humans taking things slowly and living recognizable lives isn’t an irrelevant sideshow that anyone serious would ignore in favor of thinking about the crazy stuff AI is doing “out there” (or the hyper-optimized experiences some of our descendants may immerse themselves in). I think there’s a real sense in which it’s the main thread of the human story; it’s the thread that determines our future and gradually expands to fill the universe.

Put differently, I think people sometimes imagine abdicating responsibility to crazy AI systems that humans build. I think that will happen someday, but not when we can first build AI — indeed, it won’t happen until those AI systems no longer seem crazy.

In the weirdest cases, we decouple by building an AI that merely needs to think about what humans would want rather than deferring to any real flesh-and-blood humans. But even those cases are more like a change than an ending — we pack up our things from Earth and continue our story inside a homey simulation. And personally I don’t expect to do even that until everyone is good and ready for it, many years after it first becomes possible.

I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.

I've been thinking a lot about this lately, so I'm glad to see that it's on your mind too, although I think I may still be a bit more concerned about it than you are. Couple of thoughts:

What if our "deliberation" only made it as far as it did because of "competition", and nobody or very few people know how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.
What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.

Alice and Bob can try to have an agreement to avoid racing ahead or engaging in some kinds of manipulation, and analogously a broader society could adopt such norms or divide into communities with internal agreements of this form.

In a sane civilization, tons of people would already be studying how to make and enforce such agreements, e.g., how to define what kinds of behaviors count as "manipulation", and more generally what are good epistemic norms/practices and how to ensure that many people adopt such norms/practices. If this problem is solved, then maybe we don't need to solve metaphilosophy (in the technical or algorithmic sense), as far as preventing astronomical waste arising from bad deliberation. Unfortunately it seems there are approximately zero people working on either problem.

I would rate "value lost to bad deliberation" ("deliberation" broadly construed, and including easy+hard problems and individual+collective failures) as comparably important to "AI alignment." But I'd guess the total amount of investment in the problem is 1-2 orders of magnitude lower, so there is a strong prima facie case for longtermists prioritizing it.

Overall I think I'm quite a bit more optimistic than you are, and would prioritize these problems less than you would, but still agree directionally that these problems are surprisingly neglected (and I could imagine them playing more to the comparative advantages/interests of longtermists and the LW crowd than topics like AI alignment).

What if our "deliberation" only made it as far as it did because of "competition", and nobody or very few people know how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.

This seems like a quantitative difference, basically the same as your question 2. "A few people might mess up and it's good that competition weeds them out" is the rosy view, "most everyone will mess up and it's good that competition makes progress possible at all" is the pessimistic view (or even further that everyone would mess up and so you need to frequently split groups and continue applying selection).

We've talked about this a few times but I still don't really feel like there's much empirical support for the kind of permanent backsliding you're concerned about being widespread. Maybe you think that in a world with secure property rights + high quality of life for everyone (what I have in mind as a prototypical decoupling) the problem would be much worse. E.g. maybe communist China only gets unstuck because of their failure to solve basic problems in physical reality. But I don't see much evidence for that (and indeed failures of property rights / threats of violence seem to play an essential role in many scenarios with lots of backsliding).

What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.

There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I'd guess 10% from "easy" failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from "hard" failures (most of which I think would not be addressed by competition).
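The rough guesses above can be combined in a few lines of arithmetic. This sketch assumes things the comment leaves open: that the two categories simply add, that "cut in half" applies only to the easy failures, and that competition leaves the hard failures entirely untouched. The variable names are illustrative.

```python
# Rough guesses from the comment, in percentage points of total potential value.
easy_loss = 10.0  # "easy" failures to deliberate
hard_loss = 10.0  # "hard" failures to deliberate

# Assumption: competitive discipline halves the easy losses
# and does nothing for the hard ones.
competition_reduction_on_easy = 0.5

loss_without_competition = easy_loss + hard_loss
loss_with_competition = (
    easy_loss * (1.0 - competition_reduction_on_easy) + hard_loss
)

print(f"expected loss without competitive discipline: {loss_without_competition}%")
print(f"expected loss with competitive discipline: {loss_with_competition}%")
```

On these assumptions competitive discipline recovers about 5 of the 20 percentage points, so most of the guessed loss would remain either way.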

It seems to me like the main driver of the first 10% risk is the ability to lock in a suboptimal view (rather than a conventional deliberation failure), and so the question is when that becomes possible, what views towards it are like, and so on. This is one of my largest concerns about AI after alignment.

I am most inclined to intervene via "paternalistic" restrictions on some classes of binding commitments that might otherwise be facilitated by AI. (People often talk about this concern in the context of totalitarianism, whereas that seems like a small minority of the risk to me / it's not really clear whether a totalitarian society is better or worse on this particular axis than a global democracy.)

We’ve talked about this a few times but I still don’t really feel like there’s much empirical support for the kind of permanent backsliding you’re concerned about being widespread.

I'm not claiming direct empirical support for permanent backsliding. That seems hard to come by, given that we can't see into the far future. I am observing quite severe current backsliding. For example, explicit ad hominem attacks, as well as implicitly weighing people's ideas/arguments/evidence differently, based on things like the speaker's race and sex, have become the norm in local policy discussions around these parts. AFAICT, this originated from academia, under "standpoint epistemology" and related ideas.

On the other side of the political spectrum, several people close to me became very sure that "the election was stolen" due to things like hacked Dominion machines and that the military and/or Supreme Court was going to intervene in favor of Trump (to the extent that it was impossible for me to talk them out of these conclusions). One of them, who I had previously thought was smart/sane enough to entrust a great deal of my financial resources with, recently expressed concern for my life because I was going to get the COVID vaccine.

Is this an update for you, or have you already observed such things yourself or otherwise known how bad things have become?

There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I’d guess 10% from “easy” failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from “hard” failures (most of which I think would not be addressed by competition).

Given these numbers, it seems that you're pretty sure that almost everyone will eventually "snap out of" any bad ideas they get talked into, or they talk themselves into. Why? Is this based on some observations you've made that I haven't seen, or history that you know about that I don't? Or do you have some idea of a mechanism by which this "snapping out of" happens?

Here's an idea of how random drift of epistemic norms and practices can occur. Beliefs (including beliefs about normative epistemology) function in part as a signaling device, similar to clothes. (I forgot where I came across this idea originally, but a search produced a Robin Hanson article about it.) The social dynamics around this kind of signaling produce random drift in epistemic norms and practices, similar to random drift in fashion / clothing styles. Such drift coupled with certain kinds of competition could have produced the world we have today (i.e., certain groups happened upon especially effective norms/practices by chance and then spread their influence through competition), but may lead to disaster in the future in the absence of competition, as it's unclear what will then counteract future drift that will cause continued deterioration in epistemic conditions.

Another mechanism for random drift is technological change that disrupts previous epistemic norms/practices without anyone specifically intending to. I think we've seen this recently too, in the form of, e.g., cable news and social media. It seems like you're envisioning that future humans will deliberately isolate their deliberation from technological advances (until they're ready to incorporate those advances into how they deliberate), so in that scenario perhaps this form of drift will stop at some point, but (1) it's unclear how many people will actually decide to do that, and (2) even in that scenario there will still be a large amount of drift between the recent past (when epistemic conditions still seemed reasonably ok, although I had my doubts even back then) and the point at which they do so, which (together with other forms of drift) might never be recovered from.

As another symptom of what's happening (the rest of this comment is in a "paste" that will expire in about a month, to reduce the risk of it being used against me in the future)

Current human deliberation and discourse are strongly tied up with a kind of resource gathering and competition, and because of this I don't have a good picture of how things will look after the two are decoupled, nor know how to extrapolate past performance (how well human deliberation worked in the past and present) into this future.

Currently, people's thinking and speech are in large part ultimately motivated by the need to signal intelligence, loyalty, wealth, or other "positive" attributes, which help to increase one's social status and career prospects, and attract allies and mates, which are of course hugely important forms of resources, and some of the main objects of competition among humans.

Once we offload competition to AI assistants, what happens to this motivation behind discourse and deliberation, and how will that affect discourse and deliberation itself? Can you say more about what you envision happening in your scenario, in this respect?

I'm curious about how this interacts with space colonisation. The default path of efficient competition would likely lead to maximally fast space-colonisation, to prevent others from grabbing it first. But this would make deliberating together with other humans a lot trickier, since some space ships would go to places where they could never again communicate with each other. For things to turn out ok, I think you either need:

to pause before space colonisation.
to finish deliberating and bargaining before space colonisation.
to equip each space ship with the information necessary for deciding what to do with the space they grab. In order of increasing ambitiousness:

You could upload a few leaders' or owners' brains (or excellent predictive models thereof) and send them along with their respective colonisation ships; hoping that they will individually reach good decisions without discussing with the rest of humanity.
You could also equip each colonisation ship with the uploads of all other human brains that they might want to deliberate with (or excellent predictive models thereof), so that they can use those other humans as discussion partners and data for their deliberation-efforts.
You could also set up these uploads in a way that makes them figure out what bargain would have been struck on Earth; and then have each space ship individually implement this. Maybe this happens by default with acausal trade; or maybe everyone in some reasonably big coalition could decide to follow the decision of some specified deliberative process that they don't have time to run on Earth.

to use some communication scheme that lets you send your space ships ahead to compete in space, and then lets you send instructions to your own ships once you've finished deliberating on Earth.

E.g. maybe you could use cryptography to ensure that your space ships will follow instructions signed with the right code; which you only send out once you've finished bargaining. (Though I'm not sure if your bargaining-partners would be able to verify how your space ships would react to any particular message; so maybe this wouldn't work without significant prior coordination.)
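The signed-instruction idea can be sketched with a message authentication code. This is a minimal illustration rather than a proposal for a real protocol: it uses a shared-secret HMAC from Python's standard library as a stand-in for a proper public-key signature scheme, and the function names, key handling, and instruction text are all hypothetical. It also does nothing about the verification worry in the parenthetical above, since a bargaining partner without the key cannot check how a ship would respond to a given message.

```python
import hashlib
import hmac
import secrets


def sign_instruction(key: bytes, instruction: bytes) -> bytes:
    """Home side: compute an authentication tag over the instruction."""
    return hmac.new(key, instruction, hashlib.sha256).digest()


def ship_accepts(key: bytes, instruction: bytes, tag: bytes) -> bool:
    """Ship side: follow the instruction only if the tag checks out."""
    expected = hmac.new(key, instruction, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, tag)


# Hypothetical setup: the key is installed on the ship before launch
# and kept secret at home until bargaining has finished.
launch_key = secrets.token_bytes(32)

# Sent only once deliberation on Earth is done.
instruction = b"implement the bargain reached on Earth"
tag = sign_instruction(launch_key, instruction)

print(ship_accepts(launch_key, instruction, tag))
print(ship_accepts(launch_key, b"forged instruction", tag))
```

A real design would more plausibly use asymmetric signatures so that the ship holds only a verification key, and would need some answer to replayed or superseded messages; both are outside what this sketch shows.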

I'm curious whether you're optimistic about any of these options, or if you have something else in mind.

(Also, all of this assumes that defensive capabilities are a lot stronger than offensive capabilities in space. If offense is comparably strong, then we also have the problem that the cosmic commons might be burned in wars if we don't pause or reach some other agreement before space colonisation.)

I think I'm basically optimistic about every option you list.

I think space colonization is extremely slow relative to deliberation (at technological maturity I think you probably have something like a million-fold speedup over flesh and blood humans, and colonization takes place over decades and millennia rather than years). Deliberation may not be "finished" until the end of the universe, but I think we will e.g. have deliberated enough to make clear agreements about space colonization / to totally obsolete existing thinking / likely to have reached a "grand compromise" from which further deliberation can be easily decentralized.
I think it's very easy for someone to purchase a slice of every ship or otherwise ensure representation, and have a delegate they trust (perhaps the same one they would have used for deliberating locally, e.g. just a copy of their favorite souped-up emulation) on every ship. The technology for that seems to come way before tech for maximally fast space colonization (and you don't really leave the solar system until you have extremely mature space colonization, since you'll get very easily overtaken later). That could involve people having influence over each of the colonization projects, or could involve delegates whose only real role is to help inform someone who actually has power in the project / to participate in acausal trade.
I think it's fairly likely that space ships will travel slowly enough that you can beam information to them and do the kind of scheme you outline where you deliberate at home and then beam instructions out. I think this is pretty unlikely, but if everything else fails it would probably be reasonably painless. I think the main obstruction would be leaving your descendants abroad vulnerable if your descendants at home get compromised. (It's also a problem if descendants at home go off the rails, but getting compromised is more concerning because it can happen to either descendants abroad or at home).
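
As a rough illustration of the first point, here is the arithmetic behind "colonization is slow relative to deliberation", taking the million-fold speedup figure at face value. The function name and the sample durations are assumptions chosen for illustration, not estimates from the comment.

```python
SPEEDUP = 1_000_000  # the "million-fold speedup" figure from the comment


def subjective_years(wall_clock_years: int, speedup: int = SPEEDUP) -> int:
    """Subjective deliberation time available during a wall-clock interval."""
    return wall_clock_years * speedup


# Sample durations (hypothetical): one decade and one millennium of travel.
one_decade = subjective_years(10)
one_millennium = subjective_years(1_000)
```

On these numbers a single decade of wall-clock time corresponds to ten million subjective years, which is the sense in which there is a lot of room to deliberate before ships arrive anywhere.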

(Also, all of this assumes that defensive capabilities are a lot stronger than offensive capabilities in space. If offense is comparably strong, then we also have the problem that the cosmic commons might be burned in wars if we don't pause or reach some other agreement before space colonisation.)

This seems like maybe the most likely single reason you need to sort everything out in advance, though the general consideration in favor of option value (and waiting a year or two being no big deal) seems even more important. I do expect to have plenty of time to do that.

I haven't thought about any of these details much because it seems like such an absurdly long subjective time before we leave the solar system, and so there will be huge amounts of time for our descendants to make bargains before then. I am much more concerned about destructive technologies that require strong coordination long before we leave. (Or about option value lost by increasing the computational complexity of your simulation and so becoming increasingly uncorrelated with some simulators.)

One reason you might have to figure these things out in advance is if you try to decouple competition from deliberation by doing something like secure space rights (i.e. binding commitments to respect property rights, have no wars ever, and divide up the cosmos in an agreeable way). It's a bit hard to see how we could understand the situation well enough to reach an agreeable compromise directly (rather than defining a mutually-agreeable deliberative process to which we will defer and which has enough flexibility to respond to unknown unknowns about colonization dynamics) but if it was a realistic possibility then it might require figuring a lot of stuff out sooner rather than later.

Thanks, computer-speed deliberation being a lot faster than space-colonisation makes sense. I think any deliberation process that uses biological humans as a crucial input would be a lot slower, though; slow enough that it could well be faster to get started with maximally fast space colonisation. Do you agree with that? (I'm a bit surprised at the claim that colonization takes place over "millennia" at technological maturity; even if the travelling takes millennia, it's not clear to me why launching something maximally-fast – that you presumably already know how to build, at technological maturity – would take millennia. Though maybe you could argue that millennia-scale travelling time implies millennia-scale variance in your arrival-time, in which case launching decades or centuries after your competitors doesn't cost you too much expected space?)

If you do agree, I'd infer that your mainline expectation is that we successfully enforce a worldwide pause before mature space-colonisation, since the OP suggests that biological humans are likely to be a significant input into the deliberation process, and since you think that the beaming-out-info schemes are pretty unlikely.

(I take your point that as far as space-colonisation is concerned, such a pause probably isn't strictly necessary.)

I agree that biological human deliberation is slow enough that it would need to happen late.

By "millennia" I mostly meant that traveling is slow (+ the social costs of delay are low, I'm estimating like 1/billionth of value per year of delay). I agree that you can start sending fast-enough-to-be-relevant ships around the singularity rather than decades later. I'd guess the main reason speed matters initially is for grabbing resources from nearby stars under whoever-gets-there-first property rights (but that we probably will move away from that regime before colonizing).
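
To see what "1/billionth of value per year of delay" implies, here is a small worked calculation. It assumes the loss compounds at a constant rate each year, which is a modelling choice made for this sketch rather than something the comment states; the function name and the sample delays are likewise hypothetical.

```python
import math

LOSS_PER_YEAR = 1e-9  # the "1/billionth of value per year" figure from the comment


def fraction_lost(years: float, loss_per_year: float = LOSS_PER_YEAR) -> float:
    """Fraction of value lost after `years` of delay, compounding annually.

    Uses log1p/expm1 so that tiny per-year losses are not swallowed by
    floating-point rounding.
    """
    return -math.expm1(years * math.log1p(-loss_per_year))


# Sample delays (hypothetical): a decade, a century, a millennium.
lost_decade = fraction_lost(10)
lost_century = fraction_lost(100)
lost_millennium = fraction_lost(1_000)
```

Under this assumption even a century of delay costs roughly one ten-millionth of the total value, which is why launching decades late is treated here as nearly free.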

I do expect to have strong global coordination prior to space colonization. I don't actually know if you would pause long enough for deliberation amongst biological humans to be relevant. So on reflection I'm not sure how much time you really have as biological humans. In the OP I'm imagining 10+ years (maybe going up to a generation) but that might just not be realistic.

Probably my single best guess is that some (many?) people would straggle out over years or decades (in the sense that relevant deliberation for controlling what happens with their endowment would take place with biological humans living on earth), but that before that there would be agreements (reached at high speed) to avoid them taking a huge competitive hit by moving slowly.

But my single best guess is not that likely and it seems much more likely that something else will happen (and even that I would conclude that some particular other thing is much more likely if I thought about it more).

Interesting essay!

In your scenario where people deliberate while their AIs handle all the competition on their behalf, you note that persuasion is problematic: this is partly because, with intent-aligned AIs, the system is vulnerable to persuasion in that "what the operator intends" can itself become a target of attack during conflict.

Here is another related issue. In a sufficiently weird or complex situation, "what the operator intends" may not be well-defined -- the operator may not know it, and the AI may not be able to infer it with confidence. In this case, clarifying what the human really wants seems to require more deliberation, which is what we were trying to screen off in the first place!

Furthermore, it seems to me that unbounded competition tends to continually spiral out, encompassing more and more stuff, and getting weirder and more complex: there are the usual arms race dynamics. There are anti-inductive dynamics around catching your opponent by surprise by acting outside their ontology. And there is also just the march of technology, which in your scenario hasn't stopped, and which keeps creating new possibilities and new dimensions for us to grapple with around what we really want. (I'm using state-run social media disinformation campaigns as an intuition pump here.)

So in your scenario, I just imagine the human operators getting overwhelmed pretty quickly, unable to keep from being swept up in conflict. This is unless we have some kind of pretty strong limits on it.

Planned summary for the Alignment Newsletter:

Under a [longtermist](https://forum.effectivealtruism.org/tag/longtermism) lens, one problem to worry about is that even after building AI systems, humans will spend more time competing with each other rather than figuring out what they want, which may then lead to their values changing in an undesirable way. For example, we may have powerful persuasion technology that everyone uses to persuade people to their line of thinking; it seems bad if humanity’s values are determined by a mix of effective persuasion tools, especially if persuasion significantly diverges from truth-seeking.
One solution to this is to coordinate to _pause_ competition while we deliberate on what we want. However, this seems rather hard to implement. Instead, we can at least try to _decouple_ competition from deliberation, by having AI systems acquire <@flexible influence@>(@The strategy-stealing assumption@) on our behalf (competition), and having humans separately thinking about what they want (deliberation). As long as the AI systems are competent enough to shield the humans from the competition, the results of the deliberation shouldn’t depend too much on competition, thus achieving the desired decoupling.
The post has a bunch of additional concrete details on what could go wrong with such a plan that I won’t get into here.

Something I find myself noticing as a sort of a gap in the discourse is the lack of the idea of a "right" and specifically the sort of lowest level core of this: a "property right".

It seems to me that such things emerge in nature. When I see dogs (that aren't already familiar) visit each other, and go near each other's beds, or food bowls... it certainly seems to me, when I empathically project myself into the dogs' perspectives, as though "protection of what I feel is clearly mine against another that clearly wants it" is a motivating factor that can precipitate fights.

(I feel like anyone who has been to a few big summer BBQs where numerous people brought their dogs over will have seen something like this at some point, and such experiences seem normal to me from my youth, but maybe few people in modern times can empathize with my empathy for dogs that get into fights? The evidence I have here might not work as convincing evidence for others... and I'm not sure how to think in a principled way about mutual gaps in normatively formative experiences like this.)

Unpacking a bit: the big danger seems often to be when a weak/old/small dog has a strong/mature/big dog show up as a visitor in their territory.

If the visitor is weak, they tend not to violate obvious norms. It's just polite. Also it's just safe. Also... yeah. The power and the propriety sort of naturally align.

But if the visitor is strong and/or oblivious and/or mischievous they sometimes seem to think they can "get away with" taking a bite from another dog's bowl, or laying down in another dog's comfy bed, and then the weaker dog (often not seeming to know that the situation is temporary, and fearful of precedents, and desperate to retain their livelihood at the beginning of a new struggle while they still have SOME strength?) will not back down... leading to a fight?

The most salient counter-example to the "not talking about property rights" angle, to me, would be Robin Hanson's ideas which have been floating around for a long time, and never really emphasized that I've seen?

Here's a working (counter) example from 2009 where Robin focuses on the trait of "law-abidingness" as the thing to especially desire in future robots, and then towards the end he connects this directly to property rights:

The later era when robots are vastly more capable than people should be much like the case of choosing a nation in which to retire. In this case we don’t expect to have much in the way of skills to offer, so we mostly care that they are law-abiding enough to respect our property rights. [bold not in original] If they use the same law to keep the peace among themselves as they use to keep the peace with us, we could have a long and prosperous future in whatever weird world they conjure. In such a vast rich universe our “retirement income” should buy a comfortable if not central place for humans to watch it all in wonder.

Obviously it might be nice if (presuming the robots become autonomous) they take care of us out of some sense of charity or what have you? Like... they have property, then they give it up for less than it costs. To be nice. That would be pleasant I think.

However, we might download our minds into emulation environments, and we might attach parts of the simmed environments to external world measurements, and we might try to put a virtual body into causal correspondence with robotic bodies... so then we could have HUMANS as the derivative SOURCE of the robot minds, and then... well... humans seem to vary quite a bit on how charitable they are? :-(

But at least we expect humans not to steal, hopefully... Except maybe we expect them to do that other "special" kind of theft sometimes... and sometimes we want to call this transfer good? :-/

I feel like maybe "just war" and "just taxation" and so on could hypothetically exist, but also like they rarely exist in practice in observed history... and this is a central problem when we imagine AIs turning all the processes of history "up to 11, and on fast forward, against humans"?

Also, however, I sort of fear this framing... it seems rare in practice in our discourse, and perhaps likely to cause people to not become thereby BETTER at discussing the topic?

Perhaps someone knows of a good reason for "property and rights and property rights and laws and taming the (often broken) government itself" to remain a thing we rarely talk about?

If Alice and Bob are talking to each other as they deliberate

I think this is a typo, it should say "compete" instead of "deliberate".

I worry about persuasion becoming so powerful that it blocks deliberation: How can Alice know whether Bob (or his delegated AI) is deliberating in good faith or trying to manipulate her?

In this scenario, small high-trust communities can still deliberate, but mutual mistrust prevents them from communicating their insights to the rest of the world.

I meant "while they deliberate," as in the deliberation involves them talking to work out their differences or learn from each other. But of course the concern is that this in itself introduces an opportunity for competition even if they had otherwise decoupled deliberation, and indeed the line between competition and deliberation doesn't seem crisp for groups.

I think this exchange between Paul Christiano (author) and Wei Dai (commenter) is pretty important food for thought, for anyone interested in achieving a good future in the long run, and for anyone interested in how morality and society evolve more generally.

I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.

I've been thinking a lot about this lately, so I'm glad to see that it's on your mind too, although I think I may still be a bit more concerned about it than you are. Couple of thoughts:

What if our "deliberation" only made it as far as it did because of "competition", and nobody, or very few people, know how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.
What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.

Alice and Bob can try to have an agreement to avoid racing ahead or engaging in some kinds of manipulation, and analogously a broader society could adopt such norms or divide into communities with internal agreements of this form.

What if our "deliberation" only made it as far as it did because of "competition", and nobody, or very few people, know how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don't know how to prevent this.

What's your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever "snap out of it", or getting "persuaded" by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it's very large, easily >50%. I'm curious how others would answer this question as well.

We’ve talked about this a few times but I still don’t really feel like there’s much empirical support for the kind of permanent backsliding you’re concerned about being widespread.

Is this an update for you, or have you already observed such things yourself or otherwise known how bad things have become?

There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I’d guess 10% from “easy” failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from “hard” failures (most of which I think would not be addressed by competition).

As another symptom of what's happening (the rest of this comment is in a "paste" that will expire in about a month, to reduce the risk of it being used against me in the future)
