Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I'd like to put forward another description of a basic issue that's been around for a while. I don't know whether there's been significant progress on a solution, and would be happy to be pointed to any such progress. I've opted for a relatively rough and quick post that doesn't dive too hard into the details, rather than lose the thought entirely. I may be up for exploring details further in comments or follow-ups.

The Question: How do you respect the wishes (or preferences) of a subject over whom you have a lot of control?

The core problem: any indicator/requirement/metric about respecting their wishes is one you can manipulate (even inadvertently). 

For example, think about trying to respect the preferences of the child you're babysitting when you simply know from experience what they will notice, how they will feel, what they will say they want, and what they will do, when you put them in one environment versus another (where the environment could be as small as what you present to them in your behaviour). Is there any way to provide them a way to meaningfully choose what happens?

We could think about this in a one-shot case where there's a round of information gathering and coming to agreement on terms, and then an action is taken. But I think this is a simplification too far, since a lot of what goes into respecting the subject/beneficiary is giving them space for recourse, space to change their mind, space to realise things that were not apparent with the resources for anticipation they had available during the first phase.

So let's focus more on the case where there's an ongoing situation where one entity has a lot of power over another but nevertheless wants to secure their consent for whatever actually happens, in a meaningful sense.

There are lots of cases where this happens in real life, mostly where the powerful entity has a lot of their own agenda and doesn't care a huge amount about the subject (they may care a lot, but perhaps not as much as they do about their other goals):

  • rape (the perhaps central example invoked by "consent")
  • advertising
  • representative democracy
  • colonisation ("civilising" as doing what's good for them)

Our intuitions may be mostly shaped by that kind of situation, where there's a strong need to defend against self-interest, corruption, or intention to gain and abuse power. 

But I think there's a hard core of a problem left even if we remove the malicious or somewhat ill-intentioned features from the powerful entity. So let's focus: what does it mean to fully commit to respecting someone's autonomy, as a matter of genuine love or a strong sense of morality or something along those lines, even when you have a huge amount of power over them?

What forms power can take:

  • brute force, resources that give you physical power
  • support from others (that make you - your interests - a larger entity)
  • intelligence: the ability to predict and strategise in more detail, over longer time horizons, and faster, than the subject you are trying to engage with
  • speed - kinda the same as intelligence, but maybe worth pulling out as its own thing
  • knowledge, experience - similar to intelligence, but maybe in this case emphasising access to private relevant information. Think also of information asymmetry in negotiation.

Examples where this shows up in real life already (and where people seem to mostly suck at it, maybe due to not even trying, but there are some attempts to take it seriously: see work by Donaldson and Kymlicka):

  • adaptive preferences
  • children
  • animals (pets, domesticated, and otherwise)
  • disabled people, esp. with cognitive disabilities
  • oppressed/minoritised people and peoples
  • future generations and other non-existent peoples

It may be that the only true solution here is a full commitment to egalitarianism that seeks to remove the power differentials in the first place (to the extent possible: I don't believe it's completely possible), and (somehow) to do structured decision making that is truly joint or communal.

What form does such decision-making need to take? (Hard mode: how could we come to figure out what form it should take together from our current unequal starting point?)

It could also be the case that preferences or wishes are simply not enough of a real thing to be a target of our respect. But then what? What matters? My best guess involves ongoing dialogue and inclusive and accessible community, but I don't have a complete answer. (And of course it's hard to do this when daring to care about relatively powerless subjects exposes one to a great deal of criticism, if not ridicule or dismissal - possibly arising from defensiveness about the possibility of having caused harm, and of possibly continuing to do so.)

10 comments

You can make people/entities actually equal. You can also remove the need for the weaker entity to get the stronger entity's permission. Either go more egalitarian or less authoritarian, or both. It's worth noting that if you don't want to be authoritarian, it's important to blind yourself to information about the weaker party. The best way to not be overbearing is to not know what behavior they are getting up to. This is why children's privacy is so important. It's much easier to never know than to resist your urge to meddle.

I have yet to see a good rationalist (or even just very careful and thoughtful) treatment of individual or group power disparities.  It's especially difficult for changing situations (children who will increase in self-determination, the elderly who are decreasing, and drunk or drugged people who may or may not be more aware in the next encounter).

In none of those cases can (or should) the power differential be removed.  It needs to be accepted and incorporated into behaviors and attitudes.  Honestly, this is the hard (and possibly unsolvable) question for AI alignment: when another entity is smarter and more powerful than me, how do I want it to think of "for my own good"?


In none of those cases can (or should) the power differential be removed.

I agree -- in any situation where a higher-power individual feels that they have a duty to care for the wellbeing of a lower-power individual, "removing the power differential" ends up meaning abandoning that duty.

However, in the question of consent specifically, I think it's reasonable for a higher-power individual to create the best model they can of the lower-power individual, and update that model diligently upon gaining any new information that it had predicted the subject imperfectly. Having the more-powerful party consider what they'd want if they were in the exact situation of the less-powerful party (including having all the same preferences, experiences, etc) creates what I'd consider a maximally fair negotiation.

when another entitity is smarter and more powerful than me, how do I want it to think of "for my own good"?

I would want a superintelligence to imagine that it was me, as accurately as it could, and update that model of me whenever my behavior deviates from the model. I'd then like it to run that model at an equivalent scale and power to itself (or a model of itself, if we're doing this on the cheap) and let us negotiate as equals. To me, equality feels like a good-faith conversation of "here's what I want, what do you want, how can we get as close as possible to maximizing both?", and I want the chance to propose ways of accomplishing the superintelligence's goals that are maximally compatible with me also accomplishing my own.

Then again, the concept of a superintelligence focusing solely on what's for my individual good kind of grosses me out. I prefer the idea of it optimizing for a lot of simultaneous goods -- the universe, the species, the neighborhood, the individual -- and explaining who else's good won and why if I inquire about why my individual good wasn't the top priority in a given situation.

I think it's reasonable for a higher-power individual to create the best model they can of the lower-power individual, and update that model diligently upon gaining any new information that it had predicted the subject imperfectly

I think that's reasonable too, but for moral/legal discussions, "reasonable" is a difficult standard to apply.  The majority of humans are unreasonable on at least some dimensions, and a lot of humans are incapable of modeling others particularly well.  And there are a lot of humans who are VERY hard to model, because they really aren't motivated the way we expect they "should" be, and "what they want" is highly indeterminate.  Young children very often fall into this category.  

What's the minimum amount of fidelity a model should have before abandonment is preferred?  I don't know.  

I'm surprised by the list of forms of power by what it leaves out.

A stereotypical example of power differences is bosses having relationships with their employees.

The boss has power over a different domain of the life of the employee than the domain of the relationship.

It's the problem of corruption where power from one domain leaks into a different domain where it doesn't belong.

If there's an option to advance one's career by sleeping with one's boss, that makes issues of consent more tricky. Career incentives might pressure a person into the relationship even if they wouldn't want to be in it otherwise.

Just to confirm: this is a great example, and it wasn't deliberately left out.


Conversation about such decisions has to happen in the best common language available. This is very obvious with animals, where teaching them human language requires far more effort from everyone than learning how they already communicate and meeting them on their own intellectual turf.

Also, it's rare to have only a single isolated power differential in play. There are usually several, pointing in different directions. Draft animals can destroy stuff and injure people if they panic; pets can destroy their owners' possessions. Oppressed human populations can revolt; oppressed individuals can rebel in all kinds of creatively dangerous ways. In the rare event of dealing with only a single power gradient at once, being on top is easy because you decide what you're doing and then you do it and it works. But with multiple power gradients simultaneously in play, staying "on top" is a high-effort process and a good-faith negotiation can only happen when every participant puts in the effort to not be a jerk in the areas where their power happens to exceed that of others.

Suppose that the more powerful being is aligned to the less powerful: that is to say that (as should be the case in the babysitting example you give) the more powerful being's fundamental motive is the well-being of the less powerful being. Assume also that a lot of the asymmetry is of intellectual capacity: the more powerful being is also a great deal smarter. I think the likely and correct outcome is that there isn't always consent; the less powerful being is frequently being manipulated into actions and reactions that they haven't actually consented to, and might not even be capable of realizing why they should consent to — but ones that, if they were as intellectually capable as the more powerful being, they would in fact consent to.

I also think that, for situations where the less powerful being is able to understand the alternatives and make a rational and informed decision, and wants to, the more powerful being should give them the option and let them do so. That's the polite, respectful way to do things. But often that isn't going to be practical, or desirable, and the babysitter should just distract the baby before they get into the dangerous situation.

Consent is a concept that fundamentally assumes that I am the best person available to make decisions about my own well-being. Outside parental situations, for interactions between evolved intelligence like humans, that's almost invariably true. But if I had a superintelligence aligned to me, then yes, I would want it to keep me away from dangers so complex that I'm not capable of making an informed decision about them.

Relevant post by Richard Ngo: "Moral Strategies at different capability levels". Crucial excerpt:

Let’s consider three ways you can be altruistic towards another agent:

  • You care about their welfare: some metric of how good their life is (as defined by you). I’ll call this care-morality - it endorses things like promoting their happiness, reducing their suffering, and hedonic utilitarian behavior (if you care about many agents).
  • You care about their agency: their ability to achieve their goals (as defined by them). I’ll call this cooperation-morality - it endorses things like honesty, fairness, deontological behavior towards others, and some virtues (like honor).
  • You care about obedience to them. I’ll call this deference-morality - it endorses things like loyalty, humility, and respect for authority.


  • Care-morality mainly makes sense as an attitude towards agents who are much less capable than you, and/or can't make decisions for themselves - for example animals, future people, and infants.


  • Cooperation-morality mainly makes sense as an attitude towards agents whose capabilities are comparable to yours - for example others around us who are trying to influence the world.


  • Deference-morality mainly makes sense as an attitude towards trustworthy agents who are much more capable than you - for example effective leaders, organizations, communities, and sometimes society as a whole.

Thanks for this! I think the categorization of moralities is a useful framework. I am very wary of the judgement that care-morality is appropriate for less capable subjects - basically because of paternalism.