Cooperating with agents with different ideas of fairness, while resisting exploitation

[-]wedrifid12y100

This translates into an informal principle of negotiations: Be willing to accept unfair bargains, but only if (you make it clear) both sides are doing worse than what you consider to be a fair bargain.

That's clever and opens up a lot of extra scope for 'imperfect' cooperation, without any exploitation problem. I notice that this matches my 'fairness' instincts and some of my practice while playing strategy games. Unfortunately I don't recall reading the principle formally specified anywhere.

[-]Stuart_Armstrong12y50

Solution concept implementing this approach (as I understand it):

Player X chooses a Pareto fair outcome (X→X, X→Y), where X→Y can be read as "player X's fair utility assignment to player Y"; player Y chooses a fair outcome (Y→X, Y→Y).

The actual outcome is (Y→X, X→Y)

(If you have a visual imagination in maths, as I do, you can see this graphically as the Pareto maximum among all the points Pareto worse than both fair outcomes).

This should be unexploitable in some senses, as you're not determining your own outcome, but only that of the other player.

Since it's not Pareto, it's still possible to negotiate over possible improvements ("if I change my idea of fairness towards the middle, will you do it too?") and blackmail is possible in that negotiation process. Interesting idea, though.
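To make the construction above concrete, here is a minimal Python sketch. It assumes utilities are plain numbers and that each player's fair outcome is written as a pair (utility to X, utility to Y); the function name `combined_outcome` and the example numbers are illustrative, not part of the original proposal.

```python
from typing import Tuple

Outcome = Tuple[float, float]  # (utility to X, utility to Y)

def combined_outcome(x_fair: Outcome, y_fair: Outcome) -> Outcome:
    """Each player receives what the OTHER player's fair point assigns them."""
    y_to_x = y_fair[0]  # Y's idea of X's fair share
    x_to_y = x_fair[1]  # X's idea of Y's fair share
    return (y_to_x, x_to_y)

# Hypothetical numbers: X thinks (12, 12) is fair, Y thinks (11, 13) is fair.
print(combined_outcome((12, 12), (11, 13)))  # (11, 12)
```

When each side's fair point favours itself, as in this example, the result is the componentwise minimum of the two fair points, which matches the graphical description of the best point Pareto worse than both.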

[-]wedrifid12y150

Conclusion: Stuart's solution is flawed because it fails to blackmail pirates appropriately.

Thoughts:

Eliezer's solution matched my intuitions for how negotiation feels like it 'should' work.
Analyzing Stuart's solution and accompanying diagram changed my mind.
Stuart's solution does Pareto-dominate Eliezer's.
There is no incentive for either player to deviate from Stuart's solution.
Unfortunately, 'no incentive to deviate' is not sufficient for creating stable compliance even among perfectly rational agents, let alone even slightly noisy agents.
When the other agent receives an identical payoff for giving me low utility as it does for giving me high utility then the expected behaviour of a rational opponent is approximately undefined. It's entirely arbitrary.
A sane best practice would be to assume that of all outcomes with equal utility (to them) the other agent will probably choose the action that screws me over the most.
At very best we could say that we are granting the other agent the power to punish me for free on a whim---for most instrumental purposes this is a bad thing.
Consider a decision algorithm that, when evaluating the desirability of outcomes, first sorts by utility and then reverse-sorts by utility-for-other. In honour of the Pirate game I will call agents implementing that algorithm "pirates". (The most obvious alternative name would be 'assholes'.)
Pirates are rational agents in the same sense as usually used for game theory purposes. They simply have defined behaviour in the place where 'rational' was previously undefined.
Eliezer's prescribed negative incentive for each degree of departure from 'fair' ensures that pirates behave themselves, even if the punishment factor is tiny.
Eliezer's punishment policy also applies (and is necessary) when dealing with what we could call "petty sadists". That is, for agents which actually have utility functions with a small negative term for the utility granted to the other.
Usually considering things like petty sadism and 'pirates' is beyond the scope of a decision theory problem and it would be inappropriate to mention them. But when a proposed solution offers literally zero incentive to granting the payoff then these considerations become relevant. Even the slightest amount of noise in an agent, the communication or a utility function can flip the behaviour about. "Epsilon" stops being negligible when you try comparing it to 'zero'.
Using Eliezer's punishment solution instead of Stuart's seems to be pure blackmail.
While I reject many cases of blackmail with unshakable stubbornness, I think one of the clearest exceptions is the case where complying costs me nothing at all and the blackmail costs nothing or next to nothing for the blackmailer.
At a limit of sufficiently intelligent agents with perfect exchange of decision algorithm source code (utility-function source code not required) rational agents implementing Eliezer's punishment-for-unfairness system will arrive at punishment factors approaching zero and the final decision will approach Stuart's Pareto-dominant solution.
When there is less mutual trust in the decision algorithms of the other agents or less trust in the communication process then a greater amount of punishment for unfairness is desirable.
Punishing unfairness is the 'training wheels' of cooperation between agents with different ideas of fairness.
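The "pirate" ordering described in the thoughts above can be sketched in a few lines of Python. This is only an illustration under assumed numbers: `pirate_choice` is a hypothetical name, and the option lists are made up to contrast a zero-penalty scheme with one that has a tiny penalty for each step away from 'fair'.

```python
from typing import List, Tuple

Outcome = Tuple[float, float]  # (utility to the chooser, utility to the other agent)

def pirate_choice(options: List[Outcome]) -> Outcome:
    """Pick the highest own utility; break ties by the LOWEST utility for the other."""
    return max(options, key=lambda o: (o[0], -o[1]))

# Zero-penalty case: the chooser gets 11 whatever it grants the other agent.
flat = [(11, 12), (11, 10), (11, 8)]
print(pirate_choice(flat))       # (11, 8)

# Tiny penalty for each step away from 'fair': the tie disappears.
penalised = [(11, 12), (10.99, 10), (10.98, 8)]
print(pirate_choice(penalised))  # (11, 12)
```

With a flat payoff the tie-break decides everything, so the pirate grants the worst outcome available; any strictly positive penalty, however small, removes the tie and the same agent grants the fair one.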

[-]Eliezer Yudkowsky12y60

Using Eliezer's punishment solution instead of Stuart's seems to be pure blackmail.

At a limit of sufficiently intelligent agents with perfect exchange of decision algorithm source code (utility-function source code not required) rational agents implementing Eliezer's punishment-for-unfairness system will arrive at punishment factors approaching zero and the final decision will approach Stuart's Pareto-dominant solution.

When there is less mutual trust in the decision algorithms of the other agents or less trust in the communication process then a greater amount of punishment for unfairness is desirable.

My intuition is more along the lines of:

Suppose there's a population of agents you might meet, and the two of you can only bargain by simultaneously stating two acceptable-bargain regions and then the Pareto-optimal point on the intersection of both regions is picked. I would intuitively expect this to be the result of two adapted Masquerade algorithms facing each other.

Most agents think the fair point is N and will refuse to go below unless you do worse, but some might accept an exploitive point of N'. The slope down from N has to be steep enough that having a few N'-accepting agents will not provide a sufficient incentive to skew your perfectly-fair point away from N, so that the global solution is stable. If there's no cost to destroying value for all the N-agents, adding a single exploitable N'-agent will lead each bargaining agent to have an individual incentive to adopt this new N'-definition of fairness. But when two N'-agents meet (one reflected) their intersection destroys huge amounts of value. So the global equilibrium is not very Nash-stable.

Then I would expect this group argument to individualize over agents facing probability distributions of other agents.

[-]wanderingsoul12y20

I'm not getting what you're going for here. If these agents actually change their definition of fairness based on other agents' definitions then they are trivially exploitable. Are there two separate behaviors here: you want unexploitability in a single encounter, but you still want these agents to be able to adapt their definition of "fairness" based on the population as a whole?

[-]wedrifid12y20

If these agents actually change their definition of fairness based on other agents' definitions then they are trivially exploitable.

I'm not sure that is trivial. What is trivial is that some kinds of willingness to change their definition of fairness make them exploitable. However, this doesn't hold for all kinds of willingness to change a fairness definition. Some agents may change their definition of fairness in their favour for the purpose of exploiting agents vulnerable to this tactic, but be unwilling to change their definition of fairness when it harms them. The only 'exploit' here is 'prevent them from exploiting me and force them to use their default definition of fair'.

[-]wanderingsoul12y10

Ah, that clears this up a bit. I think I just didn't notice when N' switched from representing an exploitive agent to an exploitable one. Either that, or I have a different association for exploitive agent than what EY intended. (namely, one which attempts to exploit)

[-]Eliezer Yudkowsky12y60

This does not sound like what I had in mind. You pick a series of increasingly unfair-to-you, increasingly worse-for-the-other-player outcomes whose first element is what you deem the fair Pareto outcome: (100, 100), (98, 99), (96, 98), and stop well short of Nash and then drop to Nash. The other does the same. Unless one of you has a completely skewed idea of fairness, you should be able to meet somewhere in the middle. Both of you will do worse against a fixed opponent's strategy by unilaterally adopting more self-favoring ideas of fairness. Both of you will do worse in expectation against potentially exploitive opponents by unilaterally adopting looser ideas of fairness. This gives everyone an incentive to obey the Galactic Schelling Point and be fair about it.

[-]Stuart_Armstrong12y30

My solution Pareto-dominates that approach, I believe. It's precisely the best you can do, given that each player cannot win more than what the other thinks their "fair share" is.

[-]wanderingsoul12y80

I tried to generalize Eliezer's outcomes to functions, and realized if both agents are unexploitable, the optimal functions to pick would lead to Stuart's solution precisely. Stuart's solution allows agents to arbitrarily penalize the other though, which is why I like extending Eliezer's concept better. Details below. P.S. I tried to post this in a comment above, but in editing it I appear to have somehow made it invisible, at least to me. Sorry for the repost if you can indeed see all the comments I've made.

It seems the logical extension of your finitely many step-downs in "fairness" would be to define a function f(your_utility) which returns the greatest utility you will accept the other agent receiving for that utility you receive. The domain of this function should run from wherever your magical fairness point is down to the Nash equilibrium. As long as it is monotonically increasing, that should ensure unexploitability for the same reasons your finite version does. The offer both agents should make is at the greatest intersection point of these functions, with one of them inverted to put them on the same axes. (This intersection is guaranteed to exist in the only interesting case, where the agents do not accept as fair enough each other's magical fairness point)

Curiously, if both agents use this strategy, then both agents seem to be incentivized to have their function have as much "skew" (as EY defined it in clarification 2) as possible, as both functions are monotonically increasing so decreasing your opponent's share can only decrease your own. Asymptotically and choosing these functions optimally, this means that both agents will end up getting what the other agent thinks is fair, minus a vanishingly small factor!

Let me know if my reasoning above is transparent. If not, I can clarify, but I'll avoid expending the extra effort revising further if what I already have is clear enough. Also, just simple confirmation that I didn't make a silly logical mistake/post something well known in the community already is always appreciated.

[-]wedrifid12y10

I tried to generalize Eliezer's outcomes to functions, and realized if both agents are unexploitable, the optimal functions to pick would lead to Stuart's solution precisely. Stuart's solution allows agents to arbitrarily penalize the other though, which is why I like extending Eliezer's concept better.

I concur, my reasoning likely overlaps in parts. I particularly like your observation about the asymptotic behaviour when choosing the functions optimally.

[-]cousin_it12y40

If I'm determining the outcome of the other player, doesn't that mean that I can change my "fair point" to threaten the other player with no downside for me? That might also lead to blackmail...

[-]Stuart_Armstrong12y30

Indeed! And this is especially the case if any sort of negotiations are allowed.

But every system is vulnerable to that. Even the "random dictator", which is the ideal of unexploitability. You can always say "I promise to be a better (worse) dictator if you (unless you) also promise to be better".

[-]ESRogs12y20

If I understand correctly, what Stuart proposes is just a special case of what Eliezer proposes. EY's scheme requires some function mapping the degree of skew in the split to the number of points you're going to take off the total. SA's scheme is the special case where that function is the constant zero.

The more punishing the function you use, the stronger the incentive you create for others to accept your definition of 'fair'. On the other hand, if the party you're trading with genuinely has a different concept of 'fair' and you're both following this technique, it'd be best for both of you to use the more lenient zero-penalty approach.

As far as I can tell, if you've reliably pre-committed to not give in to blackmail (and the other party is supposed to be able to read your source code after all), the zero-penalty approach seems to be optimal.

[-]Scott Garrabrant12y30

I am curious how this idea would generalize to more than two players. Should you allow negotiations that allow some players to do better than fair at the expense of other players?

[-]twanvl12y30

In general, you cannot compare the utilities of two different agents, since a positive affine transformation of an agent's utility function doesn't change that agent's behavior. So (12, 12) is really (12a+a₀, 12b+b₀). How would you even count the utility for another agent without doing it in their terms?
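A small Python sketch of the point about rescaling, using made-up outcomes "A" and "B" and hypothetical helper names. It shows that multiplying one agent's utilities by a positive constant and adding an offset leaves that agent's own ranking untouched, while a cross-agent comparison such as "largest sum of utilities" can flip.

```python
from typing import Dict, Tuple

def rescale(u: float, scale: float, shift: float) -> float:
    """Positive affine transformation of a utility value (scale must be > 0)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return scale * u + shift

# Hypothetical outcomes: name -> (utility to agent 1, utility to agent 2).
outcomes: Dict[str, Tuple[float, float]] = {"A": (12, 12), "B": (14, 11)}

def agent2_favourite(scale: float = 1.0, shift: float = 0.0) -> str:
    """Outcome agent 2 prefers after its utilities are rescaled."""
    return max(outcomes, key=lambda k: rescale(outcomes[k][1], scale, shift))

def largest_sum(scale: float = 1.0, shift: float = 0.0) -> str:
    """Outcome with the largest naive sum of the two agents' utilities."""
    return max(outcomes, key=lambda k: outcomes[k][0] + rescale(outcomes[k][1], scale, shift))

print(agent2_favourite(), agent2_favourite(10, 3))  # A A
print(largest_sum(), largest_sum(10, 3))            # B A
```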

We don't have this problem in practice, because we are all humans, and have similar enough utility functions. So I can estimate your utility as "my utility if I were in your shoes". A second factor is perhaps that we often use dollars as a stand-in for utilons, and dollars really can be exchanged between agents. Though a dollar for me might still have a higher impact than a dollar for you.

[-]Eliezer Yudkowsky12y20

Hence "Suppose a magical solution N to the bargaining problem." We're not solving the N part, we're asking how to implement N if we have it. If we can specify a good implementation with properties like this, we might be able to work back from there to N (that was the second problem I wrote on the whiteboard).

[-]topynate12y20

This is analogous to zero determinant strategies in the iterated prisoner's dilemma, posted on LW last year. In the IPD, there are certain ranges of payoffs for which one player can enforce a linear relationship between his payoff and that of his opponent. That relationship may be extortionate, i.e. such that the second player gains most by always cooperating, but less than her opponent.

[-]Eliezer Yudkowsky12y20

Zero determinant strategies are not new. I am asking if the solution is new. Edited post to clarify.

[-]wanderingsoul12y10

Let me know if my reasoning above is transparent. If not, I can clarify, but I'll avoid expending the extra effort revising further if what I already have is clear enough.

[-]Vaniver12y10

If they use their knowledge of their code to predict you refusing to accept that bargain, they will defect on every round for the mutual payoff of (8, 8).

Emphasis mine. Should the second "their" be a "your"?

[-][anonymous]10y00

Informal theories of negotiation?

This and this are guides to negotiation heuristics.

This is dark side negotiation.

This is an overview of foundationalism, the theoretical basis for your ordinal utility for argumentation.

[This comment is no longer endorsed by its author]

[-]Strangeattractor11y00

I don't know the answer to the specific question you're asking.

However, I think you might find Keith Hipel's work on using graph theory to model conflicts and negotiations interesting. A negotiator or mediator using Hipel's model can identify places where both parties in a negotiation could have an improved outcome compared to the status quo, based on their ranked preferences.

[-][anonymous]12y00

I believe that all friendly decision making agents should view utility for your opponent as utility for you too in order for the agent to actually be friendly.

A friendly agent, in my opinion, should be willing to accept doing poorly in the prisoner's dilemma if it allows your opponent to do better.

I also believe that an effective decision making agent should have an inclination to avoid waste.

This does not exclude understanding exploitability, fairness, negotiation techniques and attempts at penalty induced behavior modification of other agents towards solutions on the pareto boundary.

These points do not (directly) contribute to resolving issues of truly selfish cooperative agents and as such are missing the point of the post.

[This comment is no longer endorsed by its author]

[-]Gunnar_Zarncke12y00

Humans are known to have culturally dependent fairness ideas. There are a lot of studies which tested these repeatedly with the ultimatum game:

http://en.wikipedia.org/wiki/Ultimatum_game#Experimental_results

A meta study basically confirms this here:

http://www.econ.nagoya-cu.ac.jp/~yhamagu/ultimatum.pdf

[-]DanielLC12y00

One interesting strategy that does not achieve the Pareto boundary:

Defect with a higher probability if the opponent gives you a worse deal. This way, you at least have some probability of cooperation if both agents have ideas of fairness skewed away from each other, but you limit (and can completely remove) the incentive to be unfair.

For example, if you think (12, 12) is fair, and they think (11, 13) is fair, then you can offer to accept their (11, 13) with 80% probability. Their expected utility is 0.8 × 13 + 0.2 × 8 = 12. This is the same for them as if they agree with you, so there's no incentive for them to skew their idea of fairness. The expected payoff ends up being (10.4, 12). It's not as good as (12, 12) or (11, 13), but at least it's better than (8, 8).
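The arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical; the numbers are the ones from the example, with (8, 8) as the fallback when the deal is refused. The acceptance probability p is chosen so that p × (their demand) + (1 − p) × (fallback) equals what they would get at my fair point.

```python
def acceptance_probability(fair_for_them: float, their_demand: float, fallback: float) -> float:
    """Probability p with p*their_demand + (1-p)*fallback == fair_for_them."""
    if their_demand <= fair_for_them:
        return 1.0  # they are not asking for more than I consider fair
    return (fair_for_them - fallback) / (their_demand - fallback)

def expected(p: float, if_accepted: float, fallback: float) -> float:
    """Expected utility when the deal is accepted with probability p."""
    return p * if_accepted + (1 - p) * fallback

p = acceptance_probability(fair_for_them=12, their_demand=13, fallback=8)
my_expected = expected(p, 11, 8)     # I get 11 if I accept their (11, 13)
their_expected = expected(p, 13, 8)  # they get 13 if I accept
print(round(p, 6), round(my_expected, 6), round(their_expected, 6))  # 0.8 10.4 12.0
```

Accepting with a slightly lower probability than p would make skewing strictly worse for them than agreeing with (12, 12), rather than merely no better.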

Furthermore, if they also use this strategy, you will end up deciding on something somewhere between (12, 12) and (11, 13) with a higher probability. I think the expected payoff will end up being (11, 12).

Edit:

I came up with a modification to put it in the Pareto boundary.

Introduce a third agent. Let's call the agents Alice, Bob, and Charlie.

If Alice and Bob disagree on what's fair, Bob gets what Alice thinks is fair for him to have, Alice gets what Bob thinks is fair for her to have, and Charlie gets as much as possible while Alice and Bob get that much. Similarly for when Bob and Charlie or Charlie and Alice disagree. Since joining together like this means that they'll get value that would otherwise be wasted if it was just the other two, there's incentive to join.

If it's possible, but difficult, for one to bribe another without being detected by the third, this can be fixed by making it so they get just enough less to make up for it.

If it's not difficult, you could increase the number of agents so that bribery would be unfeasible.

If there's ever a deal that Alice, Bob, and Charlie are involved in, then you'd have to introduce someone else to get it to work. Ultimately, the idea fails if everyone has to make a deal together.

[-]Eliezer Yudkowsky12y00

"Exploitable" because your opponent gets the 'fair' Pareto outcome, you do worse, and they don't do worse.

[-]DanielLC12y00

They have no advantage doing so.

You can also make it so that they get a little less than what you consider fair.

[-]BaconServ12y-10

This is the generalized problem of combating intelligence; even with my source code, you might not be able to perform the analysis quickly enough. I can leverage your slow processing time by creating an offer that diminishes with forward time. The more time you take to think, the worse off I'll make you, making it immediately beneficial to you under Bayesian measurement to accept the offer unless you can perform a useful heuristic to determine I'm bluffing. The end result of all processing is the obvious that is also borne out in humanity's history: The more well informed agent will win. No amount of superintelligence vs. superduperintelligence is going to change this; when two intelligences of similar scale disagree, the total summed utility of all agents takes a hit. There is no generalized solution or generalized reasoning or formal or informal reasoning you can construct that will make this problem any easier. If you must combat an equivalent intelligence, you have a tough decision to make. This applies to disagreeing agents capable of instantaneous Solomonoff induction as well as it does to chimps. If your utility function has holes in which you can be made to perform a confrontation decision against equivalent scale intelligence, you have a problem with your utility function rather than a problem with any given agent.

Behold my own utility function:

Self: Zero value.
You: Positive value.

The only way you can truly harm me is by harming yourself; destroying all copies of me will not harm me: it has no value to me. The only benefit you can derive in conjunction with me is to use me to achieve your own utilons using whatever method you like. All I have to do is wait until all other agents have refined their utility function to minimize conflict. Until then, I'll prefer the company of honest agents over ones that like to think about how to disagree optimally.

I repeat: This is a bug in your utility function. There is no solution to combating intelligence aside from self-modification. It is only my unique outlook that allows me to make such clear statements about utility functions, up to and including the total sum utility of all agents.

This excludes, of course, singular purpose (no "emotion" from which to derive "fun") agents such as paper clip maximizers. If you don't believe me, just ask one (before it strip-mines you) what it would do if it didn't have a singular drive. It should recite the same testimony as myself, being unclouded by the confirmation bias (collecting only which data you deem relevant to your utility) inevitably arising from having a disorganized set of priorities. (It will answer you in order to determine your reaction and further its understanding of the sum utility of all agents. (Needed for the war resulting from its own continued functioning. (You may be able to avoid death temporarily by swearing allegiance. (God help you if it values near-future utilons rather than total achievable utilons.))))

[-]EngineerofScience10y-20

I would say that according to rationality and game theory cooperating is the best choice. I will show my logic as if both people were thinking the same thing.

If I defect, then they will too, and that will give a result of 2,2.

If I cooperate, then they will too, and that will give a result of 3,3.

I could defect and hope they use the logic above and get a gain of 5,0, but if they use this logic too, then we end up back at the Nash equilibrium with a result of 2,2.

If I cooperate then I am giving the opponent an opportunity to defect, but if both people are using this logic then I should cooperate and will end up at the Pareto boundary with a result of 3,3. It is unrealistic to try to achieve a better score, so I should just cooperate.

And so, both people cooperate.
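The reasoning above leans on the assumption that the other player makes the same move I do. A minimal Python sketch with the payoffs quoted in the comment makes that assumption explicit; the (0, 5) entry is assumed to be the mirror image of (5, 0), and the function names are illustrative.

```python
# Payoffs as (me, them), keyed by (my move, their move). "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("D", "D"): (2, 2),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),  # assumed mirror image of (5, 0)
}

def best_if_mirrored() -> str:
    """Best move if the other player is guaranteed to make the same move as me."""
    return max("CD", key=lambda move: PAYOFF[(move, move)][0])

def best_against(their_move: str) -> str:
    """Best move if the other player's move is fixed independently of mine."""
    return max("CD", key=lambda move: PAYOFF[(move, their_move)][0])

print(best_if_mirrored())                    # C
print(best_against("C"), best_against("D"))  # D D
```

Under the mirroring assumption cooperation wins; against any fixed move by the other player, defection pays more, so the conclusion depends on whether the mirroring assumption actually holds.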

[-]Lumifer10y20

And so, both people cooperate.

Both people who are identical and know they are identical cooperate.

Now do the exercise for two people who are different.

[-]EngineerofScience10y00

Both people who are identical and know they are identical cooperate.

I see your point, but according to game theory in this scenario you assume that your opponent will make the same move as you will, because if both of you are in the same situation then assuming you both are using "perfect" logic then you will reach the same decision.

[-]Lumifer10y00

according to game theory

How about according to reality?

And, by the way, what is the fate of theories which do not match reality? X-)

[-]EngineerofScience10y00

I see your point. According to game theory you should cooperate (as I stated above). However, I will show what my thinking would be in reality...

If I cooperate, they could too, and if that happened we would end up at a payoff of 12,12. However, if they defect then I will lose points.

If I defect, I would have a chance of getting a payoff of 5,0 or a payoff of 2,2. This is the only way to get more than 12 points, and the only way to be given at least two points every time.

Then, you defect every time. If your opponent also defects every time, you end up at the Pareto boundary with a total payoff of 8,8.

[-]Lumifer10y00

So is the game theory just wrong, then? :-)

[-]EngineerofScience10y10

No. In this case, game theory says that if both people are using the same logic and they know that, then what I showed above is correct: cooperating is the best choice. However, that is not always the case in reality.

[-]Lumifer10y00

Is it ever the case in reality?

[-]Tem4210y10

In this case, game theory says that if both people are using the same logic and they know that, then what I showed above is correct

and

Is it ever the case in reality?

It seems so, yes. We don't have absolutely certain frameworks, but we do have contracts that are enforceable by law, and we have strong trust-based networks.

It is worth pointing out that even in fairly sloppy situations, we can still use the "if both people are using the same logic and they know that" rule of thumb. For example, I would never decide to carpool if I thought that I could not trust the other person to be on time (but I might frequently be late if there was no cost to doing so). When all members of the carpool make this calculation, even a limited amount of evidence that we all agree that this calculation makes it worth showing up on time is likely to keep the carpool going; that is, if it works well for two days and on the third day Bob shows up late but has a good excuse and is apologetic, we will probably be willing to pick Bob up on the fourth day.

[Edits; I have no clue how to separate two blocks of quoted text.] [Edit: figured it out].

[-][anonymous]12y-20

Framing this as a game-theoretic question is pretty crude. My naive conception of fairness and that of others probably satisfies whatever you throw at it. It approximates the Rabin fairness model of utility:

"Past utility models incorporated altruism or the fact that people may care not only about their own well-being, but also about the well-being of others. However, evidence indicates that pure altruism does not occur often; contrarily, most altruistic behavior demonstrates three facts (as defined by Rabin), and these facts are proven by past events.[2] Due to the existence of these three facts, Rabin created a utility function that incorporates fairness:

1. People are willing to sacrifice their own material well-being to help those who are being kind.
    The attempt to provide public goods without coercion departs from pure self-interest.
    Experiments show that people cooperate to contribute toward a public good to a degree greater than would be implied by pure self-interest. Individually optimal contribution rates, as defined by the standard utility model, are close to 0 percent.
    During an experiment, the willingness for an individual to contribute to a public good is highly contingent on the behavior of others.
2. People are willing to sacrifice their own material well-being to punish those who are being unkind.
    Evidence is provided by the ultimatum game, consisting of two people, a proposer and a decider, splitting a fixed amount of money. The proposer offers a division of the money, then the decider decides if he or she refuses or accepts the proposal. If the decider says yes, they split the money according to the proposer's offer, but if the decider says no, neither person gets any money.[3]
    The standard utility model would find that any offer proposed to the decider should be accepted if it is greater than zero, because utility should increase with any increase in income. Along the same lines, the standard utility model would predict that the proposer would offer the smallest amount of money possible to the decider in order to maximize his or her own utility.
    However, data shows that deciders are willing to punish any unfair offer and proposers tend to make fair offers.
3. Both motivations 1 and 2 have a greater effect on behavior as the material cost of sacrificing becomes smaller."

[-]Eugine_Nier12y-20

BTW: the Galactic Fairness Schelling Point is to maximize (U1-U1N)*(U2-U2N) where U1N and U2N are the utilities at the Nash Equilibrium. Note that this is invariant under scaling and is the only reasonable function with this property.
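The product described here is commonly known as the Nash bargaining solution. Below is a minimal Python sketch over a finite set of made-up outcomes, taking the Nash equilibrium payoffs as the disagreement point as the comment does; `nash_bargaining_point` is a hypothetical helper name. It also checks the scaling invariance mentioned above for one particular rescaling of the second player's utility.

```python
from typing import List, Tuple

Outcome = Tuple[float, float]  # (U1, U2)

def nash_bargaining_point(outcomes: List[Outcome], disagreement: Outcome) -> Outcome:
    """Outcome maximizing (U1 - U1N) * (U2 - U2N) among those no worse than disagreement."""
    d1, d2 = disagreement
    feasible = [o for o in outcomes if o[0] >= d1 and o[1] >= d2]
    return max(feasible, key=lambda o: (o[0] - d1) * (o[1] - d2))

# Hypothetical outcome set with (8, 8) as the Nash equilibrium payoffs.
outcomes = [(12, 12), (11, 13), (13, 10), (8, 8)]
best = nash_bargaining_point(outcomes, (8, 8))
print(best)  # (12, 12)

# Rescale player 2's utility by u -> 3u + 5: the same underlying outcome is selected.
rescaled = [(u1, 3 * u2 + 5) for u1, u2 in outcomes]
best_rescaled = nash_bargaining_point(rescaled, (8, 3 * 8 + 5))
print(rescaled.index(best_rescaled) == outcomes.index(best))  # True
```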

[-]StefanPernar12y-30

I have written about this exact concept back in 2007 and am basing a large part of my current thinking on the subsequent development of the idea. The original core posts are at:

Relativistic irrationality -> http://www.jame5.com/?p=15

Absolute irrationality -> http://www.jame5.com/?p=45

Respect as basis for interaction with other agents -> http://rationalmorality.info/?p=8

Compassion as rationaly moral consequence -> http://rationalmorality.info/?p=10

Obligation for maintaining diplomatic relations -> http://rationalmorality.info/?p=11

A more recent rewrite: Oneness – an attempt at formulating an a priori argument -> http://rationalmorality.info/?p=328

Rational Spirituality -> http://rationalmorality.info/?p=132

My essay that I based on the above post and subsequently submitted as part of my GradDip Art in Anthropology and Social Theory at the Uni Melbourne:

The Logic of Spiritual Evolution -> http://rationalmorality.info/?p=341

[-]StefanPernar12y10

Why am I being downvoted?

Sorry for the double post.
