This is a supplemental post to Geometric Utilitarianism (And Why It Matters), which sets out to prove what I think are the main interesting results about Geometric Utilitarianism:

1. Maximizing a geometric weighted average G(_,ψ) can always lead to Pareto optimality.
2. Given any Pareto optimal joint utility p, we can calculate weights ψ which make p optimal according to G(_,ψ).

That post describes why this problem is interesting, but the quick summary is: geometric utility aggregation is a candidate alternative to Harsanyi utility aggregation (which is an arithmetic weighted average), and it handles some tradeoffs better than Harsanyi aggregation. The resulting choice function is geometrically rational, whereas the Harsanyi choice function is VNM-rational. This post is mostly math supporting the main post, with some details moved to their own posts later in this sequence.
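For concreteness in the sketches below, here's what I mean by the two aggregators, in Python. The definitions (weighted arithmetic mean for H, weighted geometric mean for G, with weights summing to 1) are the standard ones; the numbers are made up.

```python
import math

def H(u, phi):
    """Harsanyi aggregation: a weighted arithmetic average of joint utility u."""
    return sum(w * ui for w, ui in zip(phi, u))

def G(u, psi):
    """Geometric aggregation: a weighted geometric average of joint utility u."""
    return math.prod(ui ** w for ui, w in zip(u, psi))

u = [4.0, 1.0]               # a joint utility for two agents
print(H(u, [0.5, 0.5]))      # arithmetic average: 2.5
print(G(u, [0.5, 0.5]))      # geometric average: 2.0
```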

We as a community, and I individually, would need pretty strong reasons to endorse a theory of rationality based on a broadening of the VNM axioms. I've been sufficiently radicalized by Scott Garrabrant, and thinking about how each system handles common decision problems, that I think more work in this direction is potentially very valuable. Here's how I solved a couple of the subproblems towards making progress in that direction.

## Assumptions

Most of these proofs can be understood geometrically, and we'll need to make some geometric assumptions. One big property we'll be using is the fact that the set of feasible joint utilities F⊆Rn is always convex. This is always true for VNM-rational agents with utility functions, but we can prove the same result for agents with other functions describing their preferences, as long as the feasible utilities stay convex. This will be helpful when we start combining utilities in ways that make the resulting social choice function violate the VNM axioms, but preserve convexity.
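As a tiny illustration of where that convexity comes from (with made-up joint utilities): mixing two options via a lottery mixes the corresponding joint utilities linearly, so the segment between any two feasible points is itself feasible.

```python
# Two feasible joint utilities (hypothetical numbers) and a lottery between them.
u1 = (10.0, 0.0)
u2 = (0.0, 6.0)
t = 0.25   # probability of picking option 1

# For VNM agents, expected utility makes the lottery's joint utility the
# convex combination of u1 and u2, so it lands on the segment between them.
mix = tuple(t * a + (1 - t) * b for a, b in zip(u1, u2))
assert mix == (2.5, 4.5)
```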

We'll also need F to be compact, if we want to be able to find optimal points according to G or H. If F extends infinitely in any direction, there might not be any Pareto optimal joint utilities. This "problem" also appears in classic individual rationality: what does a rational agent do when they can simply pick a number u∈R and receive that much utility? Individual VNM-rationality isn't well-defined when there isn't an optimal option, so compactness seems like a weak assumption, but it implies that utilities are bounded with respect to the options our agents are facing, so I want to call it out.

It will be easier to prove uniqueness results if there aren't any redundant agents with constant utility. Any weight assigned to these agents has no effect on the optima of H or G, and in the context of optimizing over a set of feasible options it's safe to ignore such agents. There's no choice we can make which will affect them in any way. Geometrically, this corresponds to the requirement that F be n-dimensional.

It will make the math nicer if we shift all utility functions so that each agent assigns 0 utility to their least favorite option. If F is generated by taking the Pareto improvements over some disagreement point d, as is done in the bargaining setting, this disagreement point will become the baseline for 0 utility for all agents.

I'll also be assuming that the number of players n is finite. I don't have any reason to think the results fail for the infinite case, but there are things we'd need to worry about for infinite-dimensional vector spaces that I didn't worry about for these first results. We'd want to swap out all the finite sums ∑_{i=1}^n for integrals ∫ di, for example, if we wanted to use continuous indices for agents.

## Maximizing G Can Always Lead to Pareto Optimality

Showing that maximizing G(_,ψ) can always lead to Pareto optimality is relatively straightforward.

Our decision to shift away from negative utilities is already paying dividends: the weighted geometric average is monotonically increasing with respect to all individual utilities. Any Pareto improvement in utilities will lead to a weighted average that's at least what we started with.^{[1]}

Pareto Monotonicity: If p is a Pareto improvement over f, then G(p,ψ)≥G(f,ψ) for all weights ψ∈[0,1]n. Symbolically: p⪰f ⟹ G(p,ψ)≥G(f,ψ)
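Here's a quick numerical spot check of that monotonicity, using the weighted geometric mean for G (my assumption throughout these sketches) and made-up utilities:

```python
import math

def G(u, psi):
    return math.prod(ui ** w for ui, w in zip(u, psi))

f = [2.0, 3.0, 1.0]
p = [2.5, 3.0, 1.0]   # a Pareto improvement over f: agent 0 gains, no one loses

# The weighted geometric average never decreases under a Pareto improvement,
# including when some agents get 0 weight.
for psi in ([1/3, 1/3, 1/3], [0.7, 0.2, 0.1], [0.0, 0.5, 0.5]):
    assert G(p, psi) >= G(f, psi)
```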

Among agents with positive weight ψi, any increase to their utility will increase G(_,ψ); maximizing G(_,ψ) will automatically pick up any Pareto improvements among these agents.

It turns out that when all agents have positive weight, ψi>0, this Pareto optimal point p will be the unique optimum of G among F. We'll prove this more rigorously in the next post, but intuitively when ψi>0 for all agents, the contour surface of joint utilities with the same G score as p (colored green in the picture below) curves away from F.

Will maximizing G(_,ψ) lead to Pareto optimality among a group of agents where some have 0 weight? In that case it depends on how we break ties! Assigning an agent 0 weight makes G and H insensitive to their welfare, so there can be optima of G and H which are not Pareto optimal in this case.^{[2]}

As an example, consider Alice deciding how much money she and Bob should receive, up to $100 each. There is no trade-off between her utility and Bob's and the only Pareto optimum is ($100, $100). But if Alice is a pure G or H maximizer and assigns Bob 0 weight, she's indifferent to Bob's utility and the optima are wherever Alice receives $100.
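The Alice/Bob example is easy to check directly. With Bob's weight set to 0, G can't tell the allocations apart (again using the weighted-geometric-mean definition of G assumed in these sketches):

```python
import math

def G(u, psi):
    return math.prod(ui ** w for ui, w in zip(u, psi))

psi = [1.0, 0.0]   # Alice gets all the weight, Bob gets none

# Every allocation where Alice receives $100 scores the same, even though
# only (100, 100) is Pareto optimal.
scores = [G([100.0, b], psi) for b in (0.0, 50.0, 100.0)]
assert scores == [100.0, 100.0, 100.0]
```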

Fortunately, there are always optima of G(_,ψ) which are Pareto optimal, and we can use a tie-breaking rule which always picks one of these. One approach would be to derive new weights β which are guaranteed to be positive for all agents, and which are very close to ψ.

lim_{ϵ→0} β(ψ,ϵ) = ψ

Then we could pick the point p on the Pareto frontier which is the limit as ϵ approaches 0.

p(ψ) = lim_{ϵ→0} argmax_{u∈F} G(u, β(ψ,ϵ))

Maximizing G or H can always be done Pareto optimally. It turns out that this particular tie-breaking rule isn't guaranteed to make p(ψ) continuous, and in fact there might be cases where no such tie-breaking rule exists. However, it turns out that we can make p(ψ) continuous if we're willing to accept an arbitrarily good approximation of argmaxu∈FG(u,ψ). See Individual Utilities Shift Continuously as Geometric Weights Shift for more details.
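The post doesn't pin down a particular β, so here's one illustrative choice (my own, for the sketch): mix ψ with the uniform weights, so every agent gets positive weight and β(ψ,ϵ)→ψ as ϵ→0. On a coarse grid version of the Alice/Bob example, maximizing with the perturbed weights picks out the Pareto optimum for every small ϵ:

```python
import math

def G(u, psi):
    return math.prod(ui ** w for ui, w in zip(u, psi))

def beta(psi, eps):
    # One illustrative tie-breaking perturbation (not from the original post):
    # blend psi with uniform weights so every agent has positive weight.
    n = len(psi)
    return [(1 - eps) * w + eps / n for w in psi]

# Coarse grid of feasible allocations for the Alice/Bob example.
F = [(a, b) for a in range(0, 101, 10) for b in range(0, 101, 10)]
psi = [1.0, 0.0]   # Alice ignores Bob's welfare

for eps in (0.1, 0.01, 0.001):
    best = max(F, key=lambda u: G(u, beta(psi, eps)))
    assert best == (100, 100)   # the unique Pareto optimum
```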

## Making p an Optimum of G

Going the other direction, let's pick a Pareto optimal joint utility p∈P and find weights ψ∈[0,1]n which make that point optimal among F with respect to G(_,ψ). These geometric weights ψ won't always be unique, for example at corners where the Harsanyi weights ϕ aren't unique. G won't always have unique optima either, which can happen when we give any agents 0 weight.

Let's handle a few easy cases up front: when n=1, the only option is ψ=[1]. This means that G(u,ψ)=u, and G(_,ψ) maximization reduces to individual utility maximization for a single agent. This is a nice base case for any sort of recursion: the aggregate of one utility function is just that same utility function, and Harsanyi aggregation works the same way.

Similarly, when P is a single point p, any weights will make p optimal according to G(_,ψ) or H(_,ϕ). Feel free to use any convention that works for your application; the simplest option is to just inherit ψ(→0) = ϕ(→0), but if F is shrinking towards becoming 0-dimensional then I prefer ψ(→0) = lim_{p→→0} ψ(p).

For two or more agents and a Pareto frontier with multiple options, here's the high-level overview of the proof:

1. Identify the Harsanyi hyperplane H, and the Harsanyi weights ϕ.
2. Calculate geometric weights ψ which make p optimal among H according to G(_,ψ).
3. Show that this is sufficient to make p optimal among F.

## The Harsanyi Hyperplane

One way to derive the Harsanyi weights for F is to find a hyperplane which separates F from the rest of Rn.

Diffractor uses this technique here in that same great Unifying Bargaining sequence, and I'd actually forgotten I learned it from there; it's become so ingrained in my thinking. The idea is that maximizing a weighted arithmetic average can be thought of as picking a slope for a hyperplane, then sliding that hyperplane away from the origin until it just barely touches F.

The slope of this Harsanyi hyperplane H⊂Rn matches up with the slope of the Pareto frontier at p. If p is at a corner, where the slope changes suddenly on either side, any convex combination of the slopes around p will work. The equation for any hyperplane looks like a⋅u = a_1u_1 + a_2u_2 + ... + a_nu_n = b, where a∈Rn and b∈R are constants. Given some Harsanyi weights ϕ∈[0,1]n, the Harsanyi hyperplane shows us all of the joint utilities with the same weighted average, and is defined by H(u,ϕ) = u⋅ϕ = H(p,ϕ). We can pick any joint utility u on H and it will have the same H score as p.
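Here's the sliding-hyperplane picture in miniature, on a made-up polytope of feasible joint utilities: maximizing the weighted arithmetic average picks the point where the hyperplane with normal ϕ last touches F, and that hyperplane supports F there.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Vertices of a made-up convex feasible set F, and some Harsanyi weights.
F_vertices = [(0.0, 0.0), (10.0, 0.0), (8.0, 6.0), (0.0, 9.0)]
phi = (0.5, 0.5)

# Slide the hyperplane u . phi = b outward until it last touches F:
# that's the vertex maximizing u . phi.
p = max(F_vertices, key=lambda u: dot(u, phi))
b = dot(p, phi)
assert p == (8.0, 6.0) and b == 7.0

# The hyperplane supports F at p: no feasible point lies beyond it.
assert all(dot(u, phi) <= b for u in F_vertices)
```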

## The Geometric Utilitarian Weights

The trick that makes all of this work is to pick weights ψ which cause p to be optimal with respect to G(_,ψ) among H. It turns out these are easy to calculate! For individual elements we can use the formula

ψ_i = p_iϕ_i / (p⋅ϕ)

And for all of the weights at once we can use

ψ = (p⊙ϕ) / (p⋅ϕ)

Where p⊙ϕ∈Rn is the element-wise product of p and ϕ: (p⊙ϕ)_i = p_iϕ_i, and p⋅ϕ∈R is the dot product p⋅ϕ = ∑_{j=1}^n p_jϕ_j. Check out Deriving the Geometric Utilitarian Weights for the details of how this was derived, and how we know that p is the unique optimum of G(_,ψ) among H if we use these weights. (So long as ψ_i>0 for all agents.)
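Here's that formula on made-up numbers for two agents. A nice sanity check falls out of it: since ∑_i p_iϕ_i = p⋅ϕ, the geometric weights always sum to 1.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# A made-up Pareto optimal point and Harsanyi weights for two agents.
p = (8.0, 6.0)
phi = (0.5, 0.5)

# psi = (p ⊙ phi) / (p . phi): elementwise product over dot product.
denom = dot(p, phi)                                            # p . phi = 7.0
psi = tuple(pi * fi / denom for pi, fi in zip(p, phi))

assert psi == (4.0 / 7.0, 3.0 / 7.0)
assert abs(sum(psi) - 1.0) < 1e-12   # the weights sum to 1
```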

## Gradient Ascent

One way to think about maximizing G is to imagine a robot flying around in joint utility space Rn, following the gradient ∇uG to increase G as quickly as possible. This is the gradient ascent algorithm, and it can be used to find local optima of any function. Some functions have multiple optima, and in those cases it matters where your robot starts. But when there's just one global optimum, gradient ascent will find it.

If we ignore F and just set our robot loose trying to maximize G, it will never find an optimum: there's always some agent whose utility can be raised, and raising it raises G. ((∇_uG)_i ≥ 0 for all agents, and (∇_uG)_i > 0 for some agent.)

However, if we use those weights ψ we calculated, ∇uG will always point at least a little in the direction of h, the normal vector to the Harsanyi hyperplane H. Check out Gradient Ascenders Reach the Harsanyi Hyperplane for the details there.

Now if we add the single constraint that our robots can't travel beyond H, they'll bump into H and then travel along ∇v(G∘H), since that's the gradient of G when our robots are constrained to only move along H.

But when ψ_i>0 for all agents, G(_,ψ) has a unique optimum on H, and it's p! No matter where our robots start within the interior of F, they'll find themselves inexorably drawn to p given just the constraint that they can't cross H. If we add in the constraint that the robots need to stay within F, they might bump into those boundaries first, but they'll make their way over to the unique optimum of G(_,ψ) among all options in F, which is still p.
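We can watch a robot do this numerically. Below is projected gradient ascent on G along the Harsanyi hyperplane, using the gradient (∇_uG)_i = ψ_i G(u)/u_i of the weighted geometric mean; the point p, the weights ϕ, the starting point, and the step size are all made up, and ψ comes from the formula in the previous section. Starting elsewhere on the hyperplane, the ascender converges to p.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p = (8.0, 6.0)                                   # made-up Pareto optimum
phi = (0.5, 0.5)                                 # Harsanyi weights
psi = tuple(pi * fi / dot(p, phi) for pi, fi in zip(p, phi))   # (4/7, 3/7)

def G(u):
    return math.prod(ui ** w for ui, w in zip(u, psi))

u = [12.0, 2.0]   # start on the hyperplane u . phi = p . phi, away from p
for _ in range(2000):
    grad = [w * G(u) / ui for w, ui in zip(psi, u)]
    # Project the gradient onto the hyperplane (subtract its component
    # along the normal phi) so the robot slides along H.
    coeff = dot(grad, phi) / dot(phi, phi)
    step = [g - coeff * f for g, f in zip(grad, phi)]
    u = [ui + 0.5 * s for ui, s in zip(u, step)]

# The ascender ends up at p, the unique optimum of G on the hyperplane.
assert all(abs(ui - pi) < 1e-3 for ui, pi in zip(u, p))
```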

When ψi=0 for some agent, G(_,ψ) may have multiple optima where our robots might land, but those optima always include p!

## More Details

The next two posts in this sequence go into more detail about two subproblems we summarized briefly here.

Deriving the Geometric Utilitarian Weights goes into more detail about how those weights ψ can be derived, and how we know p is the unique optimum among H if we use them.

Gradient Ascenders Reach the Harsanyi Hyperplane describes what the gradient of G(_,ψ) looks like, and how we know it always points at least a little towards H (and then away from H once we cross it).

Then we show a bonus result: Individual Utilities Shift Continuously as Geometric Weights Shift. This is a nice property to have if you want your system's behavior to change only a little when you change the weights a little, instead of discontinuously thrashing to a potentially very different behavior.

^{^} When an individual is given 0 weight, increasing their utility doesn't increase the weighted average. But it doesn't decrease the weighted average either.

^{^} Assigning an agent 0 weight makes G insensitive to their welfare, but increasing G might still increase their welfare, because we might have assigned weight to another agent whose values are somewhat aligned with theirs. Our social aggregate might not care that Alice likes clean air, but it might still tell us to clean up the air if Bob likes it and Bob is given positive weight.
