Toy model piece #5: combining partial preferences

Stuart_Armstrong

My previous approach to combining preferences went like this: from the partial preferences $P_i$, create a normalised utility function $\hat{U}_i$ that is defined over all worlds (and which is indifferent to the information that didn't appear in the partial model). Then simply add these utilities, weighted according to the weight/strength of the preference.

But this method fails. Consider, for example, the following partial preferences, all with the same weight of $1$:

$P_1: A > C$.
$P_2: A > B$.
$P_3: B > C$.
$P_4: B > D$.
$P_5: B > E$.

If we follow the standard normalisation approach, then the normalised utility $\hat{U}_1$ will be defined^[1] as:

$\hat{U}_1(A) = 0.5$, $\hat{U}_1(C) = -0.5$, and otherwise $\hat{U}_1(-) = 0$.

Then adding together all five utility functions would give:

$U\begin{pmatrix} A \\ B \\ C \\ D \\ E \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -0.5 \\ -0.5 \end{pmatrix}$.
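As a check on this arithmetic, here is a minimal Python sketch of the additive method, restricted to pairwise preferences like $P_1$ to $P_5$. The `(winner, loser, weight)` encoding and the name `naive_sum` are illustrative assumptions; the only thing taken from the text is the normalisation of $+0.5$ for the preferred world and $-0.5$ for the other, scaled by the weight.

```python
from collections import defaultdict

# Each partial preference is a (winner, loser, weight) triple: P_1 = A > C, etc.
PREFS = [("A", "C", 1.0), ("A", "B", 1.0), ("B", "C", 1.0),
         ("B", "D", 1.0), ("B", "E", 1.0)]
WORLDS = ["A", "B", "C", "D", "E"]


def naive_sum(prefs):
    """Add the weighted normalised utilities: +0.5 to the winner, -0.5 to the loser."""
    total = defaultdict(float)  # a world absent from a preference gets 0 from it
    for winner, loser, weight in prefs:
        total[winner] += 0.5 * weight
        total[loser] -= 0.5 * weight
    return total


U = naive_sum(PREFS)
print([U[x] for x in WORLDS])  # [1.0, 1.0, -1.0, -0.5, -0.5]
```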

There are several problems with this utility. Firstly, the utility of $A$ and the utility of $B$ are the same, even though in the only case where there is a direct comparison between them, $A$ is ranked higher. We might say that we are missing the comparisons between $A$ and $D$, and between $A$ and $E$, and could elicit these preferences using one-step hypotheticals. But what if comparing $A$ to $D$ is a complex preference, and all that happens is that the agent combines $A > B$ and $B > D$? If we added another partial preference that said $B > F$, then $B$ would end up ranked above $A$!

Another, more subtle, point is that the difference between $A$ and $C$ is too large. Simply having $A > B$ and $B > C$ would give $U(A) - U(C) = 1$. Adding in $A > C$ moves this difference to $2$. But note that $A > C$ is already implicit in $A > B$ and $B > C$, so adding it shouldn't make the difference larger.
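Both failure modes can be reproduced with the additive rule. This is a small sketch for pairwise preferences only; `naive_sum` and the tuple encoding are hypothetical names chosen for illustration, not from the post. It checks that the extra preference $B > F$ lifts $B$ above $A$, and that the redundant $A > C$ doubles the gap between $A$ and $C$.

```python
from collections import defaultdict


def naive_sum(prefs):
    """Old method: weighted normalised utilities (+w/2 winner, -w/2 loser), summed."""
    total = defaultdict(float)
    for winner, loser, weight in prefs:
        total[winner] += 0.5 * weight
        total[loser] -= 0.5 * weight
    return total


base = [("A", "C", 1.0), ("A", "B", 1.0), ("B", "C", 1.0),
        ("B", "D", 1.0), ("B", "E", 1.0)]

# Failure 1: one extra preference B > F pushes B above A.
with_f = naive_sum(base + [("B", "F", 1.0)])
print(with_f["B"] > with_f["A"])  # True (1.5 vs 1.0)

# Failure 2: the redundant A > C doubles the A-C gap.
chain = naive_sum([("A", "B", 1.0), ("B", "C", 1.0)])
full = naive_sum([("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0)])
print(chain["A"] - chain["C"], full["A"] - full["C"])  # 1.0 2.0
```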

In fact, if the difference in utility between $A$ and $C$ were larger than $1$, adding in $A > C$ should make the difference between $U(A)$ and $U(C)$ smaller, because having $A > C$ weighted at $1$ means that the agent's preference for $A$ over $C$ is not that strong.

Energy minimising between utilities

So, how should we combine these preferences otherwise? Well, if I have a preference $P_i$ of weight $w_i$ that ranks outcome $G$ below outcome $H$ (write this as $G <_i H$), then, if these outcomes appear nowhere else in any partial preference, $U(H) - U(G)$ will be $w_i$.

So in a sense, that partial preference is trying to set the distance between those two outcomes to $w_i$. Call this the energy-minimising condition for $P_i$.

Then for a utility function $U$, we can define the energy of $U$, as compared with the (partially defined) normalised utility $\hat{U}_i$ corresponding to $P_i$. It is:

$\sum_{G <_i H} \left( w_i (\hat{U}_i(H) - \hat{U}_i(G)) - (U(H) - U(G)) \right)^2$.

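This is the sum, over the pairs that $P_i$ compares, of the squared difference between the weighted distance $w_i (\hat{U}_i(H) - \hat{U}_i(G))$ that $P_i$ puts between the outcomes and the distance $U(H) - U(G)$ that $U$ actually gives.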

Because different partial preferences compare different numbers of pairs of outcomes, we can compute the average energy of $U$:

$E(U, P_i) = \frac{\sum_{G <_i H} \left( w_i (\hat{U}_i(H) - \hat{U}_i(G)) - (U(H) - U(G)) \right)^2}{\sum_{G <_i H} 1}$.
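A minimal sketch of the energy and the average energy in Python. It assumes each partial preference is encoded as a dict holding its weight `w`, its normalised utility `u_hat`, and an explicit list of `(G, H)` pairs meaning $G <_i H$; that encoding is an assumption, not the post's. The three-element preference `P_abc` and its `u_hat` values are hypothetical, chosen only to show why dividing by the number of pairs matters.

```python
def energy(U, pref):
    """Sum over G <_i H of (w_i * (hatU_i(H) - hatU_i(G)) - (U(H) - U(G)))**2."""
    w, u_hat, pairs = pref["w"], pref["u_hat"], pref["pairs"]
    return sum((w * (u_hat[h] - u_hat[g]) - (U[h] - U[g])) ** 2 for g, h in pairs)


def average_energy(U, pref):
    """E(U, P_i): the energy divided by the number of pairs P_i compares."""
    return energy(U, pref) / len(pref["pairs"])


# P_1 = A > C with weight 1; pairs are written (G, H) meaning G <_i H.
P1 = {"w": 1.0, "u_hat": {"A": 0.5, "C": -0.5}, "pairs": [("C", "A")]}
# Hypothetical three-element preference A > B > C with illustrative u_hat values.
P_abc = {"w": 1.0, "u_hat": {"A": 0.5, "B": 0.0, "C": -0.5},
         "pairs": [("B", "A"), ("C", "B"), ("C", "A")]}

U_naive = {"A": 1.0, "B": 1.0, "C": -1.0, "D": -0.5, "E": -0.5}
U_line = {"A": 1.0, "B": 0.0, "C": -1.0}
print(energy(U_naive, P1), average_energy(U_naive, P1))      # 1.0 1.0
print(energy(U_line, P_abc), average_energy(U_line, P_abc))  # 1.5 0.5
```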

Global energy minimising condition

But weights have another role to play here: they measure not only how much $H$ is preferred to $G$, but how important it is to get that ordering right. So, for humans, "$G < H$ with weight $\epsilon$" means both:

$H$ is not much preferred to $G$.
The human isn't too fussed about the ordering of $G$ and $H$.

For general agents, these two could be separate phenomena; but for humans, they generally seem to be the same thing. So we can reuse the weights to compute the global energy for $U$ as compared to all partial preferences, which is just the weighted sum of its average energy for each partial preference:

$E(U, \{P_i\}) = \sum_{P_i} w_i E(U, P_i) = \sum_{P_i} w_i \frac{\sum_{G <_i H} \left( w_i (\hat{U}_i(H) - \hat{U}_i(G)) - (U(H) - U(G)) \right)^2}{\sum_{G <_i H} 1}$.

The ideal $U$ is then defined to be the $U$ that minimises this energy.
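Since the global energy is a quadratic function of the utilities, it can be minimised numerically. The sketch below is illustrative only: it assumes the dict encoding `{"w", "u_hat", "pairs"}` for each $P_i$ (pairs `(G, H)` meaning $G <_i H$), and it uses Gauss-Seidel coordinate descent, which is a choice made here, not a method from the post. Each update sets one world's utility to the weighted average of the values its pairs "want", which is the exact minimiser with the other utilities held fixed. Pinning $U(B) = 0$ removes the freedom to translate $U$.

```python
def pairwise(winner, loser, w=1.0):
    """Partial preference 'winner > loser' of weight w, normalised to +/-0.5."""
    return {"w": w, "u_hat": {winner: 0.5, loser: -0.5}, "pairs": [(loser, winner)]}


def global_energy(U, prefs):
    """E(U, {P_i}) = sum_i w_i * (average squared gap error over P_i's pairs)."""
    total = 0.0
    for p in prefs:
        w, u_hat, pairs = p["w"], p["u_hat"], p["pairs"]
        sq = sum((w * (u_hat[h] - u_hat[g]) - (U[h] - U[g])) ** 2 for g, h in pairs)
        total += w * sq / len(pairs)
    return total


def minimise(prefs, worlds, pinned, sweeps=200):
    """Gauss-Seidel coordinate descent on the quadratic global energy.

    Each unpinned world's utility is set, in turn, to the exact minimiser of the
    energy with all other utilities held fixed. Worlds in `pinned` keep their value.
    """
    U = {x: pinned.get(x, 0.0) for x in worlds}
    for _ in range(sweeps):
        for x in worlds:
            if x in pinned:
                continue
            num = den = 0.0
            for p in prefs:
                w, u_hat, pairs = p["w"], p["u_hat"], p["pairs"]
                c = w / len(pairs)  # coefficient of each squared term from P_i
                for g, h in pairs:
                    t = w * (u_hat[h] - u_hat[g])  # target value of U(H) - U(G)
                    if x == h:
                        num += c * (U[g] + t)
                        den += c
                    elif x == g:
                        num += c * (U[h] - t)
                        den += c
            if den > 0:  # a world in no preference keeps its starting value
                U[x] = num / den
    return U


PREFS = [pairwise("A", "C"), pairwise("A", "B"), pairwise("B", "C"),
         pairwise("B", "D"), pairwise("B", "E")]
U = minimise(PREFS, "ABCDE", pinned={"B": 0.0})
print({x: round(v, 4) for x, v in U.items()})
# {'A': 0.6667, 'B': 0.0, 'C': -0.6667, 'D': -1.0, 'E': -1.0}
print(round(global_energy(U, PREFS), 4))  # 0.3333
```

With $B$ pinned and every other world linked to it, the reduced quadratic is strictly convex, so the sweeps converge to the unique minimiser.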

Solutions

Now, this expression is clearly convex: it is a non-negative weighted sum of squares of linear functions of the values $U(\cdot)$. But it need not be strictly convex (which would imply a single solution). For example, if $P_1$ ($A > C$) and $P_4$ ($B > D$) were the only partial preferences, then there would be no conditions on the relative utilities of $\{A, C\}$, $\{B, D\}$ and $\{E\}$.
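To see the lack of strict convexity concretely: when every $P_i$ is a single pair of weight $w_i$ with $\hat{U}_i = \pm 0.5$, the target gap is $w_i$ and the average is over one pair, so each term of the global energy reduces to $w_i (w_i - (U(H) - U(G)))^2$. The sketch below (the helper name `pairwise_energy` is hypothetical) shows two quite different utilities that both have zero energy when $P_1$ and $P_4$ are the only preferences.

```python
def pairwise_energy(U, prefs):
    """Global energy when each P_i is one pair (winner, loser, w).

    With hatU_i = +/-0.5 the target gap is w and the average is over one pair,
    so each P_i contributes w * (w - (U(winner) - U(loser)))**2.
    """
    return sum(w * (w - (U[a] - U[b])) ** 2 for a, b, w in prefs)


ONLY_P1_P4 = [("A", "C", 1.0), ("B", "D", 1.0)]
U1 = {"A": 0.5, "B": 0.5, "C": -0.5, "D": -0.5, "E": 0.0}
# Shift {A, C}, {B, D} and {E} by three different constants.
U2 = {"A": 3.5, "B": -1.5, "C": 2.5, "D": -2.5, "E": 7.0}
print(pairwise_energy(U1, ONLY_P1_P4), pairwise_energy(U2, ONLY_P1_P4))  # 0.0 0.0
```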

Say that $H$ is linked to $G$ if there exists a $P_i$ with $G \leq_i H$ or $H \leq_i G$, and then extend this relation to be transitive and reflexive (it's automatically symmetric). In the example with $P_i$, $1 \leq i \leq 5$, all of $\{A, B, C, D, E\}$ are linked.

Being linked is an equivalence relation. Within a class of linked worlds, if we fix the utility of one world, the energy minimisation problem becomes strictly convex, and hence has a single solution. Thus, within a class of linked worlds, the energy minimisation problem has a single solution, up to translation.
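One way to compute the linked classes is a union-find over the compared pairs. This is a standard technique chosen here for illustration, not something from the post, and `linked_classes` is a hypothetical name. Each pair `(G, H)` is one comparison made by some $P_i$.

```python
def linked_classes(pairs, worlds):
    """Group worlds into linked classes.

    `pairs` lists every (G, H) that some P_i compares. Union-find takes the
    reflexive-transitive closure of 'compared by some P_i'; worlds that appear
    in no pair form singleton classes.
    """
    parent = {x: x for x in worlds}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for g, h in pairs:
        parent[find(g)] = find(h)

    groups = {}
    for x in worlds:
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


ONLY_P1_P4 = [("C", "A"), ("D", "B")]
ALL_FIVE = [("C", "A"), ("B", "A"), ("C", "B"), ("D", "B"), ("E", "B")]
print(linked_classes(ONLY_P1_P4, "ABCDE"))  # [['A', 'C'], ['B', 'D'], ['E']]
print(linked_classes(ALL_FIVE, "ABCDE"))    # [['A', 'B', 'C', 'D', 'E']]
```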

So if we want a single $U$, translate the solution for each linked class so that the average utility in that class is equal to the average utility in every other linked class. This then defines $U$ uniquely (up to translation).

For example, if we only had $P_1$ ($A > C$) and $P_4$ ($B > D$), this could set $U$ to be:

$U\begin{pmatrix} A \\ B \\ C \\ D \\ E \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \\ -0.5 \\ -0.5 \\ 0 \end{pmatrix}$.

Here, the average utility in each linked class ($\{A, C\}$, $\{B, D\}$ and $\{E\}$) is $0$.
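A sketch of this translation convention, using a hypothetical helper `centre_classes`: it takes any energy minimiser together with the linked classes, and shifts each class so its average utility is a common value (here $0$). Because the energy depends only on differences within a class, the shift does not change the energy. Starting from an arbitrary minimiser for $P_1$ and $P_4$ alone, it recovers the vector above.

```python
def centre_classes(U, classes, common_average=0.0):
    """Translate each linked class so its average utility equals common_average.

    Energy depends only on differences within a class, so this leaves the global
    energy unchanged while picking out one representative minimiser.
    """
    V = dict(U)
    for cls in classes:
        shift = common_average - sum(U[x] for x in cls) / len(cls)
        for x in cls:
            V[x] = U[x] + shift
    return V


classes = [["A", "C"], ["B", "D"], ["E"]]  # linked classes for P_1 and P_4 only
U_any = {"A": 3.5, "B": -1.5, "C": 2.5, "D": -2.5, "E": 7.0}  # one energy minimiser
print(centre_classes(U_any, classes))
# {'A': 0.5, 'B': 0.5, 'C': -0.5, 'D': -0.5, 'E': 0.0}
```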

Applying this to the example

So, applying this approach to the full set of the $P_i$, $1 \leq i \leq 5$ above (and fixing $U(B) = 0$), we'd get:

$U\begin{pmatrix} A \\ B \\ C \\ D \\ E \end{pmatrix} = \begin{pmatrix} 2/3 \\ 0 \\ -2/3 \\ -1 \\ -1 \end{pmatrix}$.

Here $B$ is midway between $A$ and $C$, as it should be, while the utilities of $D$ and $E$ are defined by their distance from $B$ only. The distance between $A$ and $C$ is $4/3 \approx 1.33$. This is between $2$ (which would be given by $A > B$ and $B > C$ only) and $1$ (which would be given by $A > C$ only).
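The values $\pm 2/3$ can also be derived by hand. With $U(B) = 0$ fixed, $P_4$ and $P_5$ are satisfied exactly by $U(D) = U(E) = -1$, contributing zero energy. The remaining problem is unchanged by swapping $A$ with $C$ and flipping signs, and it has a unique minimiser, so that minimiser has the form $U(A) = x$, $U(C) = -x$. Each $P_i$ has weight $1$ and a single pair, so:

```latex
\begin{align*}
E(U, \{P_i\}) &= \underbrace{(1 - 2x)^2}_{P_1} + \underbrace{(1 - x)^2}_{P_2} + \underbrace{(1 - x)^2}_{P_3} + \underbrace{0}_{P_4,\,P_5} \\
\frac{dE}{dx} &= -4(1 - 2x) - 4(1 - x) = 12x - 8 = 0 \quad\Longrightarrow\quad x = \tfrac{2}{3}, \\
U(A) - U(C) &= 2x = \tfrac{4}{3}, \qquad E_{\min} = \tfrac{1}{9} + \tfrac{1}{9} + \tfrac{1}{9} = \tfrac{1}{3}.
\end{align*}
```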

[1] I've divided the normalisation from that post by $2$, to fit better with the methods of this post. Dividing everything in a sum by the same constant gives the same equivalence class of utility functions.

adamShimi

Why should all equivalence classes of linked worlds have the same average utility? That ensures the uniqueness of the utility function up to translation, but I'm not sure that's always the best way to do it. What is the intuition behind this specific choice?

Stuart_Armstrong

What is the intuition behind this specific choice?

That we don't currently have any information allowing us to distinguish between the equivalence classes. And the properties of what happens if we add this utility to one that distinguishes between the classes.
