Averaging value systems is worse than choosing one

by PhilGoetz, 29th Apr 2010 (11 min read, 56 comments)

A continuation of Only humans can have human values.  Revised late in the evening on April 30.

Summary: I will present a model of value systems, and show that under it, the "averaged value system" found by averaging the values of all the agents,

  • (RESTATED:) has more internal inconsistency than you would on average get by picking one agent's values at random
  • is a less stable value system than you would get by picking one agent's values at random

ADDED: The reason for doing this is that numerous people have suggested implementing CEV by averaging different value systems together.  My intuition is that value systems are not random; they are optimized in some way.  This optimization is undone if you mix together different value systems simply by averaging them.  I demonstrate this in the case where we suppose they are optimized to minimize internal conflict.

To someone working with the assumptions needed for CEV, the second bullet point is probably more important.  Stability is central to CEV, while internal inconsistency may be a mere computational inconvenience.

ADDED: Inconsistencies in value systems

We find consistent correlations in value systems.  The US has two political parties, Republican and Democrat; and many people who find one or the other obviously, intuitively correct.  Most countries have a conservative/liberal dimension that many values line up along.  It's hard to know whether this is because people try to make their values consistent; or because game theory tends to produce two parties, or even because parties form along the first principal component of the scatterplot of the values of members of society, so that some essentially artifactual vector is guaranteed to be found to be the main dimension along which opinions vary.  However, it's at least suggestive.  You seldom find a country where the conservatives favor peace and the liberals favor war; or where the liberals value religious rules more than the conservatives.  I seldom find vegetarians who are against welfare, or loggers or oilmen who are animal-rights activists.

If it's a general principle that some process causes people to form value systems with fewer inconsistencies than they would have by gathering different pieces from different value systems at random, it's not a great leap of faith to say that value systems with fewer inconsistencies are better in some way than ones with more inconsistencies.  We can at the very least say that a cobbled-together value system lacks this property of naturally-occurring human value systems; and therefore is not itself a good example of a human value system.

You might study the space of possible environments in which an agent must act, and ask where in that space values are in conflict, and what the shape of the decision boundary surfaces between actions is in that space.  My intuition is that value systems with many internal conflicts have complex boundary surfaces in that space.

More complex decision boundaries enable an agent to have a decision function that makes finer discriminations, and therefore can make more use of the information in the environment.  However, overly-complex decision boundaries may be adding noise.

If you take the value systems held by a set of agents "in the wild", we can suppose their decision boundary surfaces are adapted to their environment and to their capabilities, so that they are doing a good job of balancing the complexity of the agent's decision surface vs. their computational power and the complexity of the life they face.

If you construct a value system from those value systems, in a way that does not use the combined information used to construct all of them, and you end up with a more-complex decision surface constructed from the same amount of underlying information as a typical "wild-type" value system, you could conclude that this decision surface is overly-complex, and the extra complexities are noise/overfitting.

I have other reasons I think that the degree of inconsistency within a value system could be a metric used to evaluate it.  The comments below explore some different aspects of this.  The topic needs at least a post of its own.  The idea that higher internal consistency is always better is too simple. However, if we have a population of wild-type value systems that we think are adapted by some self-organizing process, then if we combine them in a way that produces an artificial value system that is consistently biased in the same direction - either lower or higher internal consistency than wild-type - I think that is cause for concern.

(I don't know if there are any results showing that an associative network with a higher IC, as defined below, has a more complex decision surface.  I would expect this to be the case.  A Hopfield network with no internal conflict would have a plane for its decision surface, and be able to store only 2 patterns.)

A model of value systems

Model any value system as a fully-connected network, where the nodes are values, and the connection from one value to another gives the correlation (from -1 to 1) between the recommendations for behavior given by the two values.  Each node is assigned a real number from 0 to 1 indicating how strongly the agent holds the value associated with that node.  Connection weights are fixed by the environment; node values vary according to the value system.

The internal conflict (IC) in a value system is the negative of the sum, over all pairs of nodes, of the product of the node values and the connection weight between them.  This is an energy measure that we want to minimize.  Averaging value systems together is a reasonable thing to do, for an expected-utility-maximizer, only if the average of a set of value systems is expected to give a lower IC than the average IC of all of the value systems.  (Utility = - (internal conflict).)
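A minimal sketch of this definition in Python may make it concrete. It is an illustration, not code from the original model: the names `internal_conflict`, `values`, and `weights` are assumptions, and weights are stored in a dictionary keyed by unordered node pairs so that each pair is counted once.

```python
from itertools import combinations


def internal_conflict(values, weights):
    """Return IC: the negative sum over node pairs of v_i * w_ij * v_j.

    values  -- list of node values, each in [0, 1]
    weights -- dict mapping frozenset({i, j}) to a correlation in [-1, 1];
               missing pairs are treated as weight 0 (an assumption of this sketch)
    """
    total = 0.0
    for i, j in combinations(range(len(values)), 2):
        total += values[i] * weights.get(frozenset((i, j)), 0.0) * values[j]
    return -total


# Two strongly held values whose recommendations agree (weight +1).
agree = internal_conflict([1.0, 1.0], {frozenset((0, 1)): 1.0})
# Two strongly held values whose recommendations oppose each other (weight -1).
clash = internal_conflict([1.0, 1.0], {frozenset((0, 1)): -1.0})
```

Under this sketch, holding two agreeing values strongly gives a low (negative) IC, and holding two opposed values strongly gives a high (positive) IC.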

IC(averaged values) > average(IC) if agents are better than random

Let there be N nodes.  Let a be an agent from the set A of all agents.  Let vai be the value agent a places on node i.  Let wij be the weight between nodes i and j.  Let the "averaged agent" b mean a constructed agent b (not in A) for which vbi = average over all a of vai.  Write "the sum over all i and j of S" as sum_{i, j}(S).

Average IC = ICa = - sum_{i, j} [wij x sum_a (vai x vaj)] / |A|

Expected IC from average agent b = ICb = - sum_{i, j} [wij x (sum_a(vai) / |A|) x (sum_a(vaj) / |A|)]
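The two quantities can be computed side by side for a toy population. This is a sketch under stated assumptions: the function names, the weight matrix `w`, and the two agents are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations


def ic(values, w):
    """IC of one agent: negative sum over pairs i < j of w[i][j] * v_i * v_j."""
    return -sum(w[i][j] * values[i] * values[j]
                for i, j in combinations(range(len(values)), 2))


def average_ic(agents, w):
    """ICa: the mean of the agents' individual IC values."""
    return sum(ic(a, w) for a in agents) / len(agents)


def ic_of_averaged_agent(agents, w):
    """ICb: the IC of the constructed agent b whose node values are per-node means."""
    n_nodes = len(agents[0])
    b = [sum(a[i] for a in agents) / len(agents) for i in range(n_nodes)]
    return ic(b, w)


# Hypothetical population: two nodes whose recommendations agree (weight +1).
w = [[0, 1],
     [1, 0]]
agents = [[1.0, 1.0],   # holds both values strongly
          [0.0, 0.0]]   # holds neither value

ic_a = average_ic(agents, w)            # (-1 + 0) / 2
ic_b = ic_of_averaged_agent(agents, w)  # -(0.5 * 0.5)
```

In this toy case the averaged agent holds both values half-heartedly, and its IC (-0.25) is higher than the average IC of the two original agents (-0.5).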

Now I will introduce the concept of a "random agent", which is an agent r constructed by choosing some other agent a at random for every node i, and setting vri = vai.  Hopefully you will agree that a random agent will have, on average, a higher IC than one of our original agents, because existing agents are at least a little bit optimized, by evolution or by introspection.

(You could argue that values are things that an agent never, by definition, willingly changes, or is even capable of changing.  Rather than get into a tricky philosophical argument, I will point out that, if that is so, then values have little to do with what we call "values" in English; and what follows applies more certainly to something more like the latter, and to what we think of when people say "values".  But if you also claim that evolution does not reduce value conflicts, you must have a simple, statically-coded priority-value model of cognition, e.g., Brooks' subsumption architecture; and you must also believe that the landscape of optimal action as a function of environment is everywhere discontinuous, or else you would expect agents in which a slight change in stimuli results in a different value achieving dominance to suffer a penalty for taking uncorrelated actions in situations that differ only slightly.)

We find the average IC of a random agent, which we agreed (I hope) is higher than the average IC of a real agent, by averaging the contribution from each pair of nodes {i, j} over all possible choices of agents used to set vri and vrj.  The average IC of a random agent is then

ICr = Average IC of a random agent = - sum_{i, j} [wij x sum_a(vai) x sum_a(vaj)] / (|A| x |A|)

We see that ICr = ICb.  In other words, using this model, constructing a value system by averaging together other value systems gives you the same result that you would get, on average, by picking one agent's value for one node, and another agent's value for another node, and so on, at random.  If we assume that the value system held by any real agent is, on average, better than such a randomly-thrown-together value system, this means that picking the value system of any real agent will give a lower expected IC than picking the value system of the averaged agent.
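The equality ICr = ICb can be checked exactly on a small case by enumerating every possible random agent. The sketch below uses exact fractions to avoid rounding; the weights and the three agents are hypothetical, and the sum runs over pairs of distinct nodes, as in the model.

```python
from fractions import Fraction
from itertools import combinations, product


def ic(values, w):
    """IC of one agent: negative sum over pairs i < j of w[i][j] * v_i * v_j."""
    return -sum(w[i][j] * values[i] * values[j]
                for i, j in combinations(range(len(values)), 2))


def mean_ic_of_random_agents(agents, w):
    """ICr: mean IC over every way of picking a source agent for each node."""
    n_nodes = len(agents[0])
    total, count = Fraction(0), 0
    for sources in product(range(len(agents)), repeat=n_nodes):
        # Node i of the random agent copies node i of agent sources[i].
        r = [agents[sources[i]][i] for i in range(n_nodes)]
        total += ic(r, w)
        count += 1
    return total / count


def ic_of_averaged_agent(agents, w):
    """ICb: IC of the agent whose node values are the per-node means."""
    n_nodes = len(agents[0])
    b = [sum(a[i] for a in agents) / len(agents) for i in range(n_nodes)]
    return ic(b, w)


# Hypothetical three-node network and three agents.
w = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
agents = [[Fraction(0), Fraction(1), Fraction(1)],
          [Fraction(1), Fraction(1), Fraction(0)],
          [Fraction(1, 2), Fraction(0), Fraction(1)]]

ic_r = mean_ic_of_random_agents(agents, w)
ic_b = ic_of_averaged_agent(agents, w)
```

The two results agree because each node of a random agent is drawn independently, so the expected product of two node values equals the product of their means. That part is pure algebra and holds for any population; the assumption that real agents are better than random is what places the average IC of real agents below this common value.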

I didn't design this model to get that result; I designed just one model, which seemed reasonable to me, and found the proof afterward.

Value systems are stable; an averaged value system is not

Suppose that agents have already evolved to have value systems that are consistent; and that agents often actively work to reduce conflicts in their value systems, by changing values that their other values disagree with.  (But see comments below on deep values vs. surface values.  A separate post justifying this supposition, and discussing whether humans have top-level goals, is needed.)  If changing one or two node values would reduce the IC, either evolution or the agent would probably have already done so.  This means we expect that each existing value system is already a local optimum in the space of possible node values.

If a value system is not at a local optimum, it's unstable.  If you give that value system to an agent, or a society, it's likely to change to something else - possibly something far from its original setting.  (Also, the fact that a value system is not a local optimum is a strong indicator that it has higher-than-typical IC, because the average IC of systems that are a small distance d away from a local minimum is greater than the average IC of systems at a local minimum, by an amount proportional to d.)

Averaging value systems together is therefore a reasonable thing to do only if the average of a set of value systems that are all local minima is guaranteed to give a value system that is also a local minimum.

This is not the case.  Consider value systems of 3 nodes, A, B, and C, with the weights AB=1, BC=1, AC=-1.  Here are two locally-optimal value systems.  Terms in conflict measures are written as node x connection x node:

A = 0, B = 1, C = 1: Conflict = -(0 x 1 x 1 + 1 x 1 x 1 + 1 x -1 x 0) = -1

A = 1, B = 1, C = 0: Conflict = -(1 x 1 x 1 + 1 x 1 x 0 + 0 x -1 x 1) = -1

The average of these two systems is

A = 1/2, B = 1, C = 1/2: Conflict = -(.5 x 1 x 1 + 1 x 1 x .5 + .5 x -1 x .5) = -.75

We can improve on this by setting A = 1:

A = 1, B = 1, C = 1/2: Conflict = -(1 x 1 x 1 + 1 x 1 x .5 + .5 x -1 x 1) = -1 < -.75

It would only be by random chance that the average of value systems would be locally optimal.  Averaging together existing values is thus practically guaranteed to give an unstable value system.
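The arithmetic in this example can be reproduced with a short script. This is a sketch, not the author's code: the helper `improving_single_node_moves` is a hypothetical name, and it only tests single-node changes that strictly lower IC.

```python
def conflict(a, b, c):
    """IC for the three-node example with weights AB = 1, BC = 1, AC = -1."""
    return -(a * 1 * b + b * 1 * c + c * -1 * a)


def improving_single_node_moves(system):
    """List (node index, new value, new IC) for single-node changes that strictly lower IC.

    IC is linear in each node value when the others are held fixed, so it is
    enough to try the endpoints 0 and 1 for each node.
    """
    base = conflict(*system)
    moves = []
    for i in range(3):
        for new_value in (0.0, 1.0):
            candidate = list(system)
            candidate[i] = new_value
            new_ic = conflict(*candidate)
            if new_ic < base:
                moves.append((i, new_value, new_ic))
    return moves


first = (0.0, 1.0, 1.0)
second = (1.0, 1.0, 0.0)
averaged = tuple((x + y) / 2 for x, y in zip(first, second))  # (0.5, 1.0, 0.5)

stable_first = improving_single_node_moves(first)      # no improving move
stable_second = improving_single_node_moves(second)    # no improving move
unstable_avg = improving_single_node_moves(averaged)   # raising A or C to 1 helps
```

Neither original system can be improved by changing one node, while the averaged system can be improved by raising either A or C to 1, which brings its conflict from -.75 back down to -1.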

Let me point out again that I defined my model first, and that the first example of two locally-optimal value systems I tried out worked.

You can escape these proofs by not being rational

If we suppose that higher-than-wild-type IC is bad, under what circumstance is it still justified to choose the averaged agent rather than one of the original agents?  It would be justified if you give an extremely high penalty for choosing a system with high IC, and do not give a correspondingly high reward for choosing a system with a wild-type IC.  An example would be if you chose a value system so as to minimize the chance of having an IC greater than that given by averaging all value systems together.  (In this context, I would regard that particular goal as cheating, as it is constructed to give the averaged value system a perfect score.  It suffers zero-risk bias.)

Such risk-avoidant goals would, I think, be more likely to be achieved by averaging (although I haven't done the math).  But they do not maximize expected utility.  They suffer risk-avoidance bias, by construction.

... or by doing very thorough factor analysis

If, as I mentioned in Only humans can have human values, you can perform factor analysis and identify truly independent, uncorrelated latent "values", then the above arguments do not apply.  You must take into account multiple hypothesis testing; using mathematics that guaranteed finding such a result would not impress me.  If, for instance, you were to simply perform PCA and say that the resulting eigenvectors are your true latent values, I would respond that the first dozen eigenvectors might be meaningful, but the next thousand are overfitted to the data.  You might achieve a great simplification of the problem, and greatly reduce the difference between ICa and ICb; but would still have ICa < ICb.

ADDED: Ensemble methods

In machine learning, "ensemble methods" mean methods that combine (often by averaging together) the predictions of different classifiers.  It is a robust result that ensemble methods typically perform better than the individual methods comprising them.  This seems to contradict the claim that an averaged value system would be worse than any of the individual value systems comprising it.

I think there is a crucial difference, however: In ensemble methods, each of the different methods has exactly the same goals (they are trained by a process that agrees on what are good and bad decisions).  An ensemble method is isomorphic to asking a large number of people who have the same value system to vote on a course of action.
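The shared-goal case can be illustrated with the textbook majority-vote calculation. This is a sketch under strong assumptions that are not in the original post: a binary decision, an odd number of voters, every voter agreeing on what counts as correct, and each voter being independently right with the same probability.

```python
from math import comb


def majority_vote_accuracy(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is correct.

    Assumes a binary decision, an odd number of voters, and that each voter is
    independently correct with probability p_correct (assumptions of this sketch).
    """
    needed = n_voters // 2 + 1
    return sum(comb(n_voters, k) * p_correct**k * (1 - p_correct)**(n_voters - k)
               for k in range(needed, n_voters + 1))


single = 0.7
ensemble_of_three = majority_vote_accuracy(3, single)  # 0.7**3 + 3 * 0.7**2 * 0.3
```

The gain from voting here depends on all voters being scored against the same notion of a correct decision. Agents with different value systems do not share one, which is the difference the paragraph above points to.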

56 comments

I don't understand your formalism.

Model any value system as a fully-connected network, where the nodes are values, and the connection from one value to another gives the correlation (from -1 to 1) between the recommendations for behavior given by the two values.

How do nodes recommend behavior? Does each node recommend one particular action at each time step? Does it recommend different actions to different degrees? What does the agent actually do - run an election in which each node votes on the action to take?

The internal conflict (IC) in a value system is the negative of the sum, over all pairs of nodes, of the product of the node values and the connection weight between them. This is an energy measure that we want to minimize.

Why?

How do nodes recommend behavior? Does each node recommend one particular action at each time step? Does it recommend different actions to different degrees? What does the agent actually do - run an election in which each node votes on the action to take?

You're asking a lot. It isn't a cognitive architecture. It's just modelling certain aspects of what we colloquially call "values".

Without reference to a cognitive system, you can enumerate an agent's preference systems, which would be described in terms of preferred and non-preferred outcomes. Then you observe the agent's behavior over time, and categorize the outcome of each action, and consider all the preference systems that have preferences about that outcome, and count each case where a pair of preference systems had opposite preferences for that action. You don't even have to record the outcome; you're sampling the agent's behavior only to get a good distribution of outcomes (rather than the flat distribution you would get by enumerating all possible outcomes).
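The counting procedure described in this comment could be sketched as follows. Everything named here is hypothetical: the preference systems "thrift" and "comfort", the outcome fields, and the convention that a preference system returns +1, -1, or 0 for an outcome.

```python
from itertools import combinations


def count_conflicts(preference_systems, observed_outcomes):
    """Count, per pair of preference systems, the outcomes on which they disagree in sign.

    preference_systems -- dict: name -> function(outcome) returning +1 (prefers),
                          -1 (disprefers), or 0 (no opinion)
    observed_outcomes  -- iterable of outcomes sampled from the agent's behavior
    """
    counts = {pair: 0 for pair in combinations(sorted(preference_systems), 2)}
    for outcome in observed_outcomes:
        for a, b in counts:
            # Opposite signs mean the two systems pull in opposite directions.
            if preference_systems[a](outcome) * preference_systems[b](outcome) < 0:
                counts[(a, b)] += 1
    return counts


systems = {
    "thrift": lambda o: 1 if o["cost"] < 10 else -1,
    "comfort": lambda o: 1 if o["comfortable"] else -1,
}
outcomes = [
    {"cost": 5, "comfortable": False},   # thrift +1, comfort -1: conflict
    {"cost": 20, "comfortable": True},   # thrift -1, comfort +1: conflict
    {"cost": 5, "comfortable": True},    # both +1: no conflict
]
conflicts = count_conflicts(systems, outcomes)
```

Dividing each count by the number of sampled outcomes would give a rough empirical stand-in for how often a pair of preference systems disagrees.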

If you have a particular cognitive architecture, you could map the nodes onto things you think are value propositions, and track the influence different nodes have on different action recommendations via some message-passing / credit-assignment algorithm. If you have a finite number of possible actions or action predicates, you could vary node values in a random (or systematic) way and estimate the correlation between each node and each action. That would restrict you to considering just the value in the propositional content of what I called preference systems.

OK. Can you respond to my other question? Why should we care about this Internal Conflict thing, and why do we want to minimize it?

The internal conflict (IC) in a value system is the negative of the sum, over all pairs of nodes, of the product of the node values and the connection weight between them. This is an energy measure that we want to minimize.

Why?

Good question. Because I prefer a value system that's usually not self-contradictory over one that's usually self-contradictory. I can't convince you that this is good if you are a moral nihilist, which is a very popular position on LW and, I think, central to CEV. If all possible value systems are equally good, by all means, choose one that tells you to love or hate people based on their fingerprints, and kill your friends if they walk through a doorway backwards.

Empirically, value systems with high IC resemble conservative religious values, which take evolved human values, and then pile an arbitrary rule system on top of them which gives contradictory, hard-to-interpret results resulting in schizophrenic behavior that appears insane to observers from almost any other value system, causes great pain and stress to its practitioners, and often leads to bloody violent conflicts because of their low correlation with other value systems.

Say I'm shopping for a loaf of bread. I have two values. I prefer larger loaves over smaller loaves, and I prefer cheaper loaves over more expensive loaves.

Unfortunately, these values are negatively correlated with each other (larger loaves tend to cost more). Clearly, my values are an arbitrary rule system which gives contradictory, hard-to-interpret results resulting in schizophrenic behavior that appears insane to observers from almost any other value system.

So how should I resolve this? Should I switch to preferring smaller loaves of bread, or should I switch to preferring more expensive loaves of bread?

That depends on why you prefer larger loaves of bread.

  • If you're maximizing calories or just want to feel that you're getting a good deal, go for the highest calorie-to-dollar ratio, noting sales.

  • If you need more surface area for your sandwiches, choose bread that is shaped in a sandwich-optimal configuration with little hard-to-sandwich heel volume. Make thin slices so you can make more sandwiches, and get an amount of bread that will last just about exactly until you go to the store again or until you expect diminishing marginal utility from bread-eating due to staleness.

  • If you want large loaves to maximize the amount of time between grocery trips, buy 6 loaves of the cheapest kind and put 5 of them in the freezer, to take out as you finish room-temperature bread.

  • If you just think large loaves of bread are aesthetically pleasing, pick a kind of bread with lots of big air pockets that puff it up, which is priced by dough weight.

etc. etc.

Figuring out why you have a value, or what the value is attached to, is usually a helpful exercise when it apparently conflicts with other things.

Figuring out why you have a value, or what the value is attached to, is usually a helpful exercise when it apparently conflicts with other things.

I think that, though you have given good approaches to making a good tradeoff, the conflict between values in this example is real, and the point is that you make the best tradeoff you can in the context, but don't modify your values because the internal conflict makes it hard to achieve them.

Point taken - you certainly don't want to routinely solve problems by changing your values instead of changing your environment.

However, I think you tend to think about deep values, what I sometimes call latent values, while I often talk about surface values, of the type that show up in English sentences and in logical representations of them. People do change their surface values: they become vegetarian, quit smoking, go on a diet, realize they don't enjoy Pokemon anymore, and so on. I think that this surface-value-changing is well-modelled by energy minimization.

Whether there is a set of "deepest values" that never change is an open question. These are the things EY is talking about when he says an agent would never want to change its goals, and that you're talking about when you say an agent doesn't change its utility function. The EY-FAI model assumes such a thing exists, or that they should exist, or could exist. This needs to be thought about more. I think my comments in "Only humans can have human values" on "network concepts" are relevant. It's not obvious that a human's goal structure has top-level goals. It would be a possibly-unique exception among complex network systems if it does.

I see your point. I wasn't thinking of models where you have one preference per object feature. I was thinking of more abstract examples, like trying to be a cheek-turning enemy-loving Christian and a soldier at the same time.

I don't think of choosing an object whose feature vector has the maximum dot product with your preference vector as conflict resolution; I think of it (and related numerical constraint problems) as simplex optimization. When you want to sum a set of preferences that are continuous functions of continuous features, you can generally take all the preferences and solve directly (or numerically) to find the optimum.
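For the continuous case described in this comment, choosing by maximum dot product is a one-liner. The sketch below reuses the bread example from earlier in the thread; the loaves, their feature numbers, and the preference weights are all made up for illustration.

```python
def best_option(options, preferences):
    """Return the name of the option whose feature vector has the largest
    dot product with the preference vector."""
    def score(features):
        return sum(f * p for f, p in zip(features, preferences))
    return max(options, key=lambda name: score(options[name]))


# Features are (size in grams, price in dollars); hypothetical numbers.
loaves = {
    "small": (400, 2.0),
    "large": (800, 3.5),
    "artisan": (600, 6.0),
}
preferences = (0.01, -1.0)  # likes size, dislikes price

choice = best_option(loaves, preferences)
```

The two preferences pull in opposite directions, yet the choice is still well defined: the tradeoff is settled by the weights rather than by dropping either preference.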

In the "moral values" domain, you're more likely to have discontinuous rules (e.g., "X is always bad", or "XN is not"), and be performing logical inference over them. This results in situations that you can't solve directly, and it can result in circular or indeterminate chains of reasoning, and multiple possible solutions.

My claim is that more conflicts is worse, not that conflicts can or should be eliminated. But I admit that aspect of my model could use more justification.

Is there a way to distinguish moral values from other kinds of values? Coming up with a theory of values that explains both the process of choosing who to vote for, and threading a needle, as value-optimization, is going to be difficult.

In the "moral values" domain, you're more likely to have discontinuous rules (e.g., "X is always bad", or "XN is not"), and be performing logical inference over them. This results in situations that you can't solve directly, and it can result in circular or indeterminate chains of reasoning, and multiple possible solutions.

This line of thinking is setting off my rationalization detectors. It sounds like you're saying, "OK, I'll admit that my claim seems wrong in some simple cases. But it's still correct in all of the cases that are so complicated that nobody understands them."

I don't know how to distinguish moral values from other kinds of values, but it seems to me that this isn't exactly the distinction that would be most useful for you to figure out. My suggestion would be to figure out why you think high IC is bad, and see if there's some nice way to characterize the value systems that match that intuition.

I disagree with this.

I think a natural intuition about a moral values domain suggests that things are likely to be non-linear and discontinuous.

I don't think it's so much saying the claim is wrong in simple cases, but it's still correct in cases no one understands.

It's more saying the alternative claims being proposed are a long ways from handling any real world example, and I'm disinclined to believe that a sufficiently complicated system will satisfy continuity and linearity.

Also, we should distinguish between "why do I expect that existing value systems are energy-minimized" and "why should we prefer value systems that are energy-minimized".

The former is easier to answer, and I gave a bit of an answer in "Only humans can have human values".

The latter I could justify within EY-FAI by claiming that being energy-minimized is a property of human values.

My suggestion would be to figure out why you think high IC is bad, and see if there's some nice way to characterize the value systems that match that intuition.

That's a good idea. My "final reason" for thinking that high IC is bad may be because high-IC systems are a pain in the ass when you're building intelligent agents. They have a lot of interdependencies among their behaviors, get stuck waffling between different behaviors, and are hard to debug. But we (as designers and as intelligent agents) have mechanisms to deal with these problems; e.g., producing hysteresis by using nonlinear functions to sum activation from different goals.

My other final reason is that I consciously try to energy-minimize my own values, and I think other thoughtful people who aren't nihilists do too. Probably nihilists do too, if only for their own convenience.

My other other final reason is that energy-minimization is what dynamic network concepts do. It's how they develop, as e.g. for spin-glasses, economies, or ecologies.

Because I prefer a value system that's usually not self-contradictory over one that's usually self-contradictory.

Sometimes values you really have are in conflict: you have options to achieve one to a certain extent, or to achieve the other to a different extent, and you have to figure out which is more important to you. This does not mean that you give up the value you didn't choose, just that in that particular situation, it was more effective to pursue the other. Policy Debates Should Not Appear One-Sided.

I upvoted because the formalization is interesting and the observation of what happens when we average values is a good one. But I'm still far from convinced IC is really what we need to worry about.

Empirically, value systems with high IC resemble conservative religious values, which take evolved human values, and then pile an arbitrary rule system on top of them which gives contradictory, hard-to-interpret results resulting in schizophrenic behavior that appears insane to observers from almost any other value system, causes great pain and stress to its practitioners, and often leads to bloody violent conflicts because of their low correlation with other value systems.

I think all of this applies to my liberal values: arbitrary rule system on top of evolved values? Check. Appears insane to observers from almost any other value system? Check. Causes great pain and stress to its practitioners? Check. Bloody violent conflicts because of their low correlation with other value systems? Double check!

And I still like my liberal values!

Good point. Maybe tribal ethics have the least internal conflict, since they may be closest to an equilibrium reached by evolution.

The examples that come to mind when I try to think about this concretely are political/moral disputes like abortion, torture, or redistribution of wealth. One side thinks (for example) that an abortion is a terrible thing to do to an unborn child and that prohibiting abortions does not pose much of a hardship to women, while another group thinks that aborting a fetus is not that big a deal and that prohibiting abortions would be a large hardship to women. Averaging together turns this dispute between two separate coherent groups into an internal conflict: abortion is a bad thing to do to the baby/fetus but prohibiting abortions would pose a fairly sizable hardship to women.

So averaging increases internal conflict, but some internal conflict might not be so bad, since a lot of the processes that reduce internal conflict and separate people into coherent groups are biases: the affect heuristic, the halo effect, cognitive dissonance, group polarization, affective death spirals, etc.

We would really like to use examples of inconsistent values that have been resolved. We can't, because we're unaware of them, because they've been resolved.

I would not expect value systems in the wild, produced by evolution, to have low (i.e. negative) IC. I would expect them to have IC close to 0. This is because if you have two values that have a high correlation, then you may as well delete one of them. You get the most information from your value system if your values are uncorrelated.

That's a good point. But it relates more to "deep values", by which I mean the things you would find after you do factor analysis on the surface values you would enumerate if someone asked you to list your values.

What I posted applies more to surface values.

There was a paper in Science this year claiming that nearby cortical neurons should have low correlation, because that increases the signal's information content. There was another paper this year in Science claiming that nearby cortical neurons should have high correlation, because that amplifies the strength of the signal.

Let there be N nodes. Let a be an agent from the set A of all agents. Let vai be the value agent a places on node i. Let wij be the weight between nodes i and j. Let the "averaged agent" b mean a constructed agent b (not in A) for which vbi = average over all a of vai. Write "the sum over all i and j of S" as sum_{i, j}(S).

Average IC = ICa = - sum_{i, j} [wij x sum_a (vai x vaj)] / |A|

Expected IC from average agent b = ICb = - sum_{i, j} [wij x (sum_a(vai) / |A|) x (sum_a(vaj) / |A|)]

Am I the only one who's completely lost by this?

I should have said that |A| means the number of agents in the set A. sum_a(vaj) means the sum, over all agents, of the value they place on node j. 'x' means multiplication, not a variable. ICa and ICb are variables I defined, and maybe I should have written them on the left like so:

ICa := Average IC = ...

Upvoted for interesting formalization. I feel that somewhere along this path lies a way of extracting value systems from human beings :-) Can we also define "behavior" as some function of your node network, and then somehow see "internal conflict" in behavioral terms?

Try it and see? My reply to Peter de Blanc above is relevant, but not sufficient to answer the question.

This is an interesting observation, but I don't think internal consistency is an appropriate measure for the "goodness" of a value system. It says nothing about whether entities with a particular value system are capable of forming stable societies; I wouldn't accept any value system that led to always choosing D in the iterated prisoner's dilemma as good, for example, no matter how internally consistent. And a paperclipper's value system has a very high degree of internal consistency.

In fact, I have a hard time accepting any quality measure over value systems which doesn't include degree of correlation with my own, or at least with most of the major, non-controversial points thereof.

In fact, I have a hard time accepting any quality measure over value systems which doesn't include degree of correlation with my own, or at least with most of the major, non-controversial points thereof.

That's not really a quality measure over value systems. It's just an assertion of your value system.

A rational agent acts to maximize expected utility according to its own particular utility function, which represents its values, not the utility function it chose for being easy to maximize.

In other words, you believe that neither evolution nor agents have any tendency to resolve conflicting values.

See above: Rather than get into a tricky philosophical argument, I will point out that, if that is so, then values have little to do with what we call "values" in English; and what follows applies to something more like the latter, and to what we think of when people say "values"

In other words, you believe that neither evolution nor agents have any tendency to resolve conflicting values.

I made a statement about rational agents. Why are you putting words in my mouth about evolution and general (not necessarily rational) agents?

Rather than get into a tricky philosophical argument

This really is not complicated. It is just a matter of understanding what utility functions are for.

If I wanted to maximize my utility by choosing my utility function, I would just assign infinite utility to every state of the universe. But I would rather use a utility function that represents my values, so that I can get what I actually want by maximizing it.

if that is so, then values have little to do with what we call "values" in English; and what follows applies to something more like the latter, and to what we think of when people say "values"

I have no idea what this means. What meaning of "values" are you talking about?

You are not responding to Phil's post. He is talking about quite general agents, and setting up a model that could be applied to humanity as we actually are. Why are you talking about idealized von Neumann Morgenstern agents (that have utility functions)?

You are not responding to Phil's post.

I am responding to Phil's confusion about utility functions and values, to dissolve the wrong question that the post is trying to answer.

Why are you talking about idealized von Neumann Morgenstern agents (that have utility functions)?

It is useful to understand ideally rational agents when figuring out how you can be more rational. The incompatibility between the concept of an ideally rational agent's utility function and Phil's concept of value systems indicates problems in Phil's concept.

I am responding to Phil's confusion about utility functions and values ...

Do you hold that it is always a confusion to talk about what is rather than about what should be?

It is useful to ...

I meant 'why did you talk about it in that exact context', not 'why do you ever talk about it'.

The incompatibility between the concept of an ideally rational agent's utility function and Phil's concept of value systems indicates problems in Phil's concept.

I don't see that. After all, it is not impossible or hard to describe a von Neumann Morgenstern agent in Phil's system. They are a subset of the agents that he wrote about. Is there always a problem with a concept if it can be extended to cover situations other than the most idealized ones?

Do you hold that it is always a confusion to talk about what is rather than about what should be?

The confusion is thinking that maximizing utility includes choosing a utility function that is easy to maximize. If you really have something to protect, you want your utility function to represent that, no matter how hard it makes it to maximize your utility function. If you are looking for a group utility function, you should be concerned with what best represents the group members, given their relative negotiating power, not what sort of average or other combination is easiest to maximize.

I meant 'why did you talk about it in that exact context', not 'why do you ever talk about it'.

I understand, and I did respond to that question.

After all, it is not impossible or hard to describe a von Neumann Morgenstern agent in Phil's system. They are a subset of the agents that he wrote about.

If you think so, then describe a situation where reasonably complex ideally rational agents would want to combine their utility functions in the way that Phil is suggesting. (I don't think this even makes sense if they assign nonlinear utility to values within the ranges achievable in the environment.)

Is there always a problem with a concept if it can be extended to cover situations other than the most idealized ones?

I deny that this is an accurate description of Phil's concept. My criticism is that agents have to make serious rationality mistakes in order to care about Phil's reasons for recommending this combining process.

... choosing a utility function that is easy to maximize ...

Where in the TLP do you see this?

I understand, and I did respond to that question.

Do you talk about all useful things in all contexts? Otherwise, how is an explanation of why it is valuable a reasonable response to a question about what you did in a specific context?

If you think so ...

Do you actually see this as controversial?

... then describe a situation where reasonably complex ideally rational agents would ...

If you think that this is relevant, then explain how you think that a model that only works for ideally rational agents is useful for arguing about what values actual humans should give to an AI.

I deny that this is an accurate description of Phil's concept.

It is not a description of his concept. It is a question about your grounds for dismissing his model without any explanation.

... choosing a utility function that is easy to maximize ...

Where in the TLP do you see this?

Phil is trying to find a combined value system that minimizes conflicts between values. This would allow tradeoffs to be avoided. (Figuring out which tradeoffs to make when your actual values conflict is a huge strength of utility functions.) Do you see another reason to be interested in this comparison of value system combinations?

I understand, and I did respond to that question.

Do you talk about all useful things in all contexts? Otherwise, how is an explanation of why it is valuable a reasonable response to a question about what you did in a specific context?

Do you have to respond to everything with an inane question? Your base level question has been answered.

If you think so ...

Do you actually see this as controversial?

I see it as an unsupported claim. I see this question as useless rhetoric that distracts from your claim's lack of support, and from the points I was making. So, let's bring this back to the object level. Do you see a scenario where a group of ideally rational agents would want to combine their utility functions using this procedure? If you think it is only useful for more general agents to cope with their irrationality, do you see a scenario where a group of ideally rational agents who each care about a different general agent (and want the general agent to be effective at maximising its own fixed utility function) would advise the general agents they care about to combine their utility functions in this manner?

It is not a description of his concept.

A concept that "can be extended to cover situations other than the most idealized ones" is your description of Phil's concept contained in your question. It would make this discussion a lot easier if you did not flatly deny reality.

It is a question about your grounds for dismissing his model without any explanation.

Do you always accuse people of dismissing models without explanation when they in fact have dismissed a model with an explanation? (If you forgot, the explanation is that the model is trying to figure out which combined value system/utility function is easiest to satisfy/maximise instead of which one best represents the input value systems/utility functions that represent the actual values of the group members.)

How do you like being asked questions which contain assumptions you disagree with?

I think it'd be a good policy to answer the question before discussing why it might be misguided. If you don't answer the question and only talk about it, you end up running in circles and not making progress.

For example:

Instead of

Is there always a problem with a concept if it can be extended to cover situations other than the most idealized ones?

I deny that this is an accurate description of Phil's concept....

It is not a description of his concept. It is a question about your grounds for dismissing his model without any explanation.

A concept that "can be extended to cover situations other than the most idealized ones" is your description of Phil's concept contained in your question

It could be

Is there always a problem with a concept if it can be extended to cover situations other than the most idealized ones?

No, of course not. I deny that this is an accurate description of Phil's concept....

Well, I think it is because of X...

Would people accept the proposition that we can learn something about high-internal-conflict vs. low-internal-conflict value systems by studying the difference between democracies (which sum together disparate value systems) and monarchies/dictatorships/aristocracies?

Did you just say this?

  • goodness of value systems - f(x)
  • value systems = local maxima of f, but only those higher than global average E f(x)
  • if you average locations of local maxima with E y, f(E y) will be close to E f(x), and by definition less than individual E f(x) <f(y)
  • there's no reason to believe E y will be local maximum, so it will be less stable

If so, it seems to be an artifact of your way of representing and averaging "value systems", not anything that applies to the real world.

  • goodness of value systems - f(x)

Yes.

  • value systems = local maxima of f, but only those higher than global average E f(x)

No. value systems = local maxima of f, period.

  • if you average locations of local maxima with E y, f(E y) will be close to E f(x), and by definition less than individual E f(x) <f(y)

What? Can't parse.

  • there's no reason to believe E y will be local maximum, so it will be less stable

Yes.

If so, it seems to be an artifact of your way of representing and averaging "value systems", not anything that applies to the real world.

Huh? Where did that come from?

value systems = local maxima of f, but only those higher than global average E f(x)

No. value systems = local maxima of f, period.

You said "if agents are better than random" in headlines. I take "than random" to mean " > E f(x)". So do you assume so or not?

What? Can't parse.

Let me try again.

  • A - set of all local maxima x
  • B - set of all local maxima x, for which f(x) > E f(x)

In either case, avg(A) and avg(B) are arbitrary points, and we have no a priori reason to believe they will be special in any way, so E f(avg(A)) = E f(avg(B)) = E f(x). Is this right?

In case of set B - we assumed for all x \in B . f(x) > E f(x), so picking avg(B) which is essentially a random point makes it worse.

In case of set A - E f(x) for set of arbitrary local maxima is no better than set of arbitrary points, so E(f(x) | x \in A) = E(f(x)) = E f(avg(A)), so your entire argument fails.

A - set of all local maxima x

B - set of all local maxima x, for which f(x) > E f(x)

f(x) is IC(x), and E is expected value?

I didn't discuss anything corresponding to your B.

In either case, avg(A) and avg(B) are arbitrary points, and we have no a priori reason to believe they will be special in any way, so E f(avg(A)) = E f(avg(B)) = E f(x). Is this right?

No. You just said they're local maxima, which is very special; and it would be surprising if either of their expectations were the same as E[f(x)].

In case of set B - we assumed for all x \in B . f(x) > E f(x), so picking avg(B) which is essentially a random point makes it worse.

This doesn't describe what I wrote; but B is not a random set, so avg(B) is not a random point.

In case of set A - E f(x) for set of arbitrary local maxima is no better than set of arbitrary points, so E(f(x) | x \in A) = E(f(x)) = E f(avg(A)), so your entire argument fails.

No; local maxima (actually minima in this case, but same thing) are not arbitrary points. They're locally maximal. Meaning they have a higher f(x) than all the points around them. So it's impossible for the equality you just gave to hold, except when there are no local maxima (eg, a plane), or perhaps in some peculiar set constructed so that infinities make finite differences not matter. EDIT: Also add cases where a space is constructed so that local maxima are more common when f(x) is small, which is the type taw used in his reply. This is a large number of possible spaces. I believe it's a minority of possible spaces, but it would take time to formalize that.

You say B is not a random set, but for arbitrary space and function, set of local maxima will behave essentially like a random set. It can as easily have average less or more than average of the whole space.

Here's a really really simple example:

  • Space: [-1/2..+1/2], f(x) = cos(2.02 pi x).
  • There are 3 local maxima, x=-1/2, x=0, x=1/2
  • E f(x) = -0.009899
  • E(f(x) | x is local maximum) = -0.333 - so lower than space average
  • f(avg of local maxima) = f(0) = 1.

Another one:

  • Space: [-1/2..+1/2], f(x) = cos(3.03 pi x).
  • There are 3 local maxima, x=-1/2, x=0, x=1/2
  • E f(x) = -0.20987
  • E(f(x) | x is local maximum) = +0.3647 - so higher than space average
  • f(avg of local maxima) = f(0) = 1.

It is as trivial to construct sets with any relationship between E f(x), f(avg of local maxima), and E(f(x) | x is local maximum).
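The two cosine examples above can be checked numerically. What follows is a minimal sketch, not part of the original comment: `cosine_example` is a hypothetical helper name, and it assumes (as the examples do) that an endpoint of the interval counts as a local maximum when f rises toward it.

```python
import math

def cosine_example(k, lo=-0.5, hi=0.5):
    """For f(x) = cos(k*pi*x) on [lo, hi], return
    (mean of f, mean of f over local maxima, f at the average maximum, maxima)."""
    w = k * math.pi
    f = lambda x: math.cos(w * x)
    df = lambda x: -w * math.sin(w * x)

    # Exact mean of f over [lo, hi], using the antiderivative sin(w*x)/w.
    mean_f = (math.sin(w * hi) - math.sin(w * lo)) / (w * (hi - lo))

    # Interior maxima of cos(w*x) sit at x = 2n/k for integer n.
    n_min = math.ceil(lo * k / 2)
    n_max = math.floor(hi * k / 2)
    maxima = [2 * n / k for n in range(n_min, n_max + 1) if lo < 2 * n / k < hi]

    # Assumption: an endpoint is a local maximum if f increases toward it.
    if df(lo) < 0:
        maxima.append(lo)
    if df(hi) > 0:
        maxima.append(hi)
    maxima.sort()

    mean_at_maxima = sum(f(x) for x in maxima) / len(maxima)
    f_at_avg_maximum = f(sum(maxima) / len(maxima))
    return mean_f, mean_at_maxima, f_at_avg_maximum, maxima

for k in (2.02, 3.03):
    mean_f, mean_at_maxima, f_at_avg, maxima = cosine_example(k)
    print(f"k={k}: maxima={maxima}")
    print(f"  E f(x)                  = {mean_f:.6f}")
    print(f"  E(f(x) | local maximum) = {mean_at_maxima:.4f}")
    print(f"  f(avg of local maxima)  = {f_at_avg:.4f}")
```

Under that boundary assumption, the sketch should reproduce the figures quoted in both examples: the conditional mean falls below the space average for 2.02 and above it for 3.03, while f at the averaged location is 1 in both cases.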

for arbitrary space and function, set of local maxima will behave essentially like a random set. It can as easily have average less or more than average of the whole space.

Not "easily". Only if the space is constructed to have more local maxima when f(x) is small, or the whole space has areas where f(x) goes off to infinity with no local maximum, or has some other perversity of construction that I haven't thought of.

Space: [-1/2..+1/2], f(x) = cos(2.02 pi x). There are 3 local maxima, x=-1/2, x=0, x=1/2

You're exploiting the boundaries, which you've chosen specially for this purpose. I admit that when I said this outcome was impossible, I was ignoring that kind of a space.

If, however, you consider the set of spaces where the lower and upper bounds can take on any two real values (with the upper bound > lower bound), you'll find that, averaged over those spaces, the average of the local maxima is greater than the average of the whole space.

I concede that you can define spaces where the set of local maxima can be below average. But you are wrong to say that they are therefore like a random set.

Note that the fact that you can define spaces where the set of local maxima can be below the average in the space does not impact my proof, which never talked about the average in the space.

How about this function?

You keep trying to guess proper caveats; I keep giving you trivial counterexamples.

This one has: range over entire R, values in [-11,+10], average value 0, global maximum 10, average of local maxima -2.6524 ?

Any function which is more bumpy when it's low, and more smooth when it's high will be like that. This particular one was chosen for prettiness of visualization.
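The function referred to above is not reproduced in this text, so the following is only a hypothetical stand-in with the same qualitative shape: a slow cosine with fast ripples whose size shrinks to zero at the peak. The name `bumpy_when_low`, the amplitude 10, the ripple frequency 50, and the grid size are all illustrative assumptions, and local maxima are detected by simple comparison with neighbouring grid points over one period.

```python
import math

def bumpy_when_low(x, amp=10.0, freq=50):
    """Hypothetical stand-in: smooth near its peak, rippled near its trough."""
    ripple_size = (1.0 - math.cos(x)) / 2.0  # 0 at the peak (x = 0), 1 at the trough (x = pi)
    return amp * math.cos(x) + ripple_size * math.cos(freq * x)

def compare_averages(n=20000):
    """Sample one period [0, 2*pi) on a uniform grid and return
    (mean of f, mean of f over local maxima, number of local maxima)."""
    xs = [2 * math.pi * i / n for i in range(n)]
    ys = [bumpy_when_low(x) for x in xs]
    mean_f = sum(ys) / n
    # A grid point is a local maximum if it beats both periodic neighbours.
    peaks = [ys[i] for i in range(n)
             if ys[i] > ys[i - 1] and ys[i] > ys[(i + 1) % n]]
    return mean_f, sum(peaks) / len(peaks), len(peaks)

mean_f, mean_peaks, count = compare_averages()
print(f"mean of f over the period : {mean_f:.4f}")
print(f"mean of f over local maxima: {mean_peaks:.4f} ({count} maxima)")
```

Because almost all of the ripples, and hence almost all of the local maxima, sit in the low part of the curve, the average over local maxima should come out below the overall mean, which is the pattern the comment describes. The exact numbers depend on the assumed parameters and are not meant to match the figures quoted above.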

Any function which is more bumpy when it's low, and more smooth when it's high will be like that.

That's what I just said:

Only if the space is constructed to have more local maxima when f(x) is small,

You are hyper-focusing on this as if it made a difference to my proofs. Please note my previous comment: It does not matter; I never talked about the average IC over all possible agents. I only spoke of IC over various recombinations of existing agents, all of which I assumed to have IC that are local minima. The "random agent" is not an agent taken from the whole space; it's an agent gotten by recombining values from the existing agents.

You said "if agents are better than random" in headlines. I take "than random" to mean " > E f(x)". So do you assume so or not?

If agents are, on average, better than the average random agent as defined. No requirement that each agent be better than random.

Folks, if you genuinely want to avoid ruining the universe, this kind of formal, analytic article is exactly the kind of article that we need more of. Everything else is just dicking around by comparison.

Instead, you vote them down into oblivion. Possibly because you feel entitled to vote against an analytical post if you can find any single element in it that you disagree with.

LessWrong has few analytical posts, and even fewer that try to develop new formalisms. Exactly what standard are you holding this post up to? Who here on less wrong, or anywhere, has done a better job of formalizing values? (That's not a rhetorical question; post references.) What is your alternative?

Phil, I liked your post, but I empathize with the down-voters you're worried about. Why?

The real content in the OP is that you're measuring conflict in value systems, not some objective "worseness" that people know not to Platonize. This point was completely hidden to me until the 12th line where you said "This is an energy measure that we want to minimize", which is too late and too roundabout for a thesis statement.

The original summary repeated the term "worse" from the title instead of earning my trust and curiosity by telling me what the content would really be about (conflict measurement). As a result, I was very tempted to stop reading.

So suppose instead you started the summary with "We're going to measure conflict in value systems", and clarified right away what you meant by "worse" in the shortened title. Then:

  1. You'd earn my trust, and initiate a new curiosity: "How's he going to measure conflict?", and

  2. Your model would be more immediately intuitive: starting to read it, I'd think "Oh, of course he's plotting nodes and weighted edges... he wants to measure conflict," furthering my trust that the article is actually going somewhere.

From experience, I expected your post to have eventually-decipherable interesting content, so I kept reading, and turns out it did. But perhaps not everyone felt justified to do so if they didn't share that experience, and maybe they downvoted instead.

Generally speaking, if you want your ideas to reach lots of people and gain approval, you have to re-earn the reader's trust and curiosity with every article you write.

Thanks - that's a great critique.

I just realized I may have been a bit ambiguous at the end, so I ETAd "interesting" and "turns out it did" ... i.e., I upvoted, and cheers for the analysis.

I voted your post down because I didn't see why your formalism is important, and you didn't give any reasons why it is. I also think that your statements about evolution are wrong, but that's not why I voted you down.

I also downvoted the above comment because I don't like it when people complain about being downvoted. You seem to think that the only relevant feature of your post is that it is formal, and that if it was downvoted, then people must not like formal analysis. In fact there are other formal posts that were well-received by the LW community, such as Wei Dai's posts developing Updateless Decision Theory.

The formalism is important because the default assumption people use as to how CEV will be implemented is that it will be implemented by averaging value systems together.

I think that many people downvote anything containing a proof if they think that any step in the proof is wrong. But those downvotes aren't just interpreted by others as meaning "this proof is incorrect"; they're interpreted as meaning "this topic is unimportant" or "this approach is uninteresting".

My formalism is important, if for no other reason than because it is the only one addressing the question of averaging values together. It is the only work ever done on this particular critical step of CEV.

It is the only work ever done on this particular critical step of CEV.

No it isn't. What about the entire field of voting theory?

I initially thought that it doesn't address the question of the value of the output of the system, but on reflection it does. So, I stand corrected.

But those downvotes aren't just interpreted by others as meaning "this proof is incorrect"; they're interpreted as meaning "this topic is unimportant" or "this approach is uninteresting".

This goes the other way too. Often people will vote up posts for being interesting, and others can erroneously interpret the up votes as indicating the post is correct. I think it would be better if such a post were downvoted (not excessively) and some people left comments explaining that though the topic is interesting, the argument and conclusions are not correct. Someone who sees this and is capable of writing a better, correct article on the topic would be encouraged to do so.

Would it be worthwhile (given the added complexity) to vote on different aspects of posts, so it has separately reported karma scores for correctness, being interesting, being a good approach, being useful, being entertaining, etc.?

Would it be worthwhile (given the added complexity) to vote on different aspects of posts, so it has separately reported karma scores for correctness, being interesting, being a good approach, being useful, being entertaining, etc.?

I would support a more complex karma system like that, and I think you've got a reasonable set of categories.

I'm assuming it would be grafted onto the old system, so that old karma score would be retained, but the more specific scores would be the only ones which could be added.

"Interesting" and "entertaining" karma shouldn't count for getting permission to do top level posts.