In the Wiki article on complexity of value, Eliezer wrote:

The thesis that human values have high Kolmogorov complexity - our preferences, the things we care about, don't compress down to one simple rule, or a few simple rules.

[...]

Thou Art Godshatter describes the evolutionary psychology behind the complexity of human values - how they got to be complex, and why, given that origin, there is no reason in hindsight to expect them to be simple. 

But in light of Yvain's recent series of posts (i.e., if we consider our "actual" values to be the values we would endorse in reflective equilibrium, instead of our current apparent values), I don't see any particular reason, whether from evolutionary psychology or elsewhere, that they must be complex either. Most of our apparent values (which admittedly are complex) could easily be mere behavior, which we would discard after sufficient reflection.

For those who might wish to defend the complexity-of-value thesis, what reasons do you have for thinking that human value is complex? Is it from an intuition that we should translate as many of our behaviors into preferences as possible? If other people do not have a similar intuition, or perhaps even have a strong intuition that values should be simple (and therefore would be more willing to discard things that are on the fuzzy border between behaviors and values), could they think that their values are simple, without being wrong?


I don't understand what you or Yvain mean by "values" anymore, but perhaps we can make some progress without focusing on the word.

To borrow Eliezer's examples, do you think you would discard the value of boredom given enough time to reflect? Would you discard the desire to not be optimized too hard by an outside agent? Would you discard sympathy for other conscious humans? Complexity of value is not merely about complexity of taste buds. The things that we deeply care for, and wish to continue caring for, are complex as well.

As for those people who profess to value simplicity of values, I'm willing to bite the bullet and claim that most of them wouldn't agree to lose the complex stuff listed in the previous paragraph.

To borrow Eliezer's examples, do you think you would discard the value of boredom given enough time to reflect?

I don't know, but at least it seems plausible that I might. It's clear that I have boredom as an emotion and as a behavior, but I don't see a clear reason why I would want to make it into a preference. Certainly there are times when I wish I wouldn't get bored as easily as I actually do, so I don't want to translate boredom into a preference "as is".

If I think about why I might not want a future where everyone has no boredom and could enjoy the same thing over and over again, what's driving that seems to be an intuitive aversion towards triviality (i.e., things that look too easy or lack challenge). And if I think more about why I might not want a future where things look easy instead of difficult, I can't really think of anything except that it probably has to do with signaling that I'm someone who likes challenge, which does not seem like something I really want to base my "actual preferences" on.

Would you discard the desire to not be optimized too hard by an outside agent? Would you discard sympathy for other conscious humans?

These are similar for me. If I think about them long enough, it just becomes unclear why I might want to keep them.

Also, perhaps "keep" and "discard" aren't the right word here. What you actually need to do (in this view of what values are), is affirmatively create preferences (e.g., a utility function) from your intuitions. So for any potential value under consideration, you need reasons not for why you would want to "discard", but for why you would want to "create".

This should've been obvious from the start, but your comment has forced me to realize it only now: if we understand reflective equilibrium as the end result of unrestricted iterated self-modification, then it's very sensitive to starting conditions. You and I could end up having very different value systems because I'd begin my self-modification by strengthening my safeguards against simplification of values, while you'd begin by weakening yours. And a stupid person doing unrestricted iterated self-modification will just end up someplace stupid. So this interpretation of "reflective equilibrium" is almost useless, right?

The interpretation of "reflective equilibrium" that I currently have in mind is something like this (written by Eliezer), which I think is pretty close to Yvain's version as well:

I see the project of morality as a project of renormalizing intuition. We have intuitions about things that seem desirable or undesirable, intuitions about actions that are right or wrong, intuitions about how to resolve conflicting intuitions, intuitions about how to systematize specific intuitions into general principles.

And this may not be too different from what you have in mind when you say "unrestricted iterated self-modification", but I wanted to point out that we could easily diverge in reflective equilibrium even without "hardware" self-modification, just by thinking and applying our intuitions, if those intuitions, and especially meta-level intuitions, differ at the start. (And I do think this is pretty obvious, so it confuses me when Eliezer does not acknowledge it when he talks about CEV.)

So this interpretation of "reflective equilibrium" is almost useless, right?

I'm not sure what you mean, but in this case it seems at least useful for showing that we don't have an argument showing that our "actual values" are complex. (Do you mean it's not useful as a way to build FAI?)

(Do you mean it's not useful as a way to build FAI?)

Yes.

we don't have an argument showing that our "actual values" are complex

Do you agree that FAI probably needs to have a complex utility function, because most simple ones lead to futures we wouldn't want to happen? The answer to that question doesn't seem to depend on notions like reflective equilibrium or Yvain's "actual values", unless I'm missing something again.

Do you agree that FAI probably needs to have a complex utility function, because most simple ones lead to futures we wouldn't want to happen?

How would I know that unless I knew what I "want"? What notion of "want" are you thinking of, if not something like "values endorsed in reflective equilibrium"? I assume you're not thinking of "want" as opposed to "like" ...

Perhaps you mean "there aren't any simple utility functions that I would choose to implement in an AI right now and let it run, knowing it would then take over the world" but I don't think that shows FAI probably needs to have a complex utility function. It could just be that I need more time to think things over but will eventually decide to implement a simple utility function.

Retreating further along the line of Eliezer's reasoning to find the point where you start to disagree: how about AIs that don't take over the world? For example, I want an AI that I can ask for a cheeseburger, and it will produce a cheeseburger for me while respecting my implied wishes to not burn the world with molecular nanotech or kill the neighbor's dog for meat. Do you agree that such a device needs to have lots of specific knowledge about humans, and not just about cheeseburgers? If yes, then how is the goal of solving the world's problems (saving kids in Africa, stopping unfriendly AIs, etc) relevantly different from the goal of making a cheeseburger?

Cousin_it and I had an offline chat. To recap my arguments:

  1. It's not clear that a cheeseburger-making AI needs to have lots of specific knowledge about humans. As we discussed, one possibility is to give it a utility function that assigns negative value to anything consequential crossing some geographical boundary, except a cheeseburger (see the toy sketch after this list).
  2. More generally, the fact that we can't easily think of a way to solve a particular real-world problem (with minimal side effects) using an AI with a simple utility function is only weak evidence that such a simple utility function doesn't exist, since the space of utility functions simple enough to be hand coded is still enormous.
  3. Even if there are some real-world problems that can't be solved using simple utility functions, I do not just want to solve particular problems. I want to get "what I really want" (in some sense that I can't define clearly, but is more like "my reflective equilibrium values" than "my current behavioral tendencies"), and it seems plausible that "what I really want" is less complex than the information needed to tell an AI how to solve particular problems while keeping the world otherwise unchanged.
  4. I think Eliezer's "thou art godshatter" argument was meant to be generally applicable: if it was sound, then we can conclude that anyone who thinks their values are simple is wrong in an objective sense. We no longer seem to have such an argument (that is still viable), and proponents of "complexity of value" perhaps have to retreat to something like "people who think they have simple values are wrong in the sense that their plans would be bad according to my values".
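A deliberately toy sketch of the utility function gestured at in point 1, assuming we already had predicates for "consequential change outside the boundary" and "cheeseburger delivered". Every name here is hypothetical, and the hard part of the problem is hidden inside those two predicates:

```python
# Toy sketch only: the two predicates below stand in for exactly the hard
# perception/ontology problems the surrounding discussion is about.

def consequential_changes_outside(world_state, boundary):
    """Hypothetical: the set of non-trivial changes caused by the AI
    that lie outside the geographical boundary."""
    raise NotImplementedError

def cheeseburger_delivered(world_state, target):
    """Hypothetical: True if a cheeseburger ended up at the target spot."""
    raise NotImplementedError

def utility(world_state, boundary, target):
    # Heavily penalize every side effect crossing the boundary, except the
    # cheeseburger itself; give a small reward for successful delivery.
    side_effects = consequential_changes_outside(world_state, boundary)
    penalty = -1000 * len(side_effects - {"cheeseburger_at_target"})
    reward = 1 if cheeseburger_delivered(world_state, target) else 0
    return reward + penalty
```

Whether "consequential" can be specified without smuggling in a lot of knowledge about what humans care about is exactly the point in dispute.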

"thou art godshatter" argument ... We no longer seem to have such an argument (that is still viable)

How about this restatement (focus)? I think you agreed previously, and your response was more subtle than denying it. IIRC, you said that complex aspects of value are present, but possibly of low importance, so if we have no choice (no time), we should just go the simple way, while still capturing a nontrivial amount of value. This is different from saying that value could be actually simple.

There are many possible values, and we could construct agents that optimize any one of them. Among these many values, only a few are simple in abstract (i.e. not referring to artifacts in the world) and explicit form. What kind of agents are humans, and which of the many possible values are associated with them, in a sufficiently general sense of value-association that applies to humans? For humans to have specifically those few-of-many simple values, they would need to be constructed in a clean way, with an explicit goal architecture that places those particular abstract goals in charge. But humans are not like that; they have a lot of accidental complexity in every joint, so there is no reason to expect them to implement that particular simple goal system.

This is an antiprediction: it argues from a priori improbability, rather than trying to shift an existing position that favors simple values. At this point, it intuitively feels to me that simple values are unprivileged, and there seems to be no known reason to expect that they would be more likely than anything else (that has more data). This kind of simplicity is not the kind from Occam's razor: it seems like expecting the air molecules in the room to stay in one corner, arranged on a regular grid, as opposed to being distributed in a macroscopically more uniform, but microscopically enormously detailed, configuration. We have this brain-artifact that is known to hold lots of data, and expecting all this data to amount to a simple summary doesn't look plausible to me.
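One way to make this counting intuition precise (an editorial gloss, not part of the original comment) is the standard incompressibility argument: there are fewer than 2^k programs shorter than k bits, so only an exponentially small fraction of descriptions of a given size can be simple.

```latex
% For x drawn uniformly from n-bit strings, fewer than 2^{n-c} programs
% have length < n - c, hence
\[
  \Pr\bigl[ K(x) < n - c \bigr] \;<\; \frac{2^{\,n-c}}{2^{\,n}} \;=\; 2^{-c}.
\]
```

On this picture, if the brain's decision-relevant data is treated as a roughly arbitrary blob, dramatic compressibility is a priori rare; the live question in this thread is whether reflection is a process that selects for that rare property.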

I think you agreed previously, and your response was more subtle than denying it. IIRC, you said that complex aspects of value are present, but possibly of low importance, so if we have no choice (no time), we should just go the simple way, while still capturing a nontrivial amount of value. This is different from saying that value could be actually simple.

Yes, this post is making a different point from that one.

But humans are not like that, they have a lot of accidental complexity in every joint, so there is no reason to expect them to implement that particular simple goal system.

But why think all that complexity is relevant? Surely at least some of the complexity is not relevant (for example a Tourette sufferer's tendency to curse at random, or the precise ease with which some people get addicted to gambling). Don't you need an additional argument to conclude that at least a substantial fraction of the complexity is relevant, and that this applies generally, even to those who think otherwise and would tend to discard such complexity when they reflect on what they really want?

Don't you need an additional argument to conclude that at least a substantial fraction of the complexity is relevant, and that this applies generally, even to those who think otherwise and would tend to discard such complexity when they reflect on what they really want?

Again, it's an antiprediction, an argument about what your prior should be, and not an argument that takes the presumption that simple values are plausible as a starting point and then tries to convince you that it should be tuned down. Something is relevant; brains are probably relevant, since that is where decision-making happens. The claim that the relevant decision-theoretic summary of that has any given rare property, like simplicity, is something that needs strong evidence, if you start from that prior. I don't see why privileging simplicity is a viable starting point, or why this hypothesis deserves any more consideration than the claim that the future should contain a perfect replica of myself from the year 1995.

Why is simplicity assumed to be a rare property? There are large classes of things that tend to be "simple", like much of mathematics; don't you have to argue that "brains" or minds belong to the class of things whose members do not tend to be "simple"? (I can see obvious reasons that one would think this is obvious, and non-obvious reasons that one would think this is non-obvious, which makes me think that it shouldn't be assumed, but if I'm the only person around who could make the non-obvious arguments then I guess we're out of luck.)

When the answer is robustly unattainable, it's pointless to speculate about what it might be; you can only bet or build conditional plans. If "values" are "simple", but you don't know that, your state of knowledge about the "values" remains non-simple, and that is what you impart to the AI. What does it matter which of these things, the confused state of knowledge or the correct answer to our confusion, we call "values"?

If I think the correct answer to our confusion will ultimately turn out to be something complex (in the sense of being Godshatter-like), then I can rule out any plans that eventually call for hard-coding such an answer into an AI. This seems to be Eliezer's argument (or one of his main arguments) for implementing CEV.

On the other hand, if I think the correct answer may turn out to be simple, even if I don't know what it is now, then there's a chance I can find out the answer directly in the next few decades and then hard code that answer into an AI. Something like CEV is no longer the obvious best approach.

(Personally I still prefer a "meta-ethical" or "meta-philosophical" approach, but we'd need a different argument for it besides "thou art godshatter"/"complexity of value".)

I believe that values are complex because I observe the apparent values to be complex and because I haven't seen a simple value system which wouldn't contradict both actual human behaviour and expressed preferences in some respect.

our "actual" values to be the values we would endorse in reflective equilibrium

Are there reasons to think that there is a reflective equilibrium and that it is unique? Anyway, if I had to construct an equilibrium, I would try to make it as close as possible to the unstable apparent values which I endorse now, and since this system is complex, the approximation may be complex as well. It is certainly possible to define a simple equilibrium value system, but I would then be wary of calling it human values.

Even consequentialist or utilitarian-esque theories (supposedly among the very simplest) have many independent dimensions, and switching even one leads to dramatically different preferences. Some examples:

  • Counting the 'quantity' of individuals/preferences/pains/pleasures in light of issues like those in Bostrom's duplication paper or some of your discussions
  • Deciding what to count as a preference/pain/pleasure: can storing a bigger number in a register make a being ten times more valuable? If we start with a complex AI with a utility function we think valuable, and then gradually strip out all the intelligence/memory/capability how does this diminish its moral value?
  • Treatment of infinities
  • Average vs total vs more complex functions from patterns of goods to value

In other words, it sounds like you're saying human values are complex because all values are complex. But that doesn't by itself prove that human values are much more complex than those of a pleasure maximizer, which is a claim that I take to be included in the "value is complex"/"thou art godshatter" idea.

ETA: Though your point may well be sufficient to support the particular implication that we're unlikely to get the values right by hard-coding them or having them randomly emerge.

Though your point may well be sufficient to support the particular implication that we're unlikely to get the values right by hard-coding them or having them randomly emerge.

If someone thinks it's pretty likely that their preferences (as endorsed in reflective equilibrium) can be captured by some form of utilitarianism, then their time could easily be better spent trying to narrow down the exact form of utilitarianism (with the aim of hard-coding it) instead of trying to figure out how to specify and implement something like CEV.

Utilitarians haven't made that much progress on these issues in the last 60 years, and there are a lot of them. Do you have a further argument for why this could easily be the case?

There's little progress in the sense of resolving disagreements, but if someone thinks what is important is not public consensus on a particular form of utilitarianism, but just what they would endorse in reflective equilibrium as an individual, and they already have fairly strong intuitions on what kind of utilitarianism is right (which some people seem to), then that seems sufficient to make "narrow down the exact form of utilitarianism" a potentially better plan than "implement CEV" from their perspective.

I meant that there's been little progress in the sense of generating theories precise enough to offer concrete recommendations, things that might be coded into an AI, e.g. formal criteria for identifying preferences, pains, and pleasures in the world (beyond pointing to existing humans and animals, which doesn't pin down the content of utilitronium), or a clear way to value different infinitely vast worlds (with all the rearrangement issues discussed in Bostrom's "Infinitarian Challenge" paper). This isn't just a matter of persistent moral disagreement, but a lack of any comprehensive candidates that actually tell you what to do in particular situations rather than having massive lacunae that are filled in by consideration of individual cases and local intuitions.

then that seems sufficient to make "narrow down the exact form of utilitarianism" a potentially better plan than "implement CEV" from their perspective

This seems to me more about the "C" than the "EV." I think such a utilitarian should still be strongly concerned with having at least their reflective equilibrium extrapolated. Even a little uncertainty about many dimensions means probably going wrong, and it seems that reasonable uncertainty about several of these things (e.g. infinite worlds and implications for probability and ethics) is in fact large.

I meant that there's been little progress in the sense of generating theories precise enough to offer concrete recommendations, things that might be coded into an AI, e.g. formal criteria for identifying preferences, pains, and pleasures in the world (beyond pointing to existing humans and animals, which doesn't pin down the content of utilitronium)

One could argue that until recently there has been little motivation amongst utilitarians to formulate such precise theories, so you can't really count all of the past 60 years as evidence against this being doable in the next few decades. Some of the problems weren't even identified until recently, and others, like how to identify pain and pleasure, could be informed by recent or ongoing science. And of course these difficulties have to be compared with the difficulties of EV. Perhaps I should just say that it's not nearly as obvious that "hard-coding" is a bad idea, if "complexity of value" refers to the complexity of a precise formulation of utilitarianism, for example, as opposed to the complexity of "Godshatter".

Even a little uncertainty about many dimensions means probably going wrong, and it seems that reasonable uncertainty about several of these things (e.g. infinite worlds and implications for probability and ethics) is in fact large.

Is it plausible that someone could reasonably interpret lack of applicable intuitions along some dimensions as indifference, instead of uncertainty?

In other words, it sounds like you're saying human values are complex because all values are complex.

Yes, I'm more confident of this than of the godshatter thesis. With respect to the godshatter thesis, it seems fairly likely to me that reflective equilibrium does not wash away all our desires' complexity, because many values are self-defending, e.g. desire for varied experiences, protection of loved ones, etc. They seem not much more arbitrary than any form of hedonism particular enough to answer all the questions above, and so likely to rise or fall together (under conditions of reflection and self-control).

Many beliefs are also self-defending, but that is little reason to hold onto them. There is no reason for the same principle not to apply to beliefs about morality, which values implicitly are. Thus their self-defending nature doesn't necessarily mean that reflective equilibrium won't just throw them out as false theories rather than updating on them as raw data, though of course they are both. Or so one perspective would argue.

Eliezer's justification for extending the "complexity of evolved motivation" to the "complexity of reflective equilibrium value", as far as I can tell, is that he knows or strongly suspects that the relevant reflective equilibrium involves people leading good lives, with the word "good" to be filled in by fun theory along the lines of what he's posted in the fun theory sequence, which in turn refers back to complex evolved motivation. That seems plausible enough to me, but I agree the aforementioned extension isn't an automatic step.

I think it would be reasonable if Eliezer thinks his own reflective equilibrium involves people leading such "good" lives, but not if he thinks everyone has similar reflective equilibria. I know he draws heavily on the notion of psychological unity of humankind, but it seems insufficient to ensure that people share the same "reflective equilibrium value" as opposed to "evolved motivation", given that people start off with different moral intuitions and the process of reflecting on those intuitions seems more divergent than convergent. Again, for people who already have a strong intuition that morality or their values should be simple, it seems quite plausible that they would end up with a "reflective equilibrium value" that is simple compared to "Godshatter".

people who already have a strong intuition that morality or their values should be simple

Such an intuition may not survive 1) a better intuitive understanding of how human psychology can make some things seem simple even when they are not, and 2) a better intuitive understanding of how all the usual arguments for Occam's razor apply to facts but not to values.

people start off with different moral intuitions and the process of reflecting on those intuitions seems more divergent than convergent.

Why do you believe this? What sorts of examples do you have in mind?

(i.e., if we consider our "actual" values to be the values we would endorse in reflective equilibrium, instead of our current apparent values)

I don't see why such a "reflective equilibrium" should be well defined and/or unique.

I agree, and I would question the notion of reflective equilibrium myself on those grounds. Here I'm mostly using it as an example of what "actual values" might be, in contrast with "mere behavior", in order to point out that Eliezer's "Thou Art Godshatter" argument is not sufficient to show that our values are actually complex.

This is an old post, but I think I have something relevant to say:

I think that because Yvain's posts focus on the times when your "behaviors" and your "values" conflict and your behaviors end up sabotaging important values, it gives the impression that behaviors just aren't important to your conscious mind. I don't think that's entirely true. Our conscious mind sometimes works in harmony with our behaviors and sees them as terminal values.

I think a good post of Yvain's that helps illustrate my point is his essay on Wanting, Liking, and Approving. We can think of our reflective equilibrium values as "Approving" and our more primal and instinctive behaviors as "Liking" and "Wanting." I think that an essential part of CEV and AI design is that people should maintain a balance between doing things they Approve of and doing things they Like. A future where people only do things they approve of and never do anything they like seems dull and puritanical. By contrast, a future where people just do what they Like and never do anything they Approve of seems pointlessly hedonistic. (I don't think that pure Wanting that is separated from both Liking and Approving has any value, however)

Both Liking and Approving are essential parts of a fulfilled life. A world that maximizes one at the expense of the other is not a good world. And while a reflective equilibrium might be able to simplify what you Approve of, I don't think it can ever simplify what you Like.

ADDED: Thinking about the concept of reflective equilibrium, I think that it is more of a way to prioritize values and behaviors than a way to discard them. If we assign a behavior low value in a reflective equilibrium that doesn't mean the behavior has no moral value. It simply means that in cases where that behavior interferes with a value we approve of more strongly we may have to stop it in order to achieve that more important goal.

I think Yvain's main point is that sometimes our more egodystonic goals interfere with our egosyntonic ones, and that in a reflective equilibrium we would assign priority to our most egosyntonic goals. I don't think that egosyntonic and egodystonic are binary categories, however. I think they are a continuum. I have many goals that are not inherently egodystonic, but become so when they are chosen as an alternative to another, even more important goal. Sometimes we might need to suppress one of our less egosyntonic behaviors to prevent it from interfering with an even more egosyntonic one. For instance, I don't disapprove of my reading funny websites on the Internet, I just approve of it less than I approve of my writing a novel. But if I'd already gotten a good bit of work done on a novel, I'd stop not approving of reading funny websites. It would be bad to delete our less egosyntonic desires altogether, because someday in the future we might find a way to satisfy both of those goals at once.

Also, after this recent exchange I had, I'm less sure than before that Yvain is 100% right that the goals we approve of should always trump "liking" goals. I can think of many behaviors I want to suppress so I can engage more frequently in doing things I approve of. But I also remember times in the past when I decided, after some reflection, that the approving part of my mind was a huge prude and I should have more fun. Maybe the conflict between the facets of our brain isn't as clear-cut as I thought.

Again, Yvain previously argued that what made something a value and not a behavior was that it was egosyntonic. So we can answer the question of whether or not value is complex by asking if many of our complex behaviors are also egosyntonic. I know that in my case they definitely are. And even when they aren't, the reason is often that they are obstructing an even more egosyntonic desire, not that I don't approve of them at all.

Is it from an intuition that we should translate as many of our behaviors into preferences as possible?

I know that in my case, it's because many of these behaviors are positive experiences for me, and that it would be negative to reject them unless they seriously interfered with something else. Furthermore, many of these behaviors are things I value, I just value them a little less than other things, and wish they wouldn't keep subverting my efforts to accomplish those other things.

For those who might wish to defend the complexity-of-value thesis, what reasons do you have for thinking that human value is complex?

Humans have in their brains a lot of information about what morality is and is not, and it'd be silly to throw away all that information, especially considering that some of the simple underlying laws of morality might very well have some degree of context-sensitivity. Fundamental valuing of original contexts is one of the reasons I see that an FAI might bother to upload and run some humans or transhumans for a few septillion years, instead of only computing variations of convergent, completely moral, angelic computations.

I think a good bit of my value complexity comes from the fact that I live in a human body.

Consider valuing tasty food.

What I will find "tasty" is based on a huge number of factors, like amount and type of fat, salt, sugar, texture, temperature, weather, what I've eaten that day, what I've eaten that week, what else I'm eating, time of day, the chemical state of my body, culture, etc.

And as long as I live in this body, I think that I won't be able to self-modify enough to eliminate most of this complexity in what I like (and all else being equal, want).

Biology 101 says that we can approximate most creatures' values by treating them as resource-limited maximisers of their own genes. Current compression reduces the human genome to about 70 MB, so that's a first estimate. Humans also spend some of their time acting as maximisers of the genomes of their symbionts (foodstuffs, pets, parasites, memes) - which if anything seems likely to add to the complexity of their values.

[This comment is no longer endorsed by its author.]

You can have high complexity if you value more than two things (like we do) by having high-entropy exchange rates. And as evolved creatures, entropy isn't just a good idea, it's the law. :P

So pure description length isn't all that interesting. More interesting is "model complexity" - that is, if you drew human value as a graph of connections and didn't worry about the connection strengths, what would the complexity of the graph be? Also interesting is "ideal outcome complexity" - would the ideal universe be tiled with lots of copies of something fairly simple, or would it be complicated?

I think that ideal outcome complexity is pretty definitely high, based on introspection, fiction, the lack of one simple widely known goal, evolution generating lots of separate drives with nonlinear responses, etc. But this only implies high model complexity if I don't value something that correlates with complexity - which is possible. So I'll have to think about that one a bit.
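A toy way to see the gap between raw description length and "model complexity", using a made-up miniature value graph (the nodes, edges, and weights below are invented for illustration, and zlib compression is only a crude stand-in for description length):

```python
import json
import zlib

# Invented miniature "value graph": nodes are values, edges carry exchange rates.
weighted = {
    "novelty":   {"boredom_avoidance": 0.83, "curiosity": 0.41},
    "sympathy":  {"fairness": 0.67, "loyalty": 0.12},
    "challenge": {"novelty": 0.55, "status_signaling": 0.29},
}

# "Model complexity": the same graph with the connection strengths discarded.
structure = {node: sorted(edges) for node, edges in weighted.items()}

def compressed_size(obj):
    # Crude proxy for description length: bytes after zlib compression.
    return len(zlib.compress(json.dumps(obj, sort_keys=True).encode()))

print("with weights:   ", compressed_size(weighted), "bytes")
print("structure only: ", compressed_size(structure), "bytes")
```

The difference between the two numbers is a rough analogue of how much of the complexity lives in the exchange rates rather than in which things are connected at all.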

An obvious candidate for discussion here is Clippy, who has a simple, easy-to-define value system. Should we be suspicious of Clippy purely on the basis of their simple moral system, or is it the paperclip-related content that's a problem?

[This comment is no longer endorsed by its author.]

Most of our apparent values (which admittedly are complex) could easily be mere behavior [...] and therefore would be more willing to discard things that are on the fuzzy border between behaviors and values

Comparing "behaviour" and "values" as though they are the same sort of thing seems pretty strange.

It is probably best to think of those as different types of thing - and then there is no "fuzzy border".