No, not psychoactive drugs: allergy drugs.

This is my attempt to come to grips with the idea of self-modification. I'm interested to know of any flaws folks might spot in this analogy or reasoning.

Gandhi wouldn't take a pill that would make him want to kill people. That is to say, a person whose conscious conclusions agree with their moral impulses wouldn't self-modify in such a way that they no longer care about morally significant things. But what about morally insignificant things? Specifically, is willingness to self-modify about X a good guide to whether X is morally significant?

A person with untreated pollen allergies cares about pollen; they have to. In order to have a coherent thought without sneezing in the middle of it, they have to avoid inhaling pollen. They may even perceive pollen as a personal enemy, something that attacks them and makes them feel miserable. But they would gladly take a drug that makes them not care about pollen, by turning off or weakening their immune system's response to it. That's what allergy drugs are for.

But a sane person would not shut off their entire immune system, including responses to pathogens that are actually attacking their body. Even if giving themselves an immune deficiency would stop their allergies, a sane allergy sufferer wouldn't do it; they know that the immune system is there for a reason, to defend against actual attacks, even if their particular immune system is erroneously sensitive to pollen as well as to pathogens.

My job involves maintaining computer systems. Like other folks in this sort of job, my team use an automated monitoring system that will send us an alert (by pager or SMS) if something goes wrong with the systems, waking us up at night if necessary. We want to receive significant alerts, and not receive false positives. We regularly modify the monitoring system to prevent false positives, because we don't like being woken up at night for no good reason. But we wouldn't want to turn off the monitoring system entirely; we actually want to receive true alerts, and we will take action to refine our monitoring system to deliver more accurate, more timely true alerts, because we would like to improve our systems to make them fail less often. We want to win, and false positives or negatives detract from winning.
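To make the "refine it, don't turn it off" point concrete, here is a minimal Python sketch. The latency-threshold alert rule, the `Check` record, and the sample numbers are all hypothetical illustrations, not a description of any real monitoring setup; each sample carries a ground-truth flag standing in for the "fact of the matter" about whether the server was really down.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One hypothetical health-check observation."""
    latency_ms: float    # measured response time of the web server
    server_down: bool    # ground truth: was the server actually down?


def fires(check: Check, threshold_ms: float) -> bool:
    """Hypothetical alert rule: page someone if latency meets or exceeds the threshold."""
    return check.latency_ms >= threshold_ms


def score(checks: list[Check], threshold_ms: float) -> tuple[int, int]:
    """Return (false positives, false negatives) for a given threshold."""
    false_pos = sum(1 for c in checks if fires(c, threshold_ms) and not c.server_down)
    false_neg = sum(1 for c in checks if not fires(c, threshold_ms) and c.server_down)
    return false_pos, false_neg


# Made-up sample data, for illustration only.
sample_checks = [
    Check(latency_ms=120, server_down=False),
    Check(latency_ms=900, server_down=False),    # slow but healthy: a would-be false alarm
    Check(latency_ms=5000, server_down=True),
    Check(latency_ms=30000, server_down=True),
]

if __name__ == "__main__":
    for label, threshold in [("too sensitive", 500),
                             ("refined", 2000),
                             ("turned off", float("inf"))]:
        fp, fn = score(sample_checks, threshold)
        print(f"{label:>13}: {fp} false positive(s), {fn} false negative(s)")
```

With these made-up numbers, raising the threshold from 500 ms to 2000 ms removes the one false positive without adding any false negatives, while "turning the system off" (an infinite threshold) trades that single false alarm for two missed outages.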

Similarly, there are times when we conclude that our moral impulses are incorrect: that they are firing off "bad! evil! sinful!" or "good! virtuous! beneficent!" alerts about things that are not actually bad or good; or that they are failing to fire for things which are. Performing the requisite Bayesian update is quite difficult: training yourself to feel that donating to an ineffective charity is not at all praiseworthy, or that it can be morally preferable to work for money and donate it rather than to volunteer; altering the thoughts that come unbidden to mind when you think of eating meat, in accordance with a decision that vegetarianism is or is not morally preferable; and so on.

A sane allergy sufferer wants to update his or her immune system to make it stop having false positives, but doesn't want to turn it off entirely; and may want to upgrade its response sometimes, too. A sane system administrator wants to update his or her monitoring tools to make them stop having false positives, but doesn't want to turn them off entirely; and sometimes will program new alerts to avoid false negatives. There is a fact of the matter of whether a particular particle is innocuous pollen or a dangerous pathogen; there is a fact of the matter of whether a text message alert coincides with a down web server; and this fact of the matter explains exactly why we would or wouldn't want to alter our immune system or our servers' monitoring system.

The same may apply to our moral impulses: to decide that something is morally significant is, if we are consistent, equivalent to deciding that we would not self-modify to avoid noticing that significance, and that we would self-modify to notice it more reliably.

EDIT: Thanks for the responses. After mulling this over and consulting the Sequences, it seems that the kind of self-modification I'm talking about above is summed up by the training of System 1 by System 2 discussed waaaaay back here. Self-modification for FAI purposes is a level above this. I am only an egg.

5 comments

I don't think willingness to self-modify is reliable enough to use as a definition of morality, but it's definitely a useful test, and I've never heard the idea of using it that way before. Well spotted.

I like this post, if only because it cuts through the standard confusion between feeling as if doing something particular would be morally wrong, and thinking that. The former is an indicator like a taste or a headache, and the latter is a thought process like deciding it would be counterproductive to eat another piece of candy.

I don't know what the LW orthodoxy says on this issue; all I know is in general, it's pretty common for people to equivocate between moral feelings and moral thoughts until they end up believing something totally crazy. Nobody seems to confuse how good they think a piece of cake would taste with their idea of whether it would be an otherwise productive thing to do, but everybody seems to do that with morality. What's it like here?

Anyway, I agree. If we decide it would be morally wrong to eat meat, we would naturally prefer our feeling that a steak would really hit the spot right now to stop distracting us and depleting our precious willpower, right? Hold on. Let's analyze this situation a little deeper. It's not that you simply think it would ultimately be wrong to eat a piece of meat; it's that you think that about killing the animal. Why don't you want to eat the meat? Not for its own sake, but because that would kill the animal.

It's an example where two conclusions contradict each other. At one moment, you feel revulsion when you imagine somebody slaughtering a helpless cow, but at another you feel desire for the taste of the steak. You're torn. There's a conflict of interests between your different selves from one moment to the next. One wants the steak no matter the price; the other considers the price way too steep. You might indulge in the steak for one minute, but regret it the next. Sounds like akrasia, right?

If you consciously decide it would be good to eat meat, the feeling of revulsion would be irrational; if you decide the opposite, the feeling of desire would. In the first case, you would want to self-modify to get rid of the useless revulsion, and in the second one, you would want to do so to get rid of the useless desire. Or would you? What if you end up changing your mind? Would it really be a good idea to nuke every indicator you disagree with? What about self-modifying so cake doesn't taste so good anymore? Would you do that to get into better shape?

Note: I'm just trying to work through the same issue. Please forgive me if this is a bit of a wandering post; most of them will be.

I think in asking "would I self-modify to avoid/encourage x", you're really just asking "Do I want good things to remain good?" Of course you do, they're good things.

This doesn't at all determine what it is about things that may qualify them as good, just that you want the things to which you ascribe the property of "goodness" to keep that property. But I don't think you're trying to do this--correct me if I'm wrong, though.

So, if you never intended to identify moral facts in the first place, and you just wanted some heuristic for determining if something is "morally significant to some individual", you're fine as far as I can tell. Tautologically even.

... to decide that something is morally significant is equivalent to deciding that we would not self-modify to avoid noticing that significance

What do you think of the idea that we consider something a moral value rather than a typical preference if we consider that it is in danger of being modified?

For example, murder is the example often thrown about as something that is universally immoral. However, this is also something that humans do. While humans often say that they shouldn't murder, they often decide in specific circumstances that taking a life is something they should do. This value to not kill seems particularly modifiable.

(I said something similar yesterday but reading that post I see I didn't communicate well.)

As sets, we can think of all of our preferences as contained within a set P. We either value each of those preferences to some extent or feel indifferent about it. Let these be sets 'PV' and 'P~V'. (For example, I don't care which color I prefer or whether I have allergies, so these preferences would be in 'P~V'. Not wanting people to die and wanting to eat when I'm hungry so I don't starve would be in 'PV'.) Then within 'PV' there is a further division: preferences are either stable ('PVS') or not stable ('PV~S'). Only preferences in the last category, PV~S, would be considered moral preferences.
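Written out as a two-level partition, the division above looks like this. The subscript notation is just a rendering of the labels 'PV', 'P~V', 'PVS', and 'PV~S' used in the comment, with ~ written as a negation sign:

```latex
\[
\begin{aligned}
P &= P_{V} \cup P_{\neg V}, & P_{V} \cap P_{\neg V} &= \varnothing,\\
P_{V} &= P_{VS} \cup P_{V\neg S}, & P_{VS} \cap P_{V\neg S} &= \varnothing,\\
\text{moral preferences} &= P_{V\neg S}.
\end{aligned}
\]
```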

I'm parsing things that you call 'moral' as 'things that your utility function assigns terminal utility.' Tell me if this is incorrect. Also, my apologies in advance if I ramble -- I'm thinking as I type.

So, Gandhi won't take a pill that would make him want to kill people because it would increase the probability of a person dying in a given time interval -- since Gandhi's utility function assigns negative utility to people dying, he won't do this because it's going to net him negative expected utility. The only reason that he would do this is if there was some cost for not taking the pill or some benefit for taking the pill that would cause (expected utility from taking the pill) to be greater than (expected utility from not taking the pill).
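The decision rule in the paragraph above can be stated as a comparison of expected utilities. Here U_now is notation added for illustration: Gandhi's current utility function, which is the one he uses to evaluate both options, including whatever a modified future Gandhi would go on to do.

```latex
\[
\text{take the pill} \iff
\mathbb{E}\left[\,U_{\text{now}} \mid \text{take}\,\right] >
\mathbb{E}\left[\,U_{\text{now}} \mid \text{don't take}\,\right]
\]
```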

You assign negative utility to having strong allergic reactions -- if you take a pill so that there is a smaller probability that you will have a strong allergic reaction in a given time interval, you taking the pill nets you positive expected utility, given that there are no costs associated with doing so that outweigh the benefits.

I'm going to assume that when you mentioned taking a drug that would make you "not care about pollen," you meant that the drug stops you from physically reacting to pollen, producing the positive utility described in the previous paragraph. This wouldn't change your utility function itself; you're just optimizing the world around you so that (~allergy) is true instead of (allergy), with relation to yourself. This is different from the Gandhi scenario, because Gandhi's changing his utility function itself -- he is taking a psychoactive drug, in the strongest sense of the word.

I see the main distinction between decisions that involve taking actions that change your utility function itself and those that don't to be that changing your utility function itself is much riskier than changing the world around you. Gandhi probably wouldn't take a pill that would make him kill people even if someone validly precommitted to kill twenty people if he didn't, because he's aware that the cost of him optimizing for negative utility instead of positive utility for the rest of his life is really big.

This was the question you asked:

Specifically, is willingness to self-modify about X a good guide to whether X is morally significant?

That might be a useful heuristic. However, I note that it doesn't apply at the extreme cases -- where there are enormous costs for not self-modifying. It also generates some false positives: there are a whole lot of things about myself that I just wouldn't dare mess with because I'm not sure that they wouldn't break the rest of me.