I'm writing a series of posts about replacing guilt motivation over on MindingOurWay, and I plan to post the meatier / more substantive posts in that series to LessWrong. This one is an allegory designed to remind people that they are allowed to care about the outer world, that they are not cursed to only ever care about what goes on in their heads.


Once upon a time, a group of naïve philosophers found a robot that collected trinkets. Well, more specifically, the robot seemed to collect stamps: if you presented this robot with a choice between various trinkets, it would always choose the option that led towards it having as many stamps as possible in its inventory. It ignored dice, bottle caps, aluminum cans, sticks, twigs, and so on, except insofar as it predicted they could be traded for stamps in the next turn or two. So, of course, the philosophers started calling it the "stamp collector."

Then, one day, the philosophers discovered computers, and figured out that the robot was merely a software program running on a processor inside the robot's head. The program was too complicated for them to understand, but they did manage to deduce that the robot only had a few sensors (on its eyes and inside its inventory) that it was using to model the world.

One of the philosophers grew confused, and said, "Hey wait a sec, this thing can't be a stamp collector after all. If the robot is only building a model of the world in its head, then it can't be optimizing for its real inventory, because it has no access to its real inventory. It can only ever act according to a model of the world that it reconstructs inside its head!"

"Ah, yes, I see," another philosopher answered. "We did it a disservice by naming it a stamp collector. The robot does not have true access to the world, obviously, as it is only seeing the world through sensors and building a model in its head. Therefore, it must not actually be maximizing the number of stamps in its inventory. That would be impossible, because its inventory is outside of its head. Rather, it must be maximizing its internal stamp counter inside its head."

So the naïve philosophers nodded, pleased with this, and then they stopped wondering how the stamp collector worked.


There are a number of flaws in this reasoning. First of all, these naïve philosophers have made the homunculus error. The robot's program may not have "true access" to how many stamps are in its inventory (whatever that means), but it also doesn't have "true access" to its internal stamp counter.

The robot is not occupied by some homunculus that has dominion over the innards but not the outards! The abstract program doesn't have "true" access to the register holding the stamp counter and "fake" access to the inventory. Steering reality towards regions where the inventory has lots of stamps in it is the same sort of thing as steering reality towards regions where the stamp-counter-register has high-number-patterns in it. There's not a magic circle containing the memory but not the inventory, within which the robot's homunculus has dominion; the robot program has just as little access to the "true hardware" as it has to the "true stamps."

This brings us to the second flaw in their reasoning, that of trying to explain choice with a choice-thing. You can't explain why a wall is red by saying "because it's made of tiny red atoms"; this is not an explanation of red-ness. In order to explain red-ness, you must explain it in terms of non-red things. And yet, humans have a bad habit of explaining confusing things in terms of themselves. Why does living flesh respond to mental commands, while dead flesh doesn't? Why, because the living flesh contains Élan Vital. Our naïve philosophers have made the same mistake: they said, "How can it possibly choose outcomes in which the inventory has more stamps? Aha! It must be by choosing outcomes in which the stamp counter is higher!" and in doing so, they have explained choice in terms of choice, rather than in terms of something more basic.

It is not an explanation to say "it's trying to get stamps into its inventory because it's trying to maximize its stamp-counter." An explanation would look more like this: the robot's computer runs a program which uses sense-data to build a model of the world. That model of the world contains a representation of how many stamps are in the inventory. The program then iterates over some set of available actions, predicts how many stamps would be in the inventory (according to the model) if it took that action, and outputs the action which leads to the most predicted stamps in its possession.
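To make that concrete, here is a minimal sketch of such a program in Python. Everything in it is an illustrative assumption rather than a description of the robot in the story: the `WorldModel` and `Action` classes, the `stamp_change` field, and the toy list of options are all invented for the example. The only point is the shape of the loop: predict an outcome for each action, then pick the action whose predicted outcome has the most stamps.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorldModel:
    """The robot's internal picture of the world, rebuilt from sense-data."""
    stamps_in_inventory: int


@dataclass(frozen=True)
class Action:
    name: str
    # Hypothetical field: how the model expects this action to change the stamp count.
    stamp_change: int


def predict(model: WorldModel, action: Action) -> WorldModel:
    """Return the modeled world that is predicted to follow from taking `action`."""
    return replace(
        model,
        stamps_in_inventory=model.stamps_in_inventory + action.stamp_change,
    )


def choose(model: WorldModel, actions: list[Action]) -> Action:
    """Output the action whose predicted outcome contains the most stamps."""
    return max(actions, key=lambda a: predict(model, a).stamps_in_inventory)


model = WorldModel(stamps_in_inventory=3)
options = [
    Action("pick up a bottle cap", stamp_change=0),
    Action("pick up a stamp", stamp_change=1),
    Action("trade two stamps for a die", stamp_change=-2),
]
chosen = choose(model, options)
print(chosen.name)  # prints "pick up a stamp"
```

Note that nothing in `choose` scores the actions themselves. It scores the predicted outcomes, and the action only inherits its ranking from the outcome it leads to.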

We could also postulate that the robot contains a program which models the world, predicts how the world would change for each action, and then predicts how that outcome would affect some specific place in internal memory, and then selects the action which maximizes the internal counter. That's possible! You could build a machine like that! It's a strictly more complicated hypothesis, and so it gets a complexity penalty, but at least it's an explanation!
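The two hypotheses can be sketched side by side. This is a toy illustration under stated assumptions, not a claim about how any real agent is built: the `Outcome` and `Action` classes, the `counter_delta` field, and the assumption that the register normally goes up by one whenever a stamp is collected are all hypothetical. The two agents differ in exactly one place, namely which field of the predicted outcome they score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    stamps: int   # predicted stamps in the inventory
    counter: int  # predicted value of the internal stamp-counter register


@dataclass(frozen=True)
class Action:
    name: str
    stamp_delta: int    # hypothetical: predicted change in stamps
    counter_delta: int  # hypothetical: predicted change in the register


def predict(now: Outcome, action: Action) -> Outcome:
    """Predict the outcome that follows from taking `action`."""
    return Outcome(now.stamps + action.stamp_delta, now.counter + action.counter_delta)


def stamp_maximizer(now: Outcome, actions: list[Action]) -> Action:
    """Hypothesis 1: rank actions by predicted stamps in the inventory."""
    return max(actions, key=lambda a: predict(now, a).stamps)


def counter_maximizer(now: Outcome, actions: list[Action]) -> Action:
    """Hypothesis 2: rank actions by the predicted value of the register."""
    return max(actions, key=lambda a: predict(now, a).counter)


now = Outcome(stamps=5, counter=5)

# An ordinary choice: the register simply tracks the stamps.
ordinary = [
    Action("take a stamp", stamp_delta=1, counter_delta=1),
    Action("take a bottle cap", stamp_delta=0, counter_delta=0),
]

# A choice where one option moves the register without moving any stamps.
tampering = [
    Action("take one stamp", stamp_delta=1, counter_delta=1),
    Action("let the counter be raised by ten", stamp_delta=0, counter_delta=10),
]

print(stamp_maximizer(now, ordinary).name, "/", counter_maximizer(now, ordinary).name)
print(stamp_maximizer(now, tampering).name, "/", counter_maximizer(now, tampering).name)
```

On the ordinary choice both agents pick "take a stamp", so watching everyday behavior cannot tell them apart. They only come apart on the second kind of choice, where the stamp maximizer takes the stamp and the counter maximizer accepts the tampering.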

And, fortunately for us, it's a testable explanation: we can check what the robot does, when faced with the opportunity to directly increase the stamp-counter-register (without actually increasing how many stamps it has). Let's see how that goes over among our naïve philosophers…


Hey, check it out: I identified the stamp counter inside the robot's memory. I can't read it, but I did find a way to increase its value. So I gave the robot the following options: take one stamp, or take zero stamps and I'll increase the stamp counter by ten. Guess which one it took?

"Well, of course, it would choose the latter!" one of the naïve philosophers answers immediately.

Nope! It took the former.

"… Huh! That means that the stampyness of refusing to have the stamp counter tampered with must worth be more than 10 stamps!"

Huh? What is "stampyness"?

"Why, stampyness is the robot's internal measure of how much taking a certain action would increase its stamp counter."

What? That's ridiculous. I'm pretty sure it's just collecting stamps.

"Impossible! The program doesn't have access to how many stamps it really has; that's a property of the outer world. The robot must be optimizing according to values that are actually in its head."

Here, let's try offering it the following options: either I'll give it one stamp, or I'll increase its stamp counter by Ackermann(g64, g64) — oh look, it took the stamp.

"Wow! That was a very big number, so that almost surely mean that the stampyness of refusing is dependent upon how much stampyness it's refusing! It must be very happy, because you just gave it a lot of stampyness by giving it such a compelling offer to refuse."

Oh, here, look, I just figured out a way to set the stamp counter to maximum. Here, I'll try offering it a choice between either (a) one stamp, or (b) I'll set the stamp counter to maxi — oh look, it already took the stamp.

"Incredible! That must there must be some other counter measuring micro-stampyness, the amount of stampiness it gets immediately upon selecting an action, before you have a chance to modify it! Ah, yes, that's the only possible explanation for why it would refuse you setting the stamp counter to maximum, it must be choosing according to the perceived immediate micro-stampyness of each available action! Nice job doing science, my dear fellow, we have learned a lot today!"


Ahh! No! Let's be very clear about this: the robot is predicting which outcomes would follow from which actions, and it's ranking them, and it's taking the actions that lead to the best outcomes. Actions are rated according to what they achieve. Actions do not themselves have intrinsic worth!

Do you see where these naïve philosophers got confused? They have postulated an agent which treats actions like ends, and tries to steer towards whatever action it most prefers, as if actions were ends unto themselves.

You can't explain why the agent takes an action by saying that it ranks actions according to whether or not taking them is good. That begs the question of which actions are good!

This agent rates actions as "good" if they lead to outcomes where the agent has lots of stamps in its inventory. Actions are rated according to what they achieve; they do not themselves have intrinsic worth.

The robot program doesn't contain reality, but it doesn't need to. It still gets to affect reality. If its model of the world is correlated with the world, and it takes actions that it predicts lead to more actual stamps, then it will tend to accumulate stamps.

It's not trying to steer the future towards places where it happens to have selected the most micro-stampy actions; it's just steering the future towards worlds where it predicts it will actually have more stamps.


Now, let me tell you my second story:

Once upon a time, a group of naïve philosophers encountered a group of human beings. The humans seemed to keep selecting the actions that gave them pleasure. Sometimes they ate good food, sometimes they had sex, sometimes they made money to spend on pleasurable things later, but always (for the first few weeks) they took actions that led to pleasure.

But then one day, one of the humans gave lots of money to a charity.

"How can this be?" the philosophers asked, "Humans are pleasure-maximizers!" They thought for a few minutes, and then said, "Ah, it must be that their pleasure from giving the money to charity outweighed the pleasure they would have gotten from spending the money."

Then a mother jumped in front of a car to save her child.

The naïve philosophers were stunned, until suddenly one of their number said "I get it! The immediate micro-pleasure of choosing that action must have outweighed —


People will tell you that humans always and only ever do what brings them pleasure. People will tell you that there is no such thing as altruism, that people only ever do what they want to.

People will tell you that, because we're trapped inside our heads, we only ever get to care about things inside our heads, such as our own wants and desires.

But I have a message for you: You can, in fact, care about the outer world.

And you can steer it, too. If you want to.


Warning: Nitpick ahead.

This brings us to the second flaw in their reasoning, that of trying to explain choice with a choice-thing. You can't explain why a wall is red by saying "because it's made of tiny red atoms"; this is not an explanation of red-ness. In order to explain red-ness, you must explain it in terms of non-red things. And yet, humans have a bad habit of explaining confusing things in terms of themselves. Why does living flesh respond to mental commands, while dead flesh doesn't? Why, because the living flesh contains Élan Vital. Our naïve philosophers have made the same mistake: they said, "How can it possibly choose outcomes in which the inventory has more stamps? Aha! It must be by choosing outcomes in which the stamp counter is higher!" and in doing so, they have explained choice in terms of choice, rather than in terms of something more basic.

Tautologies have value though. Math is an easy example, but I think even Elan Vital is defensible. Flesh that's dead and flesh that's alive look very much alike for the first few minutes after death. So "obviously" something about the flesh must have changed, but it's really hard to know what that something actually is, so the concept of Elan Vital makes sense. The concept does have some marginal amount of predictive power - it tells us that it is the flesh which has gone through a change rather than something else. An example of a case where this wouldn't be so would be if we were talking about shadows - an object might change color due to the environment around it changing, like when the sun sets.

Really, it's treating Elan Vital as a sufficient answer that precludes the need for any other answers which is the mistake.

It's worse than that. Only naive reductionism expects that you can always explain an X in terms of something else. What is matter made of?

I see your overall point and agree with it as applied to this issue, but I kind of think that it is true you can always explain X in terms of something else ~X if you know enough about it. Matter can be explained in terms of mass, for example. Your point is true in that you can't go particularly far without reaching a point of self-reference; if you moved on to asking detailed questions about what mass is and what energy is and so on, I expect you'd hit some limits. But being able to take at least one step away from "the thing in itself" is crucial if you want to talk about something actually relevant to this interconnected universe we live in.


The key term here is reductively explain. If you have to switch to some other mode of explanation when reduction bottoms out, then reductive explanation has inherent limits that aren't brought out in the cheerleading rhetoric. On the other hand, all is not lost.


People will tell you that humans always and only ever do what brings them pleasure.

Actually, Paul Dolan's book "Happiness by Design" offers a better theory: that humans are motivated by two broad classes of feelings that we can dub "pleasure" and "purpose", and that both are required for happiness. In practice, of course, each of these broad classes contains many, many sub-categories of emotion.

In general, human beings are easier to understand if you don't try to treat them as utility maximizers. We don't have only one stamp counter. (Or rather, we don't have only one future-stamp-count-predictor, which is a good and correct point in your article. That is, that humans try to steer towards desirable states. I'm just pointing out that "desirable" includes a variety of metrics, many of which are better described as "purposeful" rather than pleasurable.)

This is the first time I've seen anyone on LW make this point with quite this level of explicitness, and I appreciated reading it.

Part of why it might be a useful message around these parts is that it has interesting implications for simulationist ethics, depending on how you treat simulated beings.

Caring about "the outer world" in the context of simulation links naturally to thought experiments like Nozick's Experience Machine, but depending on one's approach to decision theory, it also has implications for simulated torture threats.

The decision theoretic simulated torture scenario (where your subjective experience of making the decision has lots of simulation measure via your opponent having lots of CPU, with the non-complying answer causing torture in all simulated cases) has been kicking around since at least 2006 or so. My longstanding position (if I were being threatened with simulated torture) has always been to care, as a policy, about only "the substrate" universe.

In terms of making it emotionally plausible that I would stick to my guns on this policy, I find it helpful to think about all my copies (in the substrate and in the simulation) being in solidarity with each other on this point, in advance.

Thus, my expectations are that when I sometimes end up experiencing torture for refusing to comply with such a threat, I will get the minor satisfaction of getting a decisive signal that I'm in a simulation, and "taking one for the team". Conversely, when I sometimes end up not experiencing simulated torture it increments my belief that I'm "really real" (or at least in a sim where the Demon is playing a longer and more complex game) and should really keep my eye on the reality ball so that my sisters in the sims aren't suffering in vain.

The only strong argument against doing this is for the special case where I'm being simmed with relatively high authenticity for the sake of modeling what the real me is likely to do in a hostile situation in the substrate... like a wargame sort of thing... and in that case, it could be argued that acting "normally" so the simulation is very useful is a traitorous act to the version of me that "really matters" (who all my reality-focused-copies, in the sim and in the real world, would presumably prefer to be less predictable).

For the most part I discount the wargame possibility in practice, because it is such a weirdly paranoid setup that it seems to deserve to be very discounted. (Also, it would be ironic if telling me that I might be in an enemy run wargame sim makes the me that counts the most act erratically in case she might be in the sim!)

I feel like the insight that "the outer world matters" has almost entirely healthy implications. Applying the insight to simulationist issues is fun but probably not that pragmatically productive except possibly if one is prone to schizophrenia or some such... and this seems like more of an empirical question that could be settled by psychiatrists than by philosophers ;-)

However, the fact that reality-focused value systems and related decision theories are somewhat determinative for the kinds of simulations that are worth running (and hence somewhat determinative of which simulations are likely to have measure as embeddings within larger systems) seems like a neat trick. Normally the metaphysicians claim to be studying "the most philosophically fundamental thing", but this perspective gives reason to think that the most fundamental thing (even before metaphysics?) might be how decisions about values work :-)


Nice article! I had always thought that Christianity was my only motivation for caring about the outer world. After I de-converted, I was really confused to find that I still cared. I still wanted to donate to EA charities. This brought me some happiness, but after a lot of introspection, I was pretty sure that it would bring me more happiness if I donated only half of the money, and spent the other half of the money on a trip back to Guatemala to visit close friends. Even realizing this, though, I still didn't want to, and it boggled my mind...

Then I started learning about evolution and natural selection, that altruism is partially genetic, and it all cleared up a bit for me. I came to the conclusion that goodness can be an end in itself, just like happiness... and I wrote my first ever LW article about it here. It has a lot of the same ideas as yours but isn't so well focused.


Were you trying to write about psychology or AI? For psychology, you're quite correct, but most naive folk-theories don't make this mistake anyway (because they mistake map for territory, of course). For AI, you're glossing over the yet-to-be-done work of actually designing algorithms that deliberately enforce a correspondence between map and territory with respect to their goals.

Thanks for this. One quibble: I think it's misleading to say actions are rated according to what they achieve. Consider dancing. What dancing achieves, first and foremost, is dancing. More generally, some features of actions may be valued or disvalued as terminal values, independent of any (non-tautological) consequences. It's not hard to imagine patterns of behavior for which this is the most elegant explanation.

Um no. The specific sequence of muscle contractions is the action, and the thing they try to achieve is beautiful patterns of motion with certain kinds of rhythm and elegance, and/or/typically the perception of such in an observer.

I don't disagree that humans can take actions that only benefit others, and that altruism exists. I think there is a better theory than both pleasure-maximizing and "humans are intrinsically nice to others", and that is Evolution. Also, Evolution can be understood as "gene-spread chance maximizing", so I think humans are still better modelled as internal-counter maximizers.

Donating to charity can be explained by Signaling: it lets others know that you have an excess of money. Pure altruism alone cannot explain donations because we donate more when we’re being watched. (More detailed explanation of charity can be found in The Elephant in the Brain Chapter 12: Charity.)

Then a mother jumped in front of a car to save her child.

I think that this is a prototypical example in two ways:

1) Descriptive ethics. Describing what people think is right/good/moral. (Actually, I don't think that this is strictly true, but whatever.)

2) Describing how people actually act (cultural anthropology?).

Your main point in this article seems to be related to 2). "People don't only try to seek pleasure."

a) Was that your main point?

b) Do regular people debate this (I'm pretty sure they do, but I'm not positive)? Philosophers? Rationalists? My impression is that rationalists don't debate this, and so I'm not sure who this post is targeting (you did say it's a repost from your blog, so maybe there is indeed a different target audience?).

c) Does this have any implications for what you "should" do? My working conclusion is that "should requires an axiom". That terminal values are arbitrary, and you could only say that you "should" do something to the extent that it leads to a chosen terminal value (or blend of terminal values).


(If this post is only about 2), then the following is tangential, and perhaps isn't the right place for this. But anyway...)

I really don't find "terminal values are arbitrary" to be a comfortable conclusion. I'm not exactly sure why I find it to be so uncomfortable.

  • Like most people, I seek "purpose". Some sort of absolute feeling that the goal I'm pursuing is "the right goal". In other words, I have a desire to find and pursue the "right goal", even though I think/understand that terminal values are arbitrary.
  • Intellectually/logically, I think I have a good understanding of the ideas of consequentialism. I think I understand the reasoning behind the idea that terminal values are arbitrary. But maybe there are holes in my understanding that are causing the discomfort. Or better yet, maybe my conclusion that terminal values are arbitrary is wrong.
  • I'm not sure what my terminal values are. But there's a very large part of me that only cares about my own pleasure (fortunately, acting altruistically brings me a good amount of pleasure). But the implications of that are pretty scary. For example, someone who legitimately only cares about his own pleasure would choose to kill everyone in the world if it meant that he'd survive and be happy. Logically, I don't see a problem here.
    • Consider the question of "What will lead to your terminal goal?". In the hypothetical, this is already answered for us.
    • Consider the question of "Well, is that a good terminal goal?". Logically, it seems to me that there's no such thing as a "good terminal goal".
    • But emotionally, I feel like there's a huge problem here. Unfortunately, when I examine this feeling, I find that there isn't good reason behind it. If you're trying to achieve your terminal goals, then a good reason for an emotion is because it helps you achieve your terminal goals. If the emotion isn't helping you achieve your goals, then I'd say that there isn't good reason behind it. It seems to me that these emotions aren't helping to achieve the terminal goal of personal happiness, and that these emotions are the result of an imperfect brain.
    • I suppose you could argue that those sorts of emotions help you function in society and be happy. That the consequences of such feelings of guilt are decreased likelihood of being shunned and an increased likelihood of being accepted, both of which make it more likely that you survive and live happily. But that doesn't seem to be the case with my feeling guilty for (possibly/hypothetically) being selfish. If I didn't feel this guilt, I highly doubt anyone would know.
    • I suppose that you could subsequently argue that our brains aren't designed for this. That we don't get to be altruistic in the vast majority of circumstances (where other people's utility is linked with your own short/long-run utility), but simultaneously be able to feel no guilt for choosing your own life over everyone else's. "But why not?! Doing so seems to be the strategy that would maximize your own utility."

One point I notice here is that "value" is in the map, not the territory. The whole notion that agents have values is a way of modeling real-world things; it isn't a principle on which the world rests.

For instance, people sometimes make big, definitive-sounding claims about "human terminal values", but these are basically (pretty rough) attempts to create a map that might be useful for predicting human behavior. Insofar as that map isn't predictive, it's worth amending or discarding.