On What Selves Are - CEV sequence

The CEV Sequence Summary: The CEV sequence consists of three posts tackling important aspects of Coherent Extrapolated Volition (CEV). It covers conceptual, practical, and computational problems with CEV's current form. On What Selves Are draws on the methods of analytic philosophy to clarify the concept of Self, which is necessary in order to understand whose volition is going to be extrapolated by a machine that implements the CEV procedure. Troubles with CEV part 1 and Troubles with CEV part 2, on the other hand, describe several issues that the CEV project will face if it is actually implemented. Those issues are not of a conceptual nature; many of the objections shown come from scattered discussions found on the web. Finally, six alternatives to CEV are considered.


On What Selves Are Summary: We start by concurring with a Hofstadterian metaphysical view of Selves. We suggest two ways in which to divide the concept of Self, admitting Selves to be mongrel concepts and cluster concepts. We then proceed to the identification of Selves, in particular a proposed new method for a machine to identify Self-like entities. In the spirit of Dennettian philosophy, we then ask what we demand of Selves, to better grasp what they are. In conclusion, we present some views of Selves that are worth wanting, and claim that only by considering Selves in their full complexity can we truly analyze them.

Note: A draft of the first half of On What Selves Are was published in discussion here; those who read it may want to skip straight to the section "Organisms, Superorganisms, and Selves".


On What Selves Are


Background: Symbols Coalesce to Form Selves


Some of what is taken for granted in this text is vividly subsumed by pp. 204 and 289-290 of Hofstadter's "I Am a Strange Loop" (2007). Those who are still struggling with monism, dualism, qualia, Mary the neuroscientist, epiphenomena, and ineffable qualities will find it worthwhile to read through his passage to understand the background metaphysical view of the universe from which this text is derived. Those, on the other hand, who are already good-willed reductionists of the non-greedy, no-skyhook, no 'design only from Above' kind may skip past this section:

[What makes an "I" come seemingly out of nowhere] is, ironically, an inability - namely our [...] inability to see, feel, or sense in any way the constant frenetic churning and roiling of micro-stuff, all the unfelt bubbling and boiling that underlies our thinking. This, our innate blindness to the world of the tiny, forces us to hallucinate a profound schism between the goal-lacking material world of balls and sticks and sounds and lights, on the one hand, and a goal-pervaded abstract world of hopes and beliefs and joys and fears, on the other, in which radically different sorts of causality seem to reign. [...]

[Your] “I” was not an a priori well-defined thing that was predestined to jump, full-fledged and sharp, into some just-created empty physical vessel at some particular instant. Nor did your “I” suddenly spring into existence, wholly unanticipated but in full bloom. Rather, your “I” was the slowly emerging outcome of a million unpredictable events that befell a particular body and the brain housed in it. Your “I” is the self-reinforcing structure that gradually came to exist not only in that brain, but thanks to that brain. It couldn't have come to exist in this brain, because this brain went through different experiences that led to a different human being.”


We will take for granted that this is the metaphysically correct approach to thinking about mental entities. What will be discussed lies more in the domain of conceptual usage, word meaning, psychological conceptions, symbolic extension, explicit linguistic definition, and less on trying to find underlying substrates or metaphysical properties of Selves.


Selves and Persons Are Similar

On the eighth move of your weekly chess game you do what feels the same as always: reflect for a few seconds on the many layers of structure underlying the current game-state, especially regarding changes from your opponent's last move. It seems reasonable to take his pawn with your bishop. After moving you look at him and see a sequence of expressions: doubt (Why did he do that?), distrust (He must be seeing something I'm not), inquiry (Let me double-check this), schadenfreude (No, he actually failed), and finally joy (Piece of cake, I'll win). He takes your bishop with a knight that, from your perspective, came out of nowhere. Still stunned, you resign. It is the second time in a row you have lost the game due to a simple mistake. The excuse bursts naturally out of your mouth: “I'm not myself today.”


The functional role (with plausible evolutionary reasons) of this use of the concept of Self is easy to unscramble:

1) Do not hold your model of me responsible for these mistakes.

2) Either (a) I sense something strange about the inner machinery of my mind, the algorithm feels different from the inside, or (b) at least my now-visible mistakes are reliable evidence of a difference, which I detected in hindsight.

3) If there is a person watching this game, notice how my signaling, and my friend's not contesting it, is reliable evidence that I normally play chess better than this.

A few minutes later, you see your friend yelling hysterically at someone on the phone, and you explain to the girl who was watching: “He is not that kind of person.”

Here we have a situation where the analogues of 1 and 3 work, but there is no way for you to tell how the algorithm feels from the inside. You still know in hindsight that your friend doesn't usually yell like that. Though 1, 2(b), and 3 still hold, 2(a) is not the case anymore.

I suggest the property of 2(a) that blocks interchangeability of the concepts of Self and Person is "having first-person epistemic information about X". Selves have that; persons don't. We use the term 'person' when we want to talk only about the epistemically intersubjective properties of someone. 'Self' is reserved for a person's perspective on herself, including, for instance, indexical facts.

Other than that, Self and Person seem to be interchangeable concepts. This generalization is useful because it means most of the problems of personhood and selfhood can be collapsed into one thing.

Unfortunately, the Self/Person intersection is a concept that is itself a mongrel concept, so it has again to be split apart.


Mongrel and Cluster Concepts

When a concept seems to defy easy explanation, there are two potential explanatory approaches. The first is to assume that the disparate uses of the term 'Self' in ordinary language and science can be captured by a unique, all-encompassing notion of Self. The second is to assume that different uses of 'Self' reveal a plurality of notions of selfhood, each in need of a separate account. I will endorse this second assumption: Self is a mongrel concept in need of disambiguation. (To strengthen the analogical power of thinking about mongrels, it may help to know that Information, Consciousness, and Health are thought to be mongrel concepts as well.)

Without using specific tags for the time being, let us assume that there will be four kinds of Self: 1, 2, 3, and 4. To say that Self is a concept that sometimes maps into 1, sometimes into 3, and so on is not to exhaustively frame the concept's usage. That is because 1 and 2 themselves may be cluster concepts.

The cluster-concept shape is one of the most common shapes of concepts in our mental vocabulary. Concepts are associational structures. Most of the time, instead of drawing a clear line around a set in the world inside of which every X fits and outside of which none does, concepts present a cluster-like structure, with nearly all members near the core belonging and nearly none of those far from it. Not all of their typical features are logically necessary. The recognition of features produces an activation, the strength of which depends not only on the degree to which the feature is present but also on a weighting factor. When the sum of the activations crosses a threshold, the concept becomes active and the stimulus is said to belong to that category.
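The weighted-activation mechanism just described can be sketched in a few lines of code. Everything here is an illustrative assumption: the features, weights, and threshold are made up for the sketch, not drawn from any empirical model of categorization.

```python
# A minimal sketch of cluster-concept categorization: weighted feature
# activations summed against a threshold. All features, weights, and the
# threshold below are illustrative assumptions.

def concept_active(features, weights, threshold):
    """Return True if the weighted sum of feature activations crosses the threshold.

    features: dict mapping feature name -> degree to which it is present (0.0-1.0)
    weights:  dict mapping feature name -> weighting factor for that feature
    """
    activation = sum(weights.get(name, 0.0) * degree
                     for name, degree in features.items())
    return activation >= threshold

# Hypothetical typical features for the cluster concept "game":
weights = {"rules": 0.4, "goal": 0.3, "competition": 0.2, "fun": 0.1}

chess = {"rules": 1.0, "goal": 1.0, "competition": 1.0, "fun": 0.7}
ring_dance = {"rules": 0.3, "goal": 0.0, "competition": 0.0, "fun": 1.0}

print(concept_active(chess, weights, threshold=0.5))       # True: a core member
print(concept_active(ring_dance, weights, threshold=0.5))  # False: a borderline case
```

Note that no single feature is logically necessary here: chess would still cross the threshold with "fun" set to zero, which is exactly the family-resemblance behavior the paragraph above describes.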

Selves are mongrel concepts composed of different conceptual intuitions, each of which is itself a cluster concept; thus Selves are among the most elusive, abstract, high-level entities entertained by minds. While this may be aesthetically pleasing, presenting us as considerably complex entities, it is also a great ethical burden, for it leaves the domain of ethics, which is highly dependent on the concepts of selfhood and personhood, with a scattered, slippery ground-level notion from which to create the building blocks of ethical theories.


Several analogies have been used to convey the concept of a cluster concept; they invoke images of star clusters, neural networks lighting up, and sets of properties with a majority vote. A particularly well-known analogy, used by Wittgenstein, is the game analogy, in which language games prescribe normative meanings that constrain a word's meaning without determining a clear-cut case. Wittgenstein held that there is no clear set of necessary conditions that determine what a game is. Bernard Suits came up with a refutation of that claim, stating that there is such a definition (modified from "What Is a Game?", Philosophy of Science, Vol. 34, No. 2, Jun. 1967, pp. 148-156):

"To play a game is to engage in activity designed to bring about a specific state of affairs, using only means permitted by specific rules, where the means permitted by the rules are more limited in scope than they would be in the absence of such rules, and where the sole reason for accepting the rules is to make possible such activity."

Can we hope for a similar, soon-to-be-found understanding of Self? Let us invoke:

The Hidden Variable Hypothesis: There is a core essence which separates the class of Selves from non-Selves; it is just not yet within the reach of our current state of knowledge.

While desirable, there are various reasons to be skeptical of the Hidden Variable Hypothesis: (1) Any plausible candidate core would have to be able to disentangle Selves from Organisms in general, Superorganisms (i.e., insect societies), and institutions. (2) We clearly entertain different models of what Selves are for different purposes, as shown below in the section "Varieties of Self-Systems Worth Having". (3) Design considerations: being evolved structures that encompass many resources of a recently evolved mind, and that came into being through a complex dual-inheritance evolution of several hundred thousand replicators of two kinds (genes and memes), Selves are among the most complex structures known and thus unlikely to possess a core essence, independently of how intractable it would be to detect and describe such an essence.

From now on, then, I will assume as common ground that Selves are mongrel concepts, comprising some as-yet-undiscussed number of cluster concepts.


Organisms, Superorganisms, and Selves

To refine our notions of Selves we ought to be able to distinguish Selves from Organisms, that is, biological coalitions of cells with adaptation-execution functions, and from Superorganisms, biological coalitions of individuals whose group-level behavior fits the adaptation-executer characterization.

Organisms, Superorganisms, and Selves are composed of smaller parts that instantiate simple algorithmic behavior which, in large numbers, brings about complex behavior. One fundamental difference, though, is that Selves are grammatical. While ants use variegated hydrocarbons to signal things to other ants of the same Superorganism, and cells communicate through potassium and sodium exchanges, we use phonemes composing words composing sentences; we have thoughts which compose our deliberations. Selves are thus different in that they exhibit grammaticality and semantic abstraction capacities unseen at the organismic and superorganismic levels of organization.


Persons, the Evidence for Other Selves

How could we teach a machine to identify people? This is the underlying question that led me to write this text, and it is a question of the utmost importance if we are to believe the current cutting-edge guesses about when artificial intelligence will surpass human intelligence. We have to make sure that what passes the test is not an ant colony, nor a panda. Luckily, such a test has already been established by Alan Turing: the famous Turing test. While the Turing test was originally conceived to establish when a machine has achieved human intelligence, there is no reason to deny it a secondary purpose once a machine has already achieved human intelligence. Once such a machine exists, it could use its own human-like intelligence to test other entities and classify them as human-like or not human-like. This would give us a nonperson predicate, as demanded by Yudkowsky.

This may appear to be a deus ex machina, in that I am assuming the Turing test performed by this machine will be able to grasp the essence of humanity and capture it. Not so. What we should expect of Selves and people is not an essence, as prescribed by the Hidden Variable Hypothesis. We should expect a mongrel built of clusters of identifiable data, its shape not well delineated at the borders, and we should expect more than a single simple structure. Exactly the kind of thing that is able to pass a Turing test, which itself is not established with absolute precision, but relies on our linguistic, empathic, commonsensical, and conversational skills to be performed.


Selves as Utility-Increasing Unnatural Clusters

Thus far we have considered Selves as non-essence-bearing sets of clusters of linguistic, grammatical entities, but this misses one important aspect of selfhood: intentionality. Language is mostly intentional, that is, about things other than itself, and brains are mostly intentional, that is, integrated into the world in such a way that a convoluted mapping holds between their internal content and the world's external facts.

The particularity that makes Selves different from Superorganisms and Organisms at this level is that Selves are utility-increasing: they have goals, desires, and ideals, and strive to achieve them. Selves act as functions, rearranging the physical world of which they are a part from low-utility local configurations to high-utility local configurations. These goals, desires, and ideals change from time to time without a change of Self. This is a naturally occurring process in many cluster concepts: to be a cluster concept includes being the kind of concept that remains the same despite change, possibly dramatic change, as long as this change is "softened" by happening one bit at a time. A Self's goals may shift strongly in ten years, but at any particular time, the goals, desires, grammaticality, and intentionality are the defining features of that Self, of that person.
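The functional picture above, a Self rearranging its local world from low-utility to high-utility configurations, can be caricatured in code. This is only a toy sketch under invented assumptions: a one-dimensional "world", a single fixed goal, and greedy hill-climbing stand in for the full complexity of a person's goals and ideals.

```python
# Toy sketch: a Self as a utility-increasing function. The "world" is a
# position on a number line, the goal is to be at 10, and the agent acts
# only when acting raises utility. All of this is an invented illustration.

def act(world, moves, utility):
    """Apply whichever available move most increases utility; else do nothing."""
    best = max(moves, key=lambda move: utility(move(world)))
    candidate = best(world)
    return candidate if utility(candidate) > utility(world) else world

utility = lambda w: -abs(w - 10)            # a single hypothetical goal/ideal
moves = [lambda w: w + 1, lambda w: w - 1]  # the agent's limited repertoire

world = 7
for _ in range(5):
    world = act(world, moves, utility)
print(world)  # -> 10: the local configuration has been pushed to high utility
```

The point of the sketch is only the type signature: a Self, at this level of description, behaves like a function from world-states to world-states whose outputs score higher on its own utility measure, and that measure may drift over time without the function ceasing to be "the same" Self.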

What Do We Demand of Selves?

As per our chess example above, we demand stability from Selves. We also demand honor, respectability, resilience, and accountability. When I say you owe me that money, it is implicit that you are the same person as the one to whom I lent it. When I invite you to a duel, I expect to kill the same you who is listening to the invitation, even if a few days later. Part of our models of people evolved from the need for accountability. An evolutionary guess: we incorporate a notion of a person's sameness over time because this holds the person accountable. Reciprocal altruism, a form of altruism found in many complex social species of animals, relies on the assumption that one will pay back, and paying back is only possible if the original giver is still there to receive the payment.

Has our notion of Self followed our demands for accountability, or did it happen the other way around? This is a chicken-and-egg sort of question. Just as eggs obviously came first because dinosaurs laid eggs, accountability came first because many other animals exhibit reciprocal altruism. Yet, just as we can reshape the chicken-and-egg question so that both seem to be determining each other, we can also reshape our accountability question: has our model of selfhood reinforced our tendencies to demand accountability of others, or has our need for accountability created a demand for stronger, more stable Selves? Probably both have happened; they reinforce each other in both directions, or, in psychological jargon, they perform transactional reinforcement.

Besides sheer accountability, our notions of honor and respect also rely on sameness over time; they are just a bit more convoluted and sophisticated, but this topic is tangential to our interests here.


Varieties of Self-Systems Worth Having

Not all animals have a notion of Self (From Varieties of Self Systems Worth Having):

“According to Povinelli and colleagues, one possibility is that a sense of the embodiment of Self—as opposed to mere proprioception—a sense of ownership of one's own body, may have evolved in some primates as a consequence of arboreal locomotion (Barth et al., 2004). Orangutans need subtle appreciation of their own body position, posture, and weight to brachiate and support themselves on flimsy branches. It is not as though they can navigate by trial and error, since a fall will likely prove fatal. The behavior and the required capacity are less developed in chimpanzees and even less in gorillas. This would suggest a complicated history for this kind of Self-representation, having been lost by the primate branch that led to chimpanzees, and developed in the hominine lineage.

“We speak of ‘‘Self-systems worth having’’ to reflect four characteristics of the recent literature on the Self. First, most models imply that the Self is supported by a federation of specialized processes rather than a single integrated cognitive function. Second, most researchers think that the phenomenology of selfhood results from the aggregate of the functions performed by these different information-processing devices. Third, most of the information-processing is construed as sub-personal, hence inaccessible to conscious inspection. Fourth, we talk about systems worth having to emphasize that there is nothing inevitable about the functioning of any of these systems.”

“Neisser made conceptual and empirical distinctions between five domains of Self-knowledge, namely: an ecological Self, a sense of one's own location in and distinctness from the environment; an interpersonal Self, a sense of oneself as a locus of emotion and social interaction; an extended Self, a sense of oneself as an individual existing over time; a private Self, a sense of oneself as the subject of introspectively accessible experience; and a conceptual Self, comprising all those representations that constitute a Self-image, including representations of one's social role and personal autobiography (Neisser, 1988).”

The ecological Self is our notion of our location, both as a whole (hippocampus) and in proprioception, that is, the relative position and movement of our body parts (frontal lobe). The interpersonal Self is salient in our blushing and teasing, laughing and crying. The extended Self is widely discussed in the philosophical literature, most famously by Derek Parfit in Reasons and Persons; it is that which remains when time elapses, the sense of constancy and sameness that one feels. The private Self talks inside our heads all the time; it is the nagging inner voice that remains active when we introspect and look inwards. The conceptual Self is an honorable, respectable individual with all the special abilities we know ourselves to have, from lawful to noble, up to the example above: don't hold me responsible for act X, claims the conceptual Self, I'm not myself today.

Neisser's analysis is a fine-grained one, distinct from a coarse-grained one like Gallagher's:

“Gallagher distinguishes broadly between the ‘‘minimal’’ and the ‘‘narrative’’ Self. The former supplies the ecological sense of bodily ownership and agency associated with active behavior, while the latter supports the Self-image that associates our identity with various episodes (Gallagher, 2000).”

The analysis of selfhood, or of personhood, can be done in other ways too; after all, we are dealing with a strange construction. We are trying to carve reality at its joints, but the joints of mongrel cluster concepts are fuzzy structures, and we are given many choices of how to carve them. Any analysis of Selves is going to look at least as complex as this one, and we should learn to abandon physics envy, stop thinking that Selves come in one sentence, and learn to deal with the full complexities involved.






References:

http://the-mouse-trap.com/2009/11/01/five-kinds-of-selfself-knowledge/ (based on Neisser, 1988)

http://www.scholarpedia.org/article/Self_models (curated by Thomas Metzinger)

Boyer P, Robbins P, Jack AI. Varieties of self-systems worth having. Conscious Cogn. 2005 Dec;14(4):647-60. Epub 2005 Oct 27.


16 comments





I have found this, while somewhat better formatted, just as impenetrable as your previous attempts. The "summary" was not very helpful, either. I wish someone who understands your points could summarize what you mean for the simpletons like me.

Sure, why not?

Understanding what a self (and a volition) is matters; CEV relies on extrapolating the volition of selves, and therefore on understanding selves/volitions. But there's no reason to think that there's a unique reduction of "self"; indeed, there's almost certainly not (Diego gives various examples). Also, there are various other things constraining our intuitive definition, like that selves be utility-maximizing.

One way out for CEV is that the Turing Test is a reliable means of identifying a subset of selves; once we can identify an AGI as a self via the Turing Test, it can then itself use the Turing Test to identify (some) other selves.

I enjoyed this quite a bit. I find myself agreeing that any useful conception of personhood will probably be a complicated, fuzzy thing. I also agree that this fuzziness isn't a reason to not attempt to clarify the matter.

My main hesitation comes from the claim that the primary salient distinction between an "organism" and a "self" is, basically, language. How do you know that ant colonies aren't processing abstract reflective concepts via complex chemical signaling?

Also, it seems like any human under six years old would stand a chance of not being classified as human by your proposed scheme. An important part of the Turing test is that the humans who are being tricked into believing that the AI is a person are not so skeptical that they will classify a significant fraction of actual people as being AIs. In other words, I think the Turing test is a terrible personhood test.

Yes, there are plenty of people who don't pass the Turing test — e.g., those who don't speak the right language. For this reason, the Turing test, with a human or machine judge, is not a good nonperson predicate, contrary to the OP. But it can be taken as a person predicate. That is, if something passes a strict enough Turing test, it's reasonable to regard that thing as a person.

But then this defeats the whole purpose: if we don't want the AI to be a person, then it won't be able to pass the Turing test, and then it is unclear whether it would be able to use the test to tell people from non-people.

I appreciate the degree of care you take in your conceptual analysis, Diego.

You're pinning down a certain concept of "self", and I'm not quite sure what it's going to be used for:

  • If "selves" are to be the things alive today whose volition will be extrapolated, then as long as the set of selves contains me or people like me, I'll be happy.

  • If "selves" is supposed to refer to the set of all things whose volition ought to be extrapolated, then we ought to be careful to exclude Babyeaters and the like.

I would recommend adding a small paragraph summarizing the properties we would like a self to have according to your proposal (even though it is a cluster concept).

Diego, when you say "We also demand honor, respectability, resilience, accountability," by honor, respectability, and resilience you mean them just as desirable moral properties, right? (i.e., not properties one must have to be considered a self)

I'm saying we measure how much self someone has based on various heuristics, among them social demands such as respectability, resilience, and honor. It's more about what is in the eye of the beholder.