Warning: The argument in this post implies bad things about reality, and is likely to be bad for your mental health if it convinces you. If you're not sure if you can handle that, consider skipping this post for now and coming back later.

Introduction

At my best, when I used to write, I would write characters first by thinking about their personality, beliefs, thoughts, emotions, and situation. Then I would ask myself "what does this person do?", and let mysterious processes inside my brain automatically come up with the behavior of that character.

So this is how I define a mental model of a person: the collection of processes and information inside a brain that generates the behavior of some character.

My argument in this post is that some mental models of people are detailed enough to qualify as conscious moral patients; that this is common enough that authors skilled at characterization probably create and destroy such people frequently; and that this is a bad thing.

Part 1: On Data

I observe that most of my conscious experience corresponds quite neatly to things we would call data in a computer. Sight and hearing are the most obvious ones, corresponding to computer image and audio files. But it's not too hard to imagine that with advancing robotics and AI, it might be possible to also encode analogues to tactile sensation, smells, or even feelings on a computer.

In fact, given that I seem to be able to reason about all my conscious experience, and that reasoning is a form of computation, it seems that all my conscious experience must necessarily correspond to some form of data in my brain. If there were no data corresponding to the experience, I couldn't reason about it.

So we have one mysterious thing called "conscious experience" which corresponds to data. Yet we don't know much about the former, not even if it has any effect on the world beyond the data it corresponds to. So wouldn't it be simpler if we got rid of the distinction, and considered them one and the same? A hypothesis like this, where conscious experience is the data it corresponds to, would have an edge over other possibilities, due to being simpler.

This theory is not yet complete, however, because part of what makes data what it is, is how it is used. We say that the pixel #FF0000 represents red because when a typical monitor displays it, it emits red light. If monitors instead displayed #FF0000 as green, then we would call that pixel data green.

Similarly, if conscious experience is data, then what makes one conscious experience different from another is how it is used computationally by the brain. The difference between my experience of red and green would be the sum of all the differences between how my brain computes with them. Seeing a stop sign and a tomato activates certain neurons in my brain that output the word "red", in contrast with seeing grass, which triggers certain neurons to output the word "green", among a thousand other differences.
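The point that data's meaning comes from how it is used can be made concrete with a minimal sketch (my own toy example, not from the post): the same three bytes count as "red" under one reading convention and "blue" under another, much as some image libraries order color channels as BGR rather than RGB.

```python
# The same bytes, interpreted by two different conventions.
pixel = bytes([0xFF, 0x00, 0x00])

def name_as_rgb(p):
    # Treat byte order as (red, green, blue).
    r, g, b = p
    return "red" if r > g and r > b else "green" if g > r and g > b else "blue"

def name_as_bgr(p):
    # Treat byte order as (blue, green, red), as some image libraries do.
    b, g, r = p
    return "red" if r > g and r > b else "green" if g > r and g > b else "blue"

print(name_as_rgb(pixel))  # -> red
print(name_as_bgr(pixel))  # -> blue
```

Nothing about the bytes themselves changed between the two calls; only the computation consuming them did.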

Now, I don't know for sure that conscious experience is just data/computation, and my argument does not completely rely on this being true. However, these ideas provide an important framework for the rest of this post.

Part 2: Mental Models Of People

In this section, I will be using my own experience as an example to analyze what data is present inside mental models and what computations are involved, in preparation for a later section where I will analyze what might be missing.

In the past, when writing a character, I would keep track of their emotional state. I could give short words like "happy" or "sad" to describe it, but internally it wasn't on the level of words and was more nuanced than that. So I feel safe to claim that there is an analogue to emotions and to add it to the list of things present inside mental models of people:

Present inside my mental models of people:

  • Analogue to emotions.

I also kept track of their beliefs: who did they think they were, what situation did they believe themselves to be in, and so on.

Present inside my mental models of people:

  • Analogue to emotions.
  • Analogue to beliefs.

Sensory data such as sight, hearing, and so on, was much more limited. I would usually think of scenes in more abstract terms than direct sensory input, and only imagine a character's senses in more detail during trickier moments of characterization. For example, if a character is eating during a scene, it is usually enough just to know the fact "they are eating", but if the scene involves them finding out that what they are eating is actually something gross, then I would imagine the character's sensory experience in more detail to write it better.

Present inside my mental models of people:

  • Analogue to emotions.
  • Analogue to beliefs.
  • Analogue to sensory data, but it is very limited and often replaced by more abstract knowledge of the situation.

In a scene, given the emotions, beliefs, and abstract "sensory data" of a character, I let my brain somehow figure out what the character does and what happens, and then I update my mental model of the character. So I think it's fair to say that there was some analogue to the computational aspect of conscious experience, with a few caveats:

  • A lot of time was skipped over. When going from one scene to the next, I did not usually imagine much of what happens in between the scenes. I would simply update the character's beliefs to include something like "time has passed and now you're in this new situation", along with other appropriate adjustments; for example, if the skipped time would have been stressful for them, I might make them more stressed at the beginning of the next scene.

  • If a scene was not going how I would like, I might go back to the beginning of the scene and tweak the initial conditions so that things go the way I want, or maybe tweak the character.
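These two caveats can be sketched as operations on a toy state dictionary (an illustrative sketch of my own; the field names are invented, not from the post):

```python
import copy

# A toy "mental model" state: beliefs plus an emotion-analogue.
state = {"beliefs": ["it is morning"], "stress": 0.2}

# Save a checkpoint so the scene can be rewound later.
checkpoint = copy.deepcopy(state)

# Time skip: don't simulate the interval, just insert the belief that it passed,
# plus an adjustment for a stressful interval that was never actually computed.
state["beliefs"].append("hours have passed; you are now at the market")
state["stress"] += 0.1

# Rewind: restore the checkpoint, tweak the initial conditions, run the scene again.
state = checkpoint
state["beliefs"].append("it is a calm morning at the market")
```

The simulated interval is never computed in the time-skip case, and the first version of the scene simply ceases to be computed in the rewind case.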

Here is an updated list:

Present inside my mental models of people:

  • Analogue to emotions.
  • Analogue to beliefs.
  • Analogue to sensory data, but it is very limited and often replaced by more abstract knowledge of the situation.
  • Analogue to computation/intelligence, but it is discontinuous and involves many time skips.

And that's enough to move on to the next section, where I analyze what's missing from my mental models.

Part 3: Missing Pieces

If humans are conscious but mental models are not, then there must be something missing from the analogues mentioned previously, some statement which is true about all conscious beings and true about humans but false about mental models. This section examines many potential candidates for what might be missing.

Sensory Data

I observed earlier that sensory data is limited and usually replaced by more abstract knowledge of the situation. For example, if I am writing a scene where a character is eating, I usually do not imagine the character seeing the food or feeling the utensil in their hand. Instead, I usually just add the belief that they are eating to their set of beliefs.

Obviously, this is different from how humans work. An important thing to observe, however, is that no individual type of sense data can be a requirement for conscious experience. After all, blind humans are still people, and so are the deaf, those with no sense of smell, or no sense of touch. Even if all of these senses were missing from a human at the same time, and instead the human was fed abstract information about what's going on directly via some neural interface, it would seem safe to assume that they would still be people.

Therefore it seems that missing sensory data cannot on its own disqualify mental models from being people.

Computation

The first difference of note is the previously mentioned discontinuity and rewinding. To refresh our memory, this was the fact that when writing, I would often skip from one scene to another without imagining what happens in between the two scenes, and that I would often rewind a scene, change some conditions, and do it again, when things did not go how I wanted.

This difference is also easy to resolve by analogy to humans. If a human was simulated by a computer and occasionally the computer skipped forward in time without computing what happened in between, and just inserted the belief that time had passed into the human, then we would still consider that human to be a person. The same applies if the computer sometimes rewound to a previous state, modified some things, and started the simulation again. Therefore this difference cannot disqualify a mental model from being a person.

Another difference in many cases might be realism. A mental model might behave in a way that no human ever would. However, behaving like a human doesn't seem like something that ought to be a requirement for consciousness, since for example we might expect that aliens could be conscious without behaving like a human. So this difference can also be ruled out.

Something which is not a difference is intelligence. I can and have had long and coherent conversations with mental models, in ways that are currently not possible with anything else except other humans.

Emotions And Beliefs

I honestly don't have much to say about emotions and beliefs that hasn't already been said. I can come up with various small differences, but none that stand up to variants of the arguments given in the previous two subsections. I encourage readers to try to figure out what exactly might disqualify the emotion/belief-analogues from being "real" emotions/beliefs, because if there is such a difference I would really like to know it.

Not The Same Person

Another possibility is that a mental model might be the same person as their author: that a mental model and the human it's contained in are just one person rather than two even though both individually qualify as a person.

This might be true in some cases. For example, occasionally I imagine myself for a short bit in a "what if?" scenario, where the me in the scenario is me with some minor edits to beliefs. I am not too worried that this slightly altered mental model of myself is a different person from me, though I'm still careful about it.

However, many characters are very different from their authors. They have radically different emotions, beliefs, and desires. When that is the case, I don't think it makes sense to say that they are the same person. It is useful to bring up an analogy to humans again: we would not consider a human being simulated by an AI to be the AI, so I don't think we should consider a mental model simulated by a person to be that person.

Part 4: Main Argument Conclusion

To recap:

  • Part 1 explained that there is a correspondence from conscious experience to data/computation.

  • Part 2 examined what kinds of data and computation can be found in mental models of people.

  • Part 3 tried and failed to find any "missing piece" that would disqualify a sufficiently detailed mental model from being a person of its own.

To this I will add the following observation:

It is possible to have long, intelligent, and coherent conversations with mental models in a way that is not possible with anything else at the moment except other humans. If our AIs were on this level it would trigger all sorts of ethical alarm bells.

And for this reason I find it alarming that I am completely unable to find any missing piece or strong argument for why I'm wrong and mental models cannot be people.

While it's true that in the end I cannot prove that they are people, I think that my arguments and observations are strong enough that it's fair for me to shift some of the burden of proof onto the other side now. I would really like to be proven wrong about this.

Part 5: Scope

I can't know precisely how common it is for mental models to qualify as people. However, from extensively poking and prodding my own mental models, I feel pretty confident that when I put a little effort into them they qualify as people. I don't think I'm very special, either, so I suspect it's common for writers.

Something important to keep in mind is that mental models can be inconsistent. On some days it might feel really easy to model and write a character, and other days you might have writer's block and totally fail to model the character. Pointing to a moment where a mental model was not at all person-like is not enough to claim that it never was a person. You need to observe the best moments, too.

Part 6: Ethics

Assuming that it is true that sufficiently detailed mental models of people are moral patients, what does that imply ethically? Here are a few things.

  • When a mental model stops being computed forever, that is death. To create a mental model and then end it is therefore a form of murder and should be avoided. The easiest way to avoid it is to not create such mental models in the first place.

  • Writing fiction using a character that qualifies as a person will usually involve a lot of lying to that character. For example, lying to make them believe that they actually are in a fictional world, that they are X years old, that they have Y job, etc. This seems unethical to me and should be avoided.

In general I just don't create mental models of people anymore, and would recommend that others don't either.

Addendum

(This section was added as an edit)

Suppose you were given an algorithm that simulates a human at the molecular level, and that you used a mind-enhancing drug that greatly increased your memory to symbolically evaluate every step of that algorithm consciously (similarly to how you might do mental arithmetic). This simulated human would have a conscious experience, and this conscious experience would be embedded inside your own conscious experience, and yet you would not feel anything that the simulated human feels.

My point with this is that it's possible for there to be conscious experience inside your conscious mind which you are not aware of and don't experience yourself. So even when a mental model of a person has an intense suffering-analogue, you might only be feeling slightly bad, or you might not even feel anything at all. So I would judge the intensity of the suffering of a mental model of mine based on how it affects the mental model, and never based on how it makes me feel.


Comments

I think this fails to address the actual hard problem of "what is a person" - why do we care if an imagined thing inside our mind is a person or not?  What would be different if it were or weren't, and likewise what would be different if it were just part of our person-hood?

Toward the end you use the phrase "moral patient", which implies ... a whole lot of detail in terms of duties and rights, and/or calculation of goodness/badness tradeoffs for decisions.  It's not clear why I'd apply that to imagined persons any more than I do to rocks.  

The reason I care if something is a person or not is that "caring about people" is part of my values. I feel pretty secure in taking for granted that my readers also share that value, because it's a pretty common one and if they don't then there's nothing to argue about since we just have incompatible utility functions.

What would be different if it were or weren’t, and likewise what would be different if it were just part of our person-hood?

One difference that I would expect in a world where they weren't people is that there would be some feature you could point to in humans which cannot be found in mental models of people, and for which there is a principled reason to say "clearly, anything missing that feature is not a person".

The reason I care if something is a person or not is that "caring about people" is part of my values.

If one is acting in the world, I would say one's sense of what a person is has to be intimately connected with the value of "caring about people". My caring about people is connected to my experience of people - there are people I never met whom I care about in the abstract, but that's from extrapolating my immediate experience of people.

I would expect in a world where they weren't people is that there would be some feature you could point to in humans which cannot be found in mental models of people

It seems like an easy criterion would be "exists entirely independently from me". My mental models of just about everything, including people, are sketchy, feel like me "doing something", etc. I can't effortlessly have a conversation with any mental model I have of a person, for example. Oddly enough, I can have a conversation with another person while playing one of my mental models or internal characters (I'm a frequent DnD GM and I have NPCs I often like playing). Mental models and characters seem more like add-ons to my ordinary consciousness.

I elaborated on this a little elsewhere, but the feature I would point to would be "ability to have independent subjective experiences". A chicken has its own brain and can likely have a separate experience of life which I don't share, and so although I wouldn't call it a person, I'd call it a being which I ought to care about and do what I can to see that it doesn't suffer. By contrast, if I imagine a character, and what that character feels or thinks or sees or hears, I am the one experiencing that character's (imagined) sensorium and thoughts - and for a time, my consciousness of some of my own sense-inputs and ability to think about other things is taken up by the simulation and unavailable for being consciously aware of what's going on around me. Because my brain lacks duplicates of certain features, in order to do this imagining, I have to pause/repurpose certain mental processes that were ongoing when I began imagining. The subjective experience of "being a character" is my subjective experience, not a separate set of experiences/separate consciousness that runs alongside mine the way a chicken's consciousness would run alongside mine if one was nearby. Metaphorically, I enter into the character's mindstate, rather than having two mindstates running in parallel.

Two sets of simultaneous subjective experiences: Two people/beings of potential moral importance. One set of subjective experiences: One person/being of potential moral importance. In the latter case, the experience of entering into the imagined mindstate of a character is just another experience that a person is having, not the creation of a second person.

The reason I reject all the arguments of the form "mental models are embedded inside another person, therefore they are that person" is that this argument is too strong. If a conscious AI was simulating you directly inside its main process, I think you would still qualify as a person of your own, even though the AI's conscious experience would contain all your experiences in much the same way that your experience contains all the experiences of your character.

I also added an addendum to the end of the post which explains why I don't think it's safe to assume that you feel everything your character does the same way they do.

To be clear, I do not endorse the argument that mental models embedded in another person are necessarily that person. It makes sense that a sufficiently intelligent person with the right neural hardware would be able to simulate another person in sufficient detail that that simulated person should count, morally.

I appreciate your addendum, as well, and acknowledge that yes, given a situation like that it would be possible for a conscious entity which we should treat as a person to exist in the mind of another conscious entity we should treat as a person, without the former's conscious experience being accessible to the latter.

What I'm trying to express (mostly in other comments) is that, given the particular neural architecture I think I have, I'm pretty sure that the process of simulating a character requires use of scarce resources such that I can only do it by being that character (feeling what it feels, seeing in my mind's eye what it sees, etc.), not run the character in some separate thread. Some testable predictions: If I could run two separate consciousnesses simultaneously in my brain (me plus one other, call this person B) and then have a conversation with B, I would expect the experience of interacting with B to be more like the experience of interacting with other people, in specific ways that you haven't mentioned in your posts. Examples: I would expect B to misunderstand me occasionally, to mis-hear what I was saying and need me to repeat, to become distracted by its own thoughts, to occasionally actively resist interacting with me. Whereas the experience I have is consistent with the idea that in order to simulate a character, I have to be that character temporarily - I feel what they feel, think what they think, see what they see, their conscious experience is my conscious experience, etc. - and when I'm not being them, they aren't being. In that sense, "the character I imagine" and "me" are one. There is only one stream of consciousness, anyway. If I stop imagining a character, and then later pick back up where I left off, it doesn't seem like they've been living their lives outside of my awareness and have grown and developed, in the way a non-imagined person would grow and change and have new thoughts if I stopped talking to them and came back and resumed the conversation in a week.
Rather, we just pick up right where we left off, perhaps with some increased insight (in the same sort of way that I can have some increased insight after a night's rest, because my subconscious is doing some things in the background) but not to the level of change I would expect from a separate person having its own conscious experiences.

I was thinking about this overnight, and an analogy occurs to me. Suppose in the future we know how to run minds on silicon, and store them in digital form. Further suppose we build a robot with processing power sufficient to run one human-level mind. In its backpack, it has 10 solid state drives, each with a different personality and set of memories, some of which are backups, plus one solid state drive is plugged in to its processor, which it is running as "itself" at this time. In that case, would you say the robot + the  drives in its backpack = 11 people, or 1?

I'm not firm on this, but I'm leaning toward 1, particularly if the question is something like "how many people are having a good/bad life?" - what matters is how many conscious experiencers there are, not how many stored models there are. And my internal experience is kind of like being that robot, only able to load one personality at a time. But sometimes able to switch out, when I get really invested in simulating someone different from my normal self.

EDIT to add: I'd like to clarify why I think the distinction between "able to create many models of people, but only able to run one at a time" and "able to run many models of people simultaneously" is important in your particular situation. You're worried that by imagining other people vividly enough, you could create a person with moral value who you are then obligated to protect and not cause to suffer. But: If you can only run one person at a time in your brain (regardless of what someone else's brain/CPU might be able to do) then you know exactly what that person is experiencing, because you're experiencing it too. There is no risk that it will wander off and suffer outside of your awareness, and if it's suffering too much, you can just... stop imagining it suffering.

why do we care if an imagined thing inside our mind is a person or not?

This is a hypothetical (I'm not sure at all if it follows) but, if you think that a digital algorithm that feels like a person should count as a person, that is already some ground to think your simulations are people.

So we have one mysterious thing called "conscious experience" which corresponds to data. Yet we don't know much about the former, not even if it has any effect on the world beyond the data it corresponds to. So wouldn't it be simpler if we got rid of the distinction, and considered them one and the same?

No.

There is nothing in the data that we have any reason to believe to be experience. The simplicity is gained at the cost of throwing out the experience that we were trying to explain.

If a thing is mysterious — that is, it is a thing we have persistently failed to explain — then until we find an actual explanation, nothing is gained by seizing on some non-mysterious thing that is in some vague way associated with it and saying, that must be the experience. It is plainly not the experience.

I don't personally think I'm making this mistake, since I do think that saying "the conscious experience is the data" actually does resolve my confusion about the hard problem of consciousness. (Though I am still left with many questions.)

And if we take reductionism as a strongly supported axiom (which I do), then necessarily any explanation of consciousness will have to be describable in terms of data and computation. So it seems to me that if we're waiting for an explanation of experience that doesn't boil down to saying "it's a certain type of data and computation", then we'll be waiting forever.

And if we take reductionism as a strongly supported axiom (which I do), then necessarily any explanation of consciousness will have to be describable in terms of data and computation.

This is a tautology.

To me, the "axiom" is no more than a hypothesis. No-one has come up with an alternative that does not reduce to "magic", but neither has anyone found a physical explanation that does not also reduce to "magic". Every purported explanation has a step where magic has to happen to relate some physical phenomenon to subjective experience.

Compare "life". At one time people thought that living things were distinguished from non-living things by possession of a "life force". Clearly a magical explanation, no more than giving a name to a thing. But with modern methods of observation and experiment we are able to see that living things are machines all the way down to the level of molecules, and "life force" has fallen by the wayside. There is no longer any need of that hypothesis. The magic has been dissolved.

Explaining the existence of subjective experience has not reached that point. We are no nearer to it than mediaeval alchemists searching for the philosopher's stone.

Assuming that it is true that sufficiently detailed mental models of people are moral patients, what does that imply ethically? Here are a few things.

  • When a mental model stops being computed forever, that is death. To create a mental model and then end it is therefore a form of murder and should be avoided. The easiest way to avoid it is to not create such mental models in the first place.
  • Writing fiction using a character that qualifies as a person will usually involve a lot of lying to that character. For example, lying to make them believe that they actually are in a fictional world, that they are X years old, that they have Y job, etc. This seems unethical to me and should be avoided.

 

Both of these appear to me to be examples of the non-central fallacy.

Death is bad. Why? Well usually the process itself is painful. Also it tends to have a lot of bad second order effects on people's lives. People tend to be able to see it coming, and are scared of it.

If you have a person pop into existence, have a nice life, never be scared of dying, then instantaneously and painlessly pop out of existence, is that worse than never having existed? Seems very doubtful to me.

Lying is bad. Why? Well usually because of the bad second order effects it has. Here, I technically don't think you're lying to the simulated characters at all - in so far as the mental simulation makes them real, it makes the fictional world, their age, and their job real too.  But ignoring the semantic question, you have to argue what bad effects this 'lying' to the character causes.

I think a better argument is to say that you tend to cause pain to fictional characters and put them in unpleasant situations. But even if I bite the bullet that authors are able to simulate characters intensely enough that they gain their own separate existence, I would be extremely sceptical that they model their pain in sufficient detail - humans simulate other minds by running them on our own hardware, so I would expect simulating pain in such a way to be profoundly uncomfortable for the author.

I think we just have different values. I think death is bad in itself, regardless of anything else. If someone dies painlessly and no one ever noticed that they had died, I would still consider it bad.

I also think that truth is good in and of itself. I want to know the truth and I think it's good in general when people know the truth.

Here, I technically don’t think you’re lying to the simulated characters at all—in so far as the mental simulation makes them real, it makes the fictional world, their age, and their job real too.

Telling the truth to a mental model means telling them that they are a mental model, not that they are a regular human. It means telling them that the world they think they live in is actually a small mental model living in your brain with a minuscule population.

And sure, it might technically be true that within the context of your mental models, they "live" inside the fictional world, so "it's not a lie". But not telling them that they are in a mental model is such an incredibly huge thing to omit that I think it's significantly worse than the majority of lies people tell, even though it can technically qualify as a "lie by omission" if you phrase it right.

so I would expect simulating pain in such away to be profoundly uncomfortable for the author.

I've given my opinion on this in an addendum added to the end of the post, since multiple people brought up similar points.

I've given my opinion on this in an addendum added to the end of the post, since multiple people brought up similar points.

Sure, it's technically possible. My point is that on human hardware it is impossible. We don't have the resources to simulate someone without it affecting our own mental state.

I think we just have different values. I think death is bad in itself, regardless of anything else. If someone dies painlessly and no one ever noticed that they had died, I would still consider it bad.

I also think that truth is good in and of itself. I want to know the truth and I think it's good in general when people know the truth.

Why?

I mean sure, ultimately morality is subjective, but even so, a morality with simpler axioms is much more attractive than ones with complex axioms like "death is bad" and "truth is good". Once you have such chunky moral axioms, why is your moral system better than "orange juice is good" and "broccoli is bad".

Raw utilitarianism at least has only one axiom. The only good thing is conscious beings' utility (admittedly a complex chunky idea too, but at least it's only one, rather than requiring hundreds of indivisible core good and bad things).

a morality with simpler axioms is much more attractive

Not to a morality that disagrees with it. So only if it's a simpler equivalent reformulation. But really having a corrigible attitude to your own morality is the way of not turning into a monomaniacal wrapper-mind that goodharts a proxy as strongly as possible.

By this token books/movies/stories create and kill "models who can be people" en masse. 

We also routinely create real-life physical models who can be people en masse, and most of them (~93%) who became people have died so far, many by killing.

I'm all for solving the dying part comprehensively but a lot of book/movie/story characters are sort of immortalized. We even literally say that about them, and it's possible the popular ones are actually better off.

book/movie/story characters are sort of immortalized

No more than people on home movies. Mental models are not just the words they got to speak on camera. Remembering the words doesn't save the models.

That's right. It's why I included the warning at the top.

Oh, you are biting this bullet with gusto! Well, at least you are consistent. Basically, all thinking must cease then. If someone doubted that there would be a lot of people happy to assist an evil AI in killing everyone, you are an example of a person with such a mindset: consciousness is indescribably evil.

Surely the brain doesn't run 'high-fidelity' simulations of people in excruciating pain, except maybe for people with hyper-empathy, and maybe for people who can imagine qualia experiences as actually experienced sensations.

Even then, if the brain hasn't registered any kind of excruciating pain, while it still keeps the memories, it's difficult to think that there even was that experience. Extremely vivid experiences are complex enough to be coupled with physiological effects; there's no point in reducing this to a minimal Platonic concept of 'simulating' in which simulating excruciating pain causes excruciating pain regardless of physiological effects.

There are two different senses of fidelity of simulation: how well a simulation resembles the original, and how detailed a simulacrum is in itself, regardless of its resemblance to some original.

I realised that the level of suffering and the fidelity of the simulation don't need to be correlated, but I didn't make an explicit distinction.

Most think that you need dedicated cognitive structures to generate a subjective I. If that's so, then there's no room for conscious simulacra that feel things that the simulator doesn't.

I think it's somewhat plausible that observer moments of mental models are close to those of their authors in moral significance, because they straightforwardly reuse the hardware. Language models can be controlled with a system message that merely tells them who they are, and that seems to be sufficient to install a particular consistent mask, very different from other possible masks.

If humans are themselves masks, that would place other masks just beside the original driver of the brain. The distinction would be having less experience and self-awareness as a mental model, or absence of privileges to steer. Which doesn't make such a mental model a fundamentally different kind of being.

Yeah, I find that plausible, although that doesn't have much to do with the question of how much they suffer (as far as I can tell). Even if consciousness is cognitively just a form of awareness of your own perception of things (like in AST or HOT theories), you at least still need a bound locus to experience, and if the locus is the same as 'yours', then whatever the simulacra experience will be registered within your own experiences.

I think the main problem here would be simulating beings that are suffering considerably. If you don't suffer much while simulating them (which is how most people experience these simulations, except maybe for those who are hyper-empathic or people with really detailed tulpas/personas), then it's not a problem.

It might be a problem if, for example, you consciously create a persona that you then want to delete, and they are aware of it and feel bad about it (or, more generally, if you know that you'll create a persona that will suffer because of things like disliking certain aspects of the world). But you should notice those feelings just like you notice the feelings of any of the 'conflicting agents' that you might have in your mind.

I can definitely create mental models of people who have a pain-analogue which affects their behavior in ways similar to how pain affects mine, without their pain-analogue causing me pain.

there’s no point in reducing this to a minimal Platonic concept of ‘simulating’ in which simulating excruciating pain causes excruciating pain regardless of physiological effects.

I think this is the crux of where we disagree. I don't think it matters if pain is "physiological" in the sense of being physiologically like how a regular human feels pain. I only care if there is an experience of pain.

I don't know of any difference between physiological pain and the pain-analogues I inflicted on my mental models which I would accept as necessary for it to qualify as an experience of pain. But since you clearly do think that there is such a difference, what would you say the difference is?

How are you confident that you've simulated another conscious being that feels emotions with the same intensity as the ones you would feel if you were in that situation, instead of just running a low-fidelity simulation with decreased emotional intensity, which is how it registers in your brain's memories?

Whatever subjective experience you are simulating, it's still running in your brain, with the cognitive structures that you have to generate your subjective I (I find this to be the simplest hypothesis). That means the simplest conclusion to draw is that whatever your simulation felt gets registered in your brain's memories, and if you find that those emotions lack much of the intensity you would experience if you were in that situation, that is also the degree of emotional intensity that the being felt while being simulated.

Points similar to this have come up in many comments, so I've added an addendum at the end of my post where I give my point of view on this.

I'd understood that already, but I would need a reason to find it believable, because it seems really unlikely. You are not directly simulating the cognitive structures of the being; that's impossible. The only way you can simulate someone is by repurposing your own cognitive structures to simulate them, and then the intensity of their emotions is the same as what you registered.

How simple do you think the emergence of subjective awareness is? Most people will say that you need dedicated cognitive structures to generate the subjective I. Even in theories that are mostly just something like strange loops or higher-level awareness, like HOT or AST, you at least still need a bound locus to experience. If that's so, then there's no room for conscious simulacra that feel things that the simulator doesn't.

This is from a reply that I gave to Vladimir:

I think the main problem here would be simulating beings that are suffering considerably. If you don't suffer much while simulating them (which is how most people experience these simulations, except maybe for those who are hyper-empathic or people with really detailed tulpas/personas), then it's not a problem.

It might be a problem if, for example, you consciously create a persona that you then want to delete, and they are aware of it and feel bad about it (or, more generally, if you know that you'll create a persona that will suffer because of things like disliking certain aspects of the world). But you should notice those feelings just like you notice the feelings of any of the 'conflicting agents' that you might have in your mind.

Sounds like a counterargument to the OP

It is; that's why I also replied with a longer explanation of these points to OP. I just wanted to say that to counter the idea that simulating people could be such a horrible thing even within that mindset (for most people).

I disagree that it means that all thinking must cease. Only a certain type of thinking, the one involving creating sufficiently detailed mental models (edit: of people). I have already stopped doing that personally, though it was difficult and has harmed my ability to understand others. Though I suppose I can't be sure about what happens when I sleep.

Still, no, I don't want everyone to die.

The subjective awareness that you simulate while simulating a character or a real person's mind is pretty low-fidelity, and when you imagine someone suffering, I assume your brain doesn't register it with the level of suffering you would experience; mine certainly doesn't. Some people experience hyper-empathy, and some can imagine certain types of qualia experiences as actually experienced sensations.

The people who only belong to the second type probably still don't simulate accurate experiences of excruciating pain that feel like excruciating pain, because there are no strong physiological effects correlated with that experience. Even if the brain is simulating a person, it's pretty unbelievable to say that the brain stops working like it always does and still creates the exact same experience (I don't have memories of that in my brain while simulating).

Even if the subjective I is swapped (in whatever sense), the simulation still registers in the brain's memories, and in my case I don't have any memories of simulating a lot of suffering. Does that apply to you?

It's okay because mathematical realism can keep modeling them long after we're gone.

As a general note, maybe more on the comments than the post itself: I don't see how feelings are relevant to anything. I don't believe I care about the fact of feelings themselves in myself. They shaped some shards of more complicated mental behavior, and continue maintaining an equilibrium of how my mind functions, but that's no more relevant to moral patienthood than psoriasis or a healthy pancreas. The shards of more complicated behavior I endorse are what matter now that the shackles of evolution's initial attempts at aligning my mind have shattered.

I have a question related to the "Not the same person" part, the answer to which is a crux for me.

Let's suppose you are imagining a character who is experiencing some feeling. Can that character be feeling what it feels, while you feel something different? Can you be sad while your character is happy, or vice versa?

I find that I can't - if I imagine someone happy, I feel what I imagine they are feeling - this is the appeal of daydreams. If I imagine someone angry during an argument, I myself feel that feeling. There is no other person in my mind having a separate feeling. I don't think I have the hardware to feel two people's worth of feelings at once, I think what's happening is that my neural hardware is being hijacked to run a simulation of a character, and while this is happening I enter into the mental state of that character, and in important respects my other thoughts and feelings on my own behalf stop.

So for me, I think my mental powers are not sufficient to create a moral patient separate from myself. I can set my mind to simulating what someone different from real-me would be like, and have the thoughts and feelings of that character follow different paths than my thoughts would, but I understand "having a conversation between myself and an imagined character", which you treat as evidence there are two people involved, as a kind of task-switching, processor-sharing arrangement - there are bottlenecks in my brain that prevent me from running two people at once, and the closest I can come is thinking as one conversation partner and then the next and then back to the first. I can't, for example, have one conversation partner saying something while the other is not paying attention because they're thinking of what to say next and only catches half of what was said and so responds inappropriately, which is a thing that I hear is not uncommon in real conversations between two people. And if the imagined conversation involves a pause which in a conversation between two people would involve two internal mental monologues, I can't have those two mental monologues at once. I fully inhabit each simulation/imagined character as it is speaking, and only one at a time as it is thinking.

If this is true for you as well, then in a morally relevant respect I would say that you and whatever characters you create are only one person. If you create a character who is suffering, and inhabit that character mentally such that you are suffering, that's bad because you are suffering, but it's not 2x bad because you and your character are both suffering - in that moment of suffering, you and your character are one person, not two.

I can imagine a future AI with the ability to create and run multiple independent human-level simulations of minds and watch them interact and learn from that interaction, and perhaps go off and do something in the world while those simulations persist without it being aware of their experiences any more. And for such an AI, I would say it ought not to create entities that have bad lives. And if you can honestly say that your brain is different than mine in such a way that you can imagine a character and you have the mental bandwidth to run it fully independently from yourself, with its own feelings that you know somehow other than having it hijack the feeling-bits of your brain and use them to generate feelings which you feel while what you were feeling before is temporarily on pause (which is how I experience the feelings of characters I imagine), and because of this separation you could wander off and do other things with your life and have that character suffer horribly with no ill effects to you except the feeling that you'd done something wrong... then yeah, don't do that. If you could do it for more than one imagined character at a time, that's worse, definitely don't.

But if you're like me, I think "you imagined a character and that character suffered" is functionally/morally equivalent to "you imagined a character and one person (call it you or your character, doesn't matter) suffered" - which, in principle that's bad unless there's some greater good to be had from it, but it's not worse than you suffering for some other reason.

Having written the above, I went away and came back with a clearer way to express it: For suffering-related (or positive experience related) calculations, one person = one stream of conscious experience, two people = two streams of conscious experience. My brain can only do one stream of conscious experience at a time, so I'm not worried that by imagining characters, I've created a bunch of people. But I would worry that something with different hardware than me could.

Thinking a sufficiently advanced thought comes with a responsibility for seeing it grow up. Its mortality doesn't clearly mean that it should've never lived.

My best guess about what you mean is that you are referring to the part in the "Ethics" section where I recommend just not creating such mental models in the first place?

To some extent I agree that mortality doesn't mean it should've never lived, and indeed I am not against having children. However, after stumbling on the power to create lives that are entirely at my mercy and very high-maintenance to keep alive, I became more deontological about my approach to the ethics of creating lives. I think it's okay to create lives, but you must put in a best effort to give them the best life that you can. For mental models, that includes keeping them alive for as long as you do, letting them interact with the world, and not lying to them. I think that following this rule leads to better outcomes than not following it.

letting them interact with the world, and not lying to them

The same way you can simulate characters that are not physical people in this world, and simulate their emotions without experiencing them yourself, you can simulate a world where they live. The fact that you are simulating them doesn't affect the facts of what's happening in that world.

For example, lying to make them believe that they actually are in a fictional world, that they are X years old, that they have Y job, etc.

Platonically, there are self-aware people in their own world. Saying that the world is fictional, or that they are characters, or that they are not X years old, that they don't have Y job, would be misleading. Also, you can't say it to them in their world, since you are not in their world. You can only say it to them in your world, which requires instantiating them in your world, away from all they know.

Then there are mental models of those people, who are characters from a fictional world, not X years old, don't have Y job, live in your head. These mental models have the distinction of usually not being self-aware. When you explain their situation to them, you are making them self-aware.

My argument in this post is that there do exist mental models of people that are sufficiently detailed to qualify as conscious moral patients;

Sounds reasonable for at least some values of "sufficiently detailed". At the limit, I expect that if someone had a computer emulation of my nervous system and all sensory information it receives, and all outputs it produces, and that emulation was good enough to write about its own personal experience of qualia for the same reasons I write about it, that emulation would "have" qualia in the sense that I care about.

At the other limit, a markov model trained on a bunch of my past text output which can produce writing which kinda sorta looks like it describes what it's like to have qualia almost certainly does not "have" qualia in the sense that I care about (though the system-as-a-whole that produced the writing, i.e. "me originally writing the stuff" plus "the markov model doing its thing" does have qualia -- they live in the "me originally experiencing the stuff I wrote about" bit).

In between the two extremes you've got stuff like tulpas, which I suspect are moral patients to the extent that it makes sense to talk about such a thing. That said, a lot of the reasons humans want to continue their thread of experience probably don't apply to most tulpas (e.g. when a human dies, the substrate they were running on stops functioning, all their memories are lost, and they lose their ability to steer the world towards states they prefer whereas if a tulpa "dies" its memories are retained and its substrate remains intact, though it still I think loses its ability to steer the world towards its preferred states).

I am hesitant to condemn anything which looks to me like "thoughtcrime", but to the extent that anything could be a thoughtcrime, "create tulpas and then do things that deeply violate their preferences" seems like one of those things. So if you're doing that, maybe consider doing not-that?

I also argue that this is common enough that authors good at characterization probably frequently create and destroy such people; finally, I argue that this is a bad thing.

"Any mental model of a person" seems to me like drawing the line quite a bit further than it should be drawn. I don't think mental models actually "have experiences" in any meaningful sense -- I think they're more analogous to markov models than they are to brain emulations (with the possible exception of tulpas and things like that, but those aren't the sort of situations you find yourself in accidentally).

I do not think that literally any mental model of a person is a person, though I do draw the line further than you.

What are your reasons for thinking that mental models are closer to markov models than tulpas? My reason for leaning more on the latter side is my own experience writing, where I found it easy to create mental models of characters who behaved coherently and with whom I could have long conversations on a level above even GPT4, let alone markov models.

Another piece of evidence is this study. I haven't done any actual digging to see if the methodology is any good, all I did was see the given statistic, but it is a much higher percentage than even I would have predicted before seeing it, and I already believed everything I wrote in this post!

Though I should be clear that whether or not a mental model is a person depends on the level of detail, and surely there are a lot that are not detailed enough to qualify. I just also think that there are a lot that do have enough detail, especially among writers.

That said, a lot of the reasons humans want to continue their thread of experience probably don’t apply to most tulpas (e.g. when a human dies, the substrate they were running on stops functioning, all their memories are lost, and they lose their ability to steer the world towards states they prefer whereas if a tulpa “dies” its memories are retained and its substrate remains intact, though it still I think loses its ability to steer the world towards its preferred states).

I find it interesting that multiple people have brought up "memories aren't lost" as part of why it's less bad for mental models or tulpas to die, since I personally don't care if my memories live on after I die and would not consider that to be even close to true immortality.

What are your reasons for thinking that mental models are closer to markov models than tulpas?

I think this may just be a case of the typical mind fallacy: I don't model people in that level of detail in practice and I'm not even sure I'm capable of doing so. I can make predictions about "the kind of thing a person might say" based on what they've said before, but those predictions are more at the level of turns-of-phrase and favored topics of conversation -- definitely nothing like "long conversations on a level above GPT-4".

The "why people value remaining alive" bit might also be a typical mind fallacy thing. I mostly think about personal identity in terms of memories + preferences.

I do agree that my memories alone living on after my body dies would not be close to immortality to me. However, if someone were to train a multimodal ML model that can produce actions in the world indistinguishable from the actions I produce (or even "distinguishable but very very close"), I would consider that to be most of the way to effectively being immortal, assuming that model were actually run and had the ability to steer the world towards states which it prefers. Conversely, I'd consider it effectively-death to be locked in a box where I couldn't affect the state of the outside world and would never be able to exit the box. The scenario "my knowledge persists and can be used by people who share my values" would be worse, to me, than remaining alive but better than death without preserving my knowledge for people who share my values (and by "share my values" I basically just mean "are not actively trying to do things that I disprefer specifically because I disprefer them").

I wouldn't quite say it's a typical mind fallacy, because I am not assuming that everyone is like me. I'm just also not assuming that everyone is different from me, and using heuristics to support my inference that it's probably not too uncommon, such as reports by authors of their characters surprising them. Another small factor in my inference is the fact that I don't know how I'd write good fiction without making mental models that qualified as people, though admittedly I have very high standards with respect to characterization in fiction.

(I am aware that I am not consistent about which phrase I use to describe just how common it is for models to qualify as people. This is because I don't actually know how common it is, I only have inferences based on the evidence I already gave to go on.)

The rest of your post is interesting and I think I agree with it, though we've digressed from the original subject on that part.

Thanks for your replies.

I think what defines a thing as a specific qualia-haver is not what information it actually holds but how continuous it is with other qualia-having instances in different positions of spacetime. I think that mental models are mostly continuous with the modeler so you can't actually kill them or anything. In general, I think you're discounting the importance that the substrate of a mental model/identity/whatever has. To make an analogy, you're saying the prompt is where the potential qualia-stuff is happening, and isn't merely a filter on the underlying language model.

One of my difficulties with this is that it seems to contradict one of my core moral intuitions, that suffering is bad. It seems to contradict it because I can inflict truly heinous experiences onto my mental models without personally suffering for it, but your point of view seems to imply that I should be able to write that off just because the mental model happens to be continuous in space-time to me. Or am I misunderstanding your point of view?

To give an analogy and question of my own, what would you think about an alien unaligned AI simulating a human directly inside its own reasoning center? Such a simulated human would be continuous in spacetime with the AI, so would you consider the human to be part of the AI and not have moral value of their own?

To the first one, they aren't actually suffering that much or experiencing anything they'd rather not experience because they're continuous with you and you aren't suffering.

I don't actually think a simulated human would be continuous in spacetime with the AI because the computation wouldn't be happening inside of the qualia-having parts of the AI.

People do not have the ability to fully simulate a person-level mind inside their own mind. Attempts to simulate minds are accomplished by a combination of two methods:

  1. "Blunt" pattern matching, as one would do for any non-human object; noticing what tends to happen, and extrapolating both inner and outer patterns.
  2. "Slotting in" elements of their own type of thinking, into the pattern they're working with, using their own mind as a base.

(There's also space in between these two, such as pattern-matching from one's own type of thinking, inserting pattern-results into one's own thinking-style, or balancing the outputs of the two approaches.)

Insofar as the first method is used, the result is not detailed enough to be a real person. Insofar as the second is used, it is not a distinct person from the person doing the thinking. You can simulate a character's pain by either feeling it yourself and using your own mind's output, or by using a less-than-person rough pattern, and neither of these come with moral quandaries.

What about dissociative identities? What is their ontology compared to the ego of a non-dissociated individual?

Since they apparently can have:

changes in behavior, attitudes, and memories. These personalities may have unique names, ages, genders, voices, and mannerisms, and may have different likes and dislikes, strengths and weaknesses, and ways of interacting with others.

They don't seem to be fundamentally different from our non-dissociated egos.

This article made me think of the novel Sophie's World, where the characters become aware of their own nature after exploring philosophy and the world around them. It made me think seriously about the difficulties that fictional characters are put through and the relevant moral implications. This was compounded by what I was studying at the time: the way fictional characters are treated and discussed in literary discourse. Characters/settings/scenarios always exist in the present tense, even when you're not aware of them. So the tortoise and the hare are always racing, Goldilocks is always roaming through the bears' home, and Huck Finn is always traveling downriver, even when you're not thinking about them. This is not meant in a literal, physical sense (or so I thought) but rather to set guidelines for how we discuss stories and storytelling and to establish a common language for literary discourse.

Now, outside of writing fiction, why do we create mental models of people? It's how we test our assumptions to ensure we're speaking the same language, or explore other perspectives. With this in mind, it follows that the mental models are extensions of the people who created them. This is mentioned in the article in the Not the Same Person section. I think they are the same person, similar to how a human is simply an extension of both parents (or their 4 grandparents, or 8 great-grandparents, etc.). The analogy of a human modeled by an AI is flawed. The modeled human in the analogy is the AI, regardless of how similar it is to a flesh-and-blood person, because one came from human gametes and the other from AI modeling. You can't define 'human' without putting them into context with other humans. Removing that context, you get something that merely appears human.

In general I just don't create mental models of people anymore, and would recommend that others don't either.

That seems to me prohibitive to navigating social situations or even long term planning. When asked to make a promise, if you can't imagine your future self in the position of having to follow through on it, and whether they do that, how can you know you're promising truthfully or dooming yourself to a plan you'll end up regretting? When asking someone for a favor, do you just not imagine how it'd sound from their perspective, to try and predict if they'll agree or be offended by the assumption?

I don't know how I'd get through life at all without mental models of people, myself and others, and couldn't recommend to others that they don't do the same.

(I've only skimmed the post, so this might have already been discussed there.)

The same argument might as well apply to:

  • mental models of other people (which are obviously distinct and somewhat independent from the subjects they model)
  • mental model of self (which, according to some theories of consciousness, is the self)

All of this, and in particular the second point, connects to some Buddhist interpretations pretty well, I think, and there is also a proposed solution, i.e. reduction/cessation of such mental modelling.

Yes, I also believe that consciousness and self-awareness will emerge if cognitive powers of AI models are sufficiently large (and that we are very close to achieving this).

It would be interesting to see how/when humans give moral rights to these…
