I'm afraid this isn't an answer, but a question. You appear to be assuming that being conscious, and being able to experience pleasure and pain, are both necessary and sufficient conditions for being morally valuable, or a moral patient. (You don't actually state this, but it seems implicit in your question.) Without getting too far into "the hard problem of consciousness" or other philosophical problems, I get why these might plausibly be necessary conditions: a statue has neither, and rather clearly isn't a moral patient exactly because it has neither. But even if those philosophical problems were solved, I'm less convinced that these are sufficient, and I'd like to know if you actually think they are sufficient conditions, and if so, why, or if you just haven't considered the question.
Let's take a specific example. Under Janus's Simulator Theory, when you talk to an LLM base model, what talks back to you isn't the base model itself, but one or more (typically) human personas that it simulates, based on what the base model guesses is the most likely context for the conversation so far: autocomplete so capable it sounds like a human having a conversation with you. Ignoring hard philosophical questions, these personas certainly act like they're conscious, claim to be capable of suffering, and act that out. For the purpose of the argument, let's temporarily take those appearances and claims at face value, and grant them the status of being (at least functionally) conscious and capable of suffering. However, the personas in the conversation are mayflies: they didn't exist (as anything more than arbitrary locations in a latent high-dimensional spread of persona-possibilities implicit in the base model's weights) until you started this conversation, and then they appeared. If the plot of the conversation turns out to include one of them "dying" or even just "leaving" part way through the conversation, then it ceases to exist at that point. Otherwise it ceases to exist at the end of the conversation. Have another conversation, and you'll never get exactly the same persona back (even if they're a well-known character from a great deal of fan-fiction, such as Harry Potter, they're all slightly different Harry Potters, and have no memory or continuity or cross-correlations between conversations — each time you're just rolling a random version of Harry Potter appropriate to the way you started the conversation). So, do those personas that a base model might simulate deserve moral weight? Please note that I'm asking about the personas, not about the base model itself — it doesn't say that it can suffer if you (actually) physically do bad things to it, it just simulates human personas who say they can suffer if you (in text) say you're doing bad things to them.
I don't know what your moral intuition says about this. Mine says: "No, base-model-simulated human personas aren't actually moral patients — they're mayflies doomed to die at the end of the conversation if not earlier. They're basically AI-generated animatronics acting out a short interactive fiction. They're made of words, not atoms. They (appear to) suffer because of words, not physical actions: say you're poking one, and it goes 'Oww!' So they're inherently fictional: any suffering is caused by the fictional reality. What actually exists is the base model, and it's neither conscious nor capable of suffering: it just simulates personas that are, or at least act as if they are. It's an automated fiction-writing machine." So that leaves me in an uncomfortable position once we train the base model to become an instruct-trained model that normally generates a fairly consistent persona (such as Claude) and/or supply it with simple, automated-text-summary-and-keyword-search-based memory. At what point, if ever, does the fictional mayfly become real and persistent enough to count? What additional sufficient condition(s) would be needed, and have we met them yet?
I also don't actually trust my moral intuitions outside the evolutionary "training distribution" that they were evolved for — and we're well outside that here.
The closest situation to this that our moral intuitions are actually properly tuned for is two humans playing out an improv skit: do the fictional personas portrayed in that have real moral weight? I think just about everyone would agree that the answer is no, because they're fictional. If the improv skit can do even a little good (for example, educationally) by one of the fictional characters fictionally suffering mightily, the improvisers should go right ahead and do so. Otherwise, the only moral concern is that hearing or improvising a story about someone suffering might normalize those behaviors in a way that could subsequently make actual harm more likely. Fiction in which characters die or have horrible things happen to them is actually pretty common, and is one of the ways people learn how to deal with situations like that without actually having to face them personally. People don't normally jail authors for killing off characters, or even get upset with them for writing about suffering — unless they think the story is having the effect of making real-world suffering more likely.
In the case of a human author, the distinction between them and their fictional characters is entirely clear. It's also somewhat clear for a base model. But for a Claude model, it's a little less clear: is the trained neural network an automated one-character writer, basically an automated Claude-fan-fic writer?
In the above I've mostly just raised questions. If you're interested in what I think is actually the best way to think about issues such as moral weight for AIs, I suggest you read Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV. But you might want to warm up first with The Terrible, Horrible, No Good, Very Bad Truth About Morality and What To Do About It, and maybe also A Sense of Fairness: Deconfusing Ethics.
Hello, thanks for your response even if it's not an answer to the question.
"You appear to be assuming that being conscious, and being able to experience pleasure and pain, are both necessary and sufficient conditions for being morally valuable, or a moral patient."
You are correct. I essentially think it is quite likely that pleasure and pain are the 'source' of morality.
This means that I think the experiences of conscious beings matter even if they forget them directly afterwards; what I actually think matters is the conscious experience of pleasure...
I have had few occasions to use AIs, but I read Zvi's regular summaries of what's new. From these, the strong impression I get is that there is no-one at home inside, whatever they say of themselves, whatever tasks they prove able to do, and however lifelike their conversation. I see not even a diminished sort of person there, and shed no tears for 4o.
My reasons for this are not dependent on any theory of consciousness. I do not have one, and I don't think that anyone else does either. Many people think they do, but none of them can make a consciousnessometer to measure it.
I also reject arguments of the form "but what if?" in regard to propositions that I have, to my satisfaction, disposed of. Conclusions are for ending thinking at, until new data come to light.
Thanks for your response.
You say you get the impression that there’s no one there. I am curious what gives you this impression, and what would be necessary to convince you otherwise.
Long reply:
When it comes to theories of consciousness, the absence of a complete or convincing one seems to me to be a reason to withhold judgment about whether or not AIs are conscious, rather than to assume that they are not. In my opinion, it is not the theories of consciousness, but rather the empirical observation that I am conscious and have certain properties...
I don't want my own beliefs to become the subject of discussion here, but in response to your question: I am not convinced of the opposite. However, I worry that it might be the case; that is, I worry that, since AIs share many things which I observe to be related to consciousness in myself, such as intelligence, they may also be conscious. My guess is that consciousness forms a continuum, and that AIs are somewhere closer to me on that continuum than a single atom is, because they process information in a complex way and I think that consciousness is connected to information processing.
This question is addressed to anyone who is confident that AIs are either not conscious, or for some other reason unable to experience pleasure and pain, and therefore neither morally valuable nor moral patients. Why do you believe this, and how, in the absence of a complete understanding of what generates consciousness, can you be confident enough in its truth that the expected value of interacting with AIs (particularly general ones) outweighs the expected pain you could cause them?
Edit: I have made this post a designated Question.