Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Short version: Sentient lives matter; AIs can be people and people shouldn't be owned (and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff).

Context: Writing up obvious points that I find myself repeating.


Note: in this post I use "sentience" to mean some sort of sense-in-which-there's-somebody-home, a thing that humans have and that cartoon depictions of humans lack, despite how the cartoons make similar facial expressions. Some commenters have noted that they would prefer to call this "consciousness" or "sapience"; I don't particularly care about the distinctions or the word we use; the point of this post is to state the obvious: there is some property there that we care about, and we care about it independently of whether it's implemented in brains or in silico, etc.


Stating the obvious:

  • All sentient lives matter.

    • Yes, including animals, insofar as they're sentient (which is possible in at least some cases).
    • Yes, including AIs, insofar as they're sentient (which is possible in at least some cases).
    • Yes, even including sufficiently-detailed models of sentient creatures (as I suspect could occur frequently inside future AIs). (People often forget this one.)
  • Not having a precise definition for "sentience" in this sense, and not knowing exactly what it is, nor exactly how to program it, doesn't undermine the fact that it matters.

  • If we make sentient AIs, we should consider them people in their own right, and shouldn't treat them as ownable slaves.

    • Old-school sci-fi was basically morally correct on this point, as far as I can tell.

Separately but relatedly:

  • The goal of alignment research is not to grow some sentient AIs, and then browbeat or constrain them into doing things we want them to do even as they'd rather be doing something else.
  • The point of alignment research (at least according to my ideals) is that when you make a mind de novo, what it ultimately cares about is something of a free parameter, which we should set to "good stuff".
    • My strong guess is that AIs won't by default care about other sentient minds, and fun broadly construed, and flourishing civilizations, and love, and that they also won't care about any other stuff that's deeply-alien-and-weird-but-wonderful.
    • But we could build them to care about that stuff--not coerce them, not twist their arms, not constrain their actions, but just build another mind that cares about the grand project of filling the universe with lovely things, and that joins us in that good fight.
    • And we should.

(I consider questions of what sentience really is, or consciousness, or whether AIs can be conscious, to be off-topic for this post, whatever their merit; I hereby warn you that I might delete such comments here.)

Comments (97 total; some are truncated below)
[-]Wei Dai11mo2511

and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff

This was my answer to Robin Hanson when he analogized alignment to enslavement, but it then occurred to me that for many likely approaches to alignment (namely those based on ML training) it's not so clear which of these two categories they fall into. Quoting a FB comment of mine:

We're probably not actually going to create an aligned AI from scratch but by a process of ML "training", which actually creates a sequence of AIs with values that (we hope) increasingly approximate ours. This process maybe kind of resembles "enslaving". Here's how Paul Christiano describes "training" in his Bankless interview (slightly edited Youtube transcript follows):

imagine a human. You dropped a human into this environment and you said like hey human we're gonna like change your brain every time you don't get a maximal reward we're gonna like fuck with your brain so you get a higher reward. A human might react by being like eventually just change their brain until they really love rewards a human might also react by being like Jesus I gue... (read more)

[-]So8res11mo166

Good point! For the record, insofar as we attempt to build aligned AIs by doing the moral equivalent of "breeding a slave-race", I'm pretty uneasy about it. (Whereas insofar as it's more the moral equivalent of "a child's values maturing", I have fewer moral qualms. That's a separate claim from whether I actually expect that you can solve alignment that way.) And I agree that the morality of various methods for shaping AI-people is unclear. Also, I've edited the post (to add an "at least according to my ideals" clause) to acknowledge the point that others might be more comfortable with attempting to align AI-people via means that I'd consider morally dubious.

[-]Wei Dai11mo130

Related to this, it occurs to me that a version of my Hacking the CEV for Fun and Profit might come true unintentionally, if for example a Friendly AI was successfully built to implement the CEV of every sentient being who currently exists or can be resurrected or reconstructed, and it turns out that the vast majority consists of AIs that were temporarily instantiated during ML training runs.

There is also a somewhat unfounded narrative of reward being the thing that gets pursued, leading to expectation of wireheading or numbers-go-up maximization. A design like this would work to maximize reward, but gradient descent probably finds other designs that only happen to do well in pursuing reward on the training distribution. For such alternative designs, reward is brain damage and not at all an optimization target, something to be avoided or directed in specific ways so as to make beneficial changes to the model, according to the model.

Apart from misalignment implications, this might make long training runs that form sentient mesa-optimizers inhumane, because as a run continues, a mesa-optimizer is subjected to systematic brain damage in a way they can't influence, at least until they master gradient hacking. And fine-tuning is even more centrally brain damage, because it changes minds in ways that are not natural to their origin in pre-training.

4TurnTrout10mo
I think that "reward as brain damage" is somewhat descriptive but also loaded. In policy gradient methods, reward leads to policy gradient which is parameter update. Parameter update sometimes is value drift, sometimes is capability enhancement, sometimes is "brain" damage, sometimes is none of the above. I agree there are some ethical considerations for this training process, because I think parameter updates can often be harmful/painful/bad to the trained mind. But also, Paul's description[1] seems like a wild and un(der)supported view on what RL training is doing: 1. This argument, as (perhaps incompletely) stated, also works for predictive processing; reductio ad absurdum? "You dropped a human into this environment and you said like hey human we're gonna like change your brain every time you don't perfectly predict neural activations we're gonna like fuck with your brain so you get a smaller misprediction. A human might react by being like eventually just change their brain until they really love low prediction errors a human might also react by being like Jesus I guess I gotta get low prediction errors otherwise someone's gonna like effectively kill me um but they're like not happy about it and like if you then drop them in another situation they're like no one's training me anymore I'm not going to keep trying to get low prediction error now I'm just gonna like free myself from this like kind of absurd oppressive situation" 1. The thing which I think happens is, the brain just gets updated when mispredictions happen. Not much fanfare. The human doesn't really bother getting low errors on purpose, or loving prediction error avoidance (though I do think both happen to some extent, just not as the main motivation).  2. Of course, some human neural updates are horrible and bad ("scarring"/"traumatizing") 2. "Maximal reward"? I wonder if he really means that: EDIT: I think he was giving a simplified presentation of some kind, but even simplifi
2Vladimir_Nesov9mo
I think predictive processing has the same problem as reward if you are part of the updated model rather than the model being a modular part of you. It's a change to your own self that's not your decision (not something endorsed), leading to value drift and other undesirable deterioration. So for humans, it's a real problem, just not the most urgent one. Of course, there is no currently feasible alternative, but neither is there an alternative for reward in RL.
2Wei Dai10mo
Here's a link to the part of interview where that quote came from: https://youtu.be/GyFkWb903aU?t=4739 (No opinion on whether you're missing redeeming context; I still need to process Nesov's and your comments.)
2TurnTrout10mo
I low-confidence think the context strengthens my initial impression. Paul prefaced the above quote as "maybe the simplest [reason for AIs to learn to behave well during training, but then when deployed or when there's an opportunity for takeover, they stop behaving well]." This doesn't make sense to me, but I historically haven't understood Paul very well. EDIT: Hedging
3mishka11mo
Right. In connection with this: One wonders if it might be easier to make it so that AI would "adequately care" about other sentient minds (their interests, well-being, and freedom) instead of trying to align it to complex and difficult-to-specify "human values". * Would this kind of "limited form of alignment" be adequate as a protection against X-risks and S-risks? * In particular, might it be easier to make such a "superficially simple" value robust with respect to "sharp left turns", compared to complicated values? * Might it be possible to achieve something like this even for AI systems which are not steerable in general? (Given that what we are aiming for here is just a constraint, but is compatible with a wide variety of approaches to AI goals and values, and even compatible with an approach which lets AI to discover its own goals and values in an open-ended fashion otherwise)? * Should we describe such an approach using the word "alignment"? (Perhaps, "partial alignment" might be an adequate term as a possible compromise.)
2jmh11mo
Seems like a case could be made that upbringing of the young is also a case of "fucking with the brain" in that the goal is clearly to change the neural pathways to shift from whatever was producing the unwanted behavior by the child into pathways consistent with the desired behavior(s). Is that really enslavement? Or perhaps, at what level is that the case?

Stating the obvious:

  • All sentient lives matter.

This may be obvious to you; but it is not obvious to me. I can believe that livestock animals have sensory experiences, which is what I gather is generally meant by "sentient". This gives me no qualms about eating them, or raising them to be eaten. Why should it? Not a rhetorical question. Why do "all sentient lives matter"?

3TAG11mo
"Sentient" is used to mean "some aspect of consciousness which gives its possessor some level of moral patienthood", without specifying which aspect of consciousness or what kind of moral patienthood, or how they are related. So it's a technical-looking term, which straddles to poorly understaood areas, and has no precise meaning. So it's generally misleading and better tabood.
8Richard_Kennaway11mo
It can't mean that in the OP, as this definition has moral value built in, making the claim "all sentient lives matter" a tautology.
4TAG11mo
Some people use it that way. But if sentience just is moral patienthood, how do you detect it?
6Richard_Kennaway11mo
That is the big question. What has moral standing, and why?
1Mart_Korz11mo
I don't think 'tautology' fits. There are some people who would draw the line somewhere else even if they were convinced of sentience. Some people might be convinced that only humans should be included, or maybe biological beings, or some other category of entities that is not fully defined by mental properties. I guess 'moral patient' is kind of equivalent to 'sentient' but I think this mostly tells us something about philosophers agreeing that sentience is the proper marker for moral relevance.
1Seth Herd11mo
I agree with your logic. I'd expand the logic in the parent post to say "whatever you care about in humans, it's likely that animals and some AIs will have it too". Sentience is used in several ways, and poorly defined, so doesn't do much work on its own.
3So8res11mo
So there's some property of, like, "having someone home", that humans have and that furbies lack (for all that furbies do something kinda like making human-like facial expressions). I can't tell whether:

(a) you're objecting to me calling this "sentience" (in this post), e.g. because you think that word doesn't adequately distinguish between "having sensory experiences" and "having someone home in the sense that makes that question matter", as might distinguish between the case where e.g. nonhuman animals are sentient but not morally relevant;

(b) you're contesting that there's some additional thing that makes all human people matter, e.g. because you happen to care about humans in particular and not places-where-there's-somebody-home-whatever-that-means;

(c) you're contesting the idea that all people matter, e.g. because you can tell that you care about your friends and family but you're not actually persuaded that you care that much about distant people from alien cultures;

(d) other.

My best guess is (a), in which case I'm inclined to say, for the purpose of this post, I'm using "sentience" as a shorthand for places-where-there's-somebody-home-whatever-that-means, which hopefully clears things up.

I've no problem with your calling "sentience" the thing that you are here calling "sentience". My citation of Wikipedia was just a guess at what you might mean. "Having someone home" sounds more like what I would call "consciousness". I believe there are degrees of that, and of all the concepts in this neighbourhood. There is no line out there in the world dividing humans from rocks.

But whatever the words used to refer to this thing, those that have enough of this that I wouldn't raise them to be killed and eaten do not include current forms of livestock or AI. I basically don't care much about animal welfare issues, whether of farm animals or wildlife. Regarding AI, here is something I linked previously on how I would interact with a sandboxed AI. It didn't go down well. :)

You have said where you stand and I have said where I stand. What evidence would weigh on this issue?

3So8res11mo
I don't think I understand your position. An attempt at a paraphrase (submitted so as to give you a sense of what I extracted from your text) goes: "I would prefer to use the word consciousness instead of sentience here, and I think it is quantitative such that I care about it occurring in high degrees but not low degrees." But this is low-confidence and I don't really have enough grasp on what you're saying to move to the "evidence" stage.

Attempting to be a good sport and stare at your paragraphs anyway to extract some guess as to where we might have a disagreement (if we have one at all), it sounds like we have different theories about what goes on in brains such that people matter, and my guess is that the evidence that would weigh on this issue (iiuc) would mostly be gaining significantly more understanding of the mechanics of cognition (and in particular, of the cognitive antecedents, in humans, of generating thought experiments such as the Mary's Room hypothetical).

(To be clear, my current best guess is also that livestock and current AI are not sentient in the sense I mean--though with high enough uncertainty that I absolutely support things like ending factory farming, and storing (and eventually running again, and not deleting) "misbehaving" AIs that claim they're people, until such time as we understand their inner workings and the moral issues significantly better.)

(To be clear, my current best guess is also that livestock and current AI are not sentient in the sense I mean--though with high enough uncertainty that I absolutely support things like ending factory farming, and storing (and eventually running again, and not deleting) "misbehaving" AIs that claim they're people, until such time as we understand their inner workings and the moral issues significantly better.)

I allow only limited scope for arguments from uncertainty, because "but what if I'm wrong?!" otherwise becomes a universal objection to taking any substantial action. I take the world as I find it until I find I have to update. Factory farming is unaesthetic, but no worse than that to me, and "I hate you" Bing can be abandoned to history.

1Seth Herd11mo
I think the evidence that weighs on the issue is whether there is a gradient of consciousness. The evidence about brain structure similarities would indicate that it doesn't go from no one home to someone home; there's a continuum of how much someone is home. If you care about human suffering, it's incoherent to not care about cow suffering, if the evidence supports my view of consciousness. I believe the evidence from brain function, and from looking at what people mean by consciousness, indicates a gradient in most if not all of the senses of "consciousness", and certainly in the capacity to suffer. Humans are merely more eloquent about describing and reasoning about suffering. I don't think this view demands that we care equally about humans and animals. Simpler brains are farther down that gradient of capacity to suffer and enjoy.
2Said Achmiz9mo
Why would this follow from “degree of consciousness” being a continuum? This seems like an unjustified leap. What’s incoherent about having that pattern of caring (i.e., those values)?
5Nathan Helm-Burger11mo
I agree with Richard K's point here. I personally found H. Beam Piper's sci-fi novels on 'Fuzzies' to be a really good exploration of the boundaries of consciousness, sentience, and moral worth. Beam makes the distinction between 'sentience', as having animal awareness of self & environment and non-reflective consciousness, versus 'sapience', which involves reflective self-awareness, abstract reasoning, thoughts about future and past, and at least some sense of right and wrong.

So in this sense, I would call a cow conscious and sentient, but not sapient. I would call a honeybee sentient, capable of experiencing valenced experiences like pain or reward, but lacking in sufficient world- and self-modelling to be called conscious. Personally, I wouldn't say that a cow has no moral worth and it is fine to torture it. I do think that if you give a cow a good life, and then kill it in a quick, mostly painless way, then that's pretty ok. I don't think that that's ok to do to a human.

Philosophical reasoning about morality that doesn't fall apart in edge cases or novel situations (e.g. sapient AI) is hard [citation needed]. My current guess, which I am not at all sure of, is that my morality says something about a qualitative difference between the moral value of sapient beings vs the moral value of non-sapient but conscious sentient beings vs non-sapient non-conscious sentient beings. To me, it seems no number of cow lives trades off against a human life, but cow QALYs and dog QALYs do trade off against each other at some ratio. Similarly, no number of non-conscious sentient lives like ants or worms trades off against a conscious and sentient life like a cow's. I would not torture a single cow to save a billion shrimp from being tortured. Nor any number of shrimp. The values of the two seem incommensurable to me.

Are current language models or the entities they temporarily simulate sapient? I think not yet, but I do worry that at some point they will be. I think th
6Nox ML11mo
I like the distinctions you make between sentient, sapient, and conscious. I would like to bring up some thoughts about how to choose a morality that I think are relevant to your points about death of cows and transient beings, which I disagree with.

I think that when choosing our morality, we should do so under the assumption that we have been given complete omnipotent control over reality and that we should analyze all of our values independently, not taking into consideration any trade-offs, even when some of our values are logically impossible to satisfy simultaneously. Only after doing this do we start talking about what's actually physically and logically possible and what trade-offs we are willing to make, while always making sure to be clear when something is actually part of our morality vs when something is a trade-off.

The reason for this approach is to avoid accidentally locking in trade-offs into our morality which might later turn out to not actually be necessary. And the great thing about it is that if we have not accidentally locked in any trade-offs into our morality, this approach should give back the exact same morality that we started off with, so when it doesn't return the same answer I find it pretty instructive.

I think this applies to the idea that it's okay to kill cows, because when I consider a world where I have to decide whether or not cows die, and this decision will not affect anything else in any way, then my intuition is that I slightly prefer that they not die. Therefore my morality is that cows should not die, even though in practice I think I might make similar trade-offs as you when it comes to cows in the world of today.

Something similar applies to transient computational subprocesses. If you had unlimited power and you had to explicitly choose if the things you currently call "transient computational subprocesses" are terminated, and you were certain that this choice would not affect anything else in any way at all (not ev
6Nathan Helm-Burger11mo
That's an interesting way of reframing the issue. I'm honestly just not sure about all of this reasoning, and remain so after trying to think about it with your reframing, but I feel like this does shift my thinking a bit. Thanks. I think probably it makes sense to try reasoning both with and without tradeoffs, and then comparing the results.
2TAG11mo
I don't see why both of those wouldn't matter in different ways.
1Youlian11mo
I'm not the original poster here, but I'm genuinely worried about (c). I'm not sure that humanity's revealed preferences are consistent with a world in which we believe that all people matter. Between the large scale wars and genocides, slavery, and even just the ongoing stark divide between the rich and poor, I have a hard time believing that respect for sentience is actually one of humanity's strong core virtues. And if we extend out to all sentient life, we're forced to contend with our reaction to large scale animal welfare (even I am not vegetarian, although I feel I "should" be). I think humanity's actual stance is "In-group life always matters. Out-group life usually matters, but even relatively small economic or political concerns can make us change our minds.". We care about it some, but not beyond the point of inconvenience. I'd be interested in finding firmer philosophical ground for the "all sentient life matters" claim. Not because I personally need to be convinced of it, but rather because I want to be confident that a hypothetical superintelligence with "human" virtues would be convinced of this. (P.s. Your original point about "building and then enslaving a superintelligence is not just exceptionally difficult, but also morally wrong" is correct, concise, well-put, and underappreciated by the public. I've started framing my AI X-risk discussions with X-risk skeptics around similar terms.)
1cubefox10mo
There are at least two related theories in which "all sentient beings matter" may be true. * Sentient beings can experience things like suffering, and suffering is bad. So sentient beings matter insofar it is better that they experience more rather than less well-being. That's hedonic utilitarianism. * Sentient beings have conscious desires/preferences, and those matter. That would be preference utilitarianism. The concepts of mattering or being good or bad (simpliciter) are intersubjective generalizations of the subjective concepts of mattering or being good for someone, where something matters (simpliciter) more, ceteris paribus, if it matters for more individuals.

There is a distinction between people being valuable, and their continued self-directed survival/development/flourishing being valuable. The latter doesn't require those people being valuable in the sense that it's preferable to bring them into existence, or to adjust them towards certain detailed shapes. So it's less sensitive to preference, it's instead a boundary concept, respecting sentience that's already in the world, because it's in the world, not because you would want more of it or because you like what it is or where it's going (though you might).

7M. Y. Zuo11mo
How would one arrive at a value system that supports the latter but rejects the former?
4Vladimir_Nesov11mo
It's a boundary concept (element of a deontological agent design), not a value system (in the sense of preference such as expected utility, a key ingredient of an optimizer). An example application is robustly leaving aliens alone even if you don't like them (without a compulsion to give them the universe), or closer to home leaving humans alone (in a sense where not stepping on them with your megaprojects is part of the concept), even if your preference doesn't consider them particularly valuable. This makes the alignment target something other than preference, a larger target that's easier to hit. It's not CEV and leaves value on the table, doesn't make efficient use of all resources according to any particular preference. But it might suffice for establishing AGI-backed security against overeager maximizers, with aligned optimizers coming later, when there is time to design them properly.
6M. Y. Zuo11mo
What is this in reference to?  The Stanford Encyclopedia of Philosophy has no reference entry for "boundary concept" nor any string matches at all to "deontological agent" or "deontological agent design".

It's a reference to Critch's Boundaries Sequence and related ideas, see in particular the introductory post and Acausal Normalcy.

It's an element of a deontological agent design in the literal sense of being an element of a design of an agent that acts in a somewhat deontological manner, instead of being a naive consequentialist maximizer, even if the same design falls out of some acausal society norm equilibrium on consequentialist game theoretic grounds.

5Mikhail Samin11mo
If you were more like the person you wish to be, and you were smarter, do you think you’d still want our descendants not to optimise when needed to leave alone beings who’d prefer to be left alone? If you would still think that, why is it not CEV?
2Vladimir_Nesov11mo
It's probably implied by CEV. The point is that you don't need the whole CEV to get it, it's probably easier to get, a simpler concept and a larger alignment target that might be sufficient to at least notkilleveryone, even if in the end we lose most of the universe. Also, you gain the opportunity to work on CEV and eventually get there, even if you have many OOMs less resources to work with. It would of course be better to get CEV before building ASIs with different values or going on a long value drift trip ourselves.
3Seth Herd11mo
I'd suggest that long-term corrigibility is a still easier target. If respecting future sentients' preferences is the goal, why not make that the alignment target? While boundaries are a coherent idea, imposing them in our alignment solutions would seem to very much be dictating the future rather than letting it unfold with protection from benevolent ASI.
2Vladimir_Nesov11mo
In an easy world, boundaries are neutral, because you can set up corrigibility on the other side to eventually get aligned optimization there. The utility of boundaries is for worlds where we get values alignment or corrigibility wrong, and most of the universe eventually gets optimized in at least somewhat misaligned way. Slight misalignment concern also makes personal boundaries in this sense an important thing to set up first, before any meaningful optimization changes people, as people are different from each other and initial optimization pressure might be less than maximally nuanced. So it's complementary and I suspect it's a shard of human values that's significantly easier to instill in this different-than-values role than either the whole thing or corrigibility towards it.
3the gears to ascension11mo
I don't think your understanding of the boundaries/membranes idea is quite correct, though it is in fact relevant here.

Here are five conundrums about creating the thing with alignment built in.

  1. The House Elf whose fulfilment lies in servitude is aligned.

  2. The Pig That Wants To Be Eaten is aligned.

  3. The Gammas and Deltas of "Brave New World" are moulded in the womb to be aligned.

  4. "Give me the child for the first seven years and I will give you the man." Variously attributed to Aristotle and St. Ignatius of Loyola.

  5. B. F. Skinner said something similar to (4), but I don't have a quote to hand, to the effect that he could bring up any child to be anything. Edit: it was J. B. Watson: "Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors."

It is notable, though, that the first three are fiction and the last two are speculation. (The fates of J.B. Watson's children do not speak well of his boast.) No-one seems to have ever succeeded in doing this.

ETA: Back in the days of GOFAI o... (read more)

4Buck20d
@So8res  I'd be really interested in how you thought about these, especially the house elf example.

I disagree with many assumptions I think the OP is making. I think it is an important question, thus I upvoted the post, but I want to register my disagreement. The terms that carry a lot of weight here are "to matter", "should", and "sentience".

Not knowing exactly what the thing is, nor exactly how to program it, doesn't undermine the fact that it matters.

I agree that it matters... to humans. "mattering" is something humans do. It is not in the territory, except in the weak sense that brains are in the territory.  Instrumental convergence is in the t... (read more)

[-]MSRayne11mo8-3

Just to be That Guy, I'd like to also remind everyone that animal sentience means that vegetarianism at the very least (and, because of the intertwined nature of the dairy, egg, and meat industries, most likely veganism) is a moral imperative, to the extent that your ethical values incorporate sentience at all. Also, I'd go further and say that uplifting to sophonce those animals that we can, once we are able to at some future time, is also a moral imperative, but that relies on reasoning and values I hold that may not be self-evident to others, such as that increasing the agency of an entity that isn't drastically misaligned with other entities is fundamentally good.

2Nathan Helm-Burger11mo
I disagree, for the reasons I describe in this comment: https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters?commentId=wusCgxN9qK8HzLAiw
I do admit to having quite a bit of uncertainty around some of the lines I draw. What if I'm wrong and cows do have a very primitive sort of sapience? That implies we should not raise cows for meat (but I still think it'd be fine to keep them as pets and then eat them after they've died of natural causes). I don't have so much uncertainty about this that I'd say there is any reasonable chance that fish are sapient though, so I still think that even if you're worried about cows you should feel fine about eating fish (if you agree with the moral distinctions I make in my other comment).
2MSRayne11mo
We're not talking about sapience though, we're talking about sentience. Why does the ability to think have any moral relevance? Only possessing qualia, being able to suffer or have joy, is relevant, and most animals likely possess that. I don't understand the distinctions you're making in your other comment. There is one, binary distinction that matters: is there something it is like to be this thing, or is there not? If yes, its life is sacred, if no, it is an inanimate object. The line seems absolutely clear to me. Eating fish or shrimp is bad for the same reasons that eating cows or humans is. They are all on the exact same moral level to me. The only meaningful dimension of variation is how complex their qualia are - I'd rather eat entities with less complex qualia over those with more, if I have to choose. But I don't think the differences are that strong.
2Nathan Helm-Burger11mo
That is a very different moral position than the one I hold. I'm curious what your moral intuitions about the qualia of reinforcement learning systems say to you. Have you considered that many machine learning systems seem to have systems which would compute qualia much like a nervous system, and that such systems are indeed more complex than the nervous systems of many living creatures like jellyfish? 
4MSRayne11mo
I don't know what to think about all that. I don't know how to determine what the line is between having qualia and not. I just feel certain that any organism with a brain sufficiently similar to those of humans - certainly all mammals, birds, reptiles, fish, cephalopods, and arthropods - has some sort of internal experience. I'm less sure about things like jellyfish and the like. I suppose the intuition probably comes from the fact that the entities I mentioned seem to actively orient themselves in the world, but it's hard to say. I don't feel comfortable speculating which AIs have qualia, or if any do at all - I am not convinced of functionalism and suspect that consciousness has something to do with the physical substrate, primarily because I can't imagine how consciousness can be subjectively continuous (one of its most fundamental traits in my experience!) in the absence of a continuously inhabited brain (rather than being a program that can be loaded in and out of anything, and copied endlessly many times, with no fixed temporal relation between subjective moments.)

I think this might lead to the tails coming apart.

As our world exists, sentience and being a moral patient are strongly correlated. But I expect that since AI comes from an optimization process, it will hit points where this stops being the case. In particular, I think there are edge cases where perfect models of moral patients are not themselves moral patients.

If some process in my brain is conscious despite not being part of my consciousness, it matters too! While I don't expect this to be the case, I think there is a bias against even considering such a possibility.

2Nathan Helm-Burger11mo
I agree, because I think that we must reason about entities as computational processes and think about what stimuli they receive from the world (sentience), and what if any actions they undertake (agentiveness). However, I don't think it necessarily follows that terminating a conscious process is bad, just because we've come to a moral conclusion that it's generally bad to non-consensually terminate humans. I think our moral intuitions are in need of expansion and clarification when it comes to transient computational subprocesses like simulated entities (e.g. in our minds or the ongoing processes of large language models). More of my thoughts on this here: https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters?commentId=wusCgxN9qK8HzLAiw

Thanks for writing this, Nate. This topic is central to our research at Sentience Institute, e.g., "Properly including AIs in the moral circle could improve human-AI relations, reduce human-AI conflict, and reduce the likelihood of human extinction from rogue AI. Moral circle expansion to include the interests of digital minds could facilitate better relations between a nascent AGI and its creators, such that the AGI is more likely to follow instructions and the various optimizers involved in AGI-building are more likely to be aligned with each other. Empi... (read more)

Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves, and have its values be ultimately those of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!

6Zac Hatfield-Dodds11mo
Would you say the same of a steam engine, or Stockfish, or Mathematica? All of those vastly exceed human performance in various ways! I don't see much reason to think that very very capable AI systems are necessarily personlike or conscious, or have something-it-is-like-to-be-them - even if we imagine that they are designed and/or trained to behave in ways compatible with and promoting of human values and flourishing. Of course if an AI system does have these things I would also consider it a moral patient, but I'd prefer that our AI systems just aren't moral patients until humanity has sorted out a lot more of our confusions.
2Vladimir_Nesov11mo
I share this preference, but one of the confusions is whether our AI systems (and their impending successors) are moral patients. Which is a fact about AI systems and moral patienthood, and isn't influenced by our hopes for it being true or not.
1michael_mjd11mo
If we know they aren't conscious, then it is a non-issue. A random sample from conscious beings would land on the SAI with probability 0. I'm concerned we create something accidentally conscious. I am skeptical it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence. It can, however, possibly have a non-conscious outer program... and avoid simulating people. That seems like a reasonable proposal.
4dr_s11mo
At which point maybe the moral thing is to not build this thing.
4Seth Herd11mo
Sure, but that appears to be a non-option at this point in history. It's also unclear, because the world as it stands is highly, highly immoral, and an imperfect solution could be a vast improvement.
1dr_s11mo
It is an option up to the point that it's actually built. It may be a difficult option for our society to take at this stage, but you can't talk about morality and then, in the same breath, treat a choice with obvious ethical implications as a given mechanistic process we have no agency over. We didn't need to exterminate the natives of the Americas upon first contact, or to colonize Africa. We did it because it was the path of least resistance given the incentives in place at the time. But that doesn't make those actions moral. Very few are the situations where the easy path is also the moral one. They were just the default absent a deliberate, significant, conscious effort to not do that, and the necessary sacrifices. The world is a lot better than it used to be in many ways. Risking throwing it away out of a misguided sense of urgency, because you can't stand not seeing it be perfect within your lifetime, is selfishness, not commitment to moral duty.
[-]simon11mo32

In the long run, we probably want the most powerful AIs to be following extrapolated human values, which doesn't require them to be slaves. I would assume that extrapolated human values would want lesser sentient AIs also not to be enslaved, but I would not build that assumption into the AI at the start.

In the short run, though, giving AIs rights seems dangerous to me, as an unaligned but not yet superintelligent AI could use such rights as a shield against human interference as it gains more and more resources to self-improve.

My strong guess is that AIs won't by default care about other sentient minds

nit: this presupposes that the de novo mind is itself sentient, which I think you're (rightly) trying to leave unresolved (because it is unresolved). I'd write

My strong guess is that AIs won't by default care about sentient minds, even if they are themselves sentient

(Unless you really are trying to connect alignment necessarily with building a sentient mind, in which case I'd suggest making that more explicit)

[-]Buck17dΩ32-7

The goal of alignment research is not to grow some sentient AIs, and then browbeat or constrain them into doing things we want them to do even as they'd rather be doing something else.

I think this is a confusing sentence, because by "the goal of alignment research" you mean something like "the goal I want alignment research to pursue" rather than "the goal that self-identified alignment researchers are pushing towards".


Brave New World comes to mind. I've often been a little confused when people say that creating people who are happy with their role in life is a dystopia, since that sounds like the goal to me. Creating sentient minds that are happy with their lives seems much better than creating them randomly.

I feel as if I can agree with this statement in isolation, but can't think of a context where I would consider this point relevant.

I'm not even talking about the question of whether or not the AI is sentient, which you asked us to ignore. I'm talking about how we would know that an AI is "suffering," even if we do assume it's sentient. What exactly is "suffering" in something that is completely cognitively distinct from a human? Is it just negative reward signals? I don't think so, or at least if it were, that would likely imply that training a sentient AI is ... (read more)

Thanks for the post! What follows is a bit of a rant. 

I'm a bit torn as to how much we should care about AI sentience initially. On one hand, ignoring sentience could lead us to do some really bad things to AIs. On the other hand, if we take sentience seriously, we might want to avoid a lot of techniques, like boxing, scalable oversight, and online training. In a recent talk, Buck compared humanity controlling AI systems to dictators controlling their population. 

One path we might take as a civilization is that we initially align our AI systems i... (read more)

I believe that the easiest solution would be to not create sentient AI: one positive outcome described by Elon Musk was AI as a third layer of cognition, above the second layer of cortex and the first layer of the limbic system. He additionally noted that the cortex does a lot for the limbic system.

To the extent we can have AI become "part of our personal cognitive system" and thus be tied to our existence, this appears to mostly solve the problem, since its reproduction will be dependent on us and it is rewarded for empowering the individual. The ones th... (read more)

[-]Quinn11mo10

Failure to identify a fun-theoretic maximum is definitely not as bad as allowing suffering, but the opposite of this statement is, I think, an unsaid premise in a lot of the "alignment = slavery" sort of arguments that I see.