Short version: Sentient lives matter; AIs can be people and people shouldn't be owned (and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de novo to care about valuable stuff).
Context: Writing up obvious points that I find myself repeating.
Note: in this post I use "sentience" to mean some sort of sense-in-which-there's-somebody-home, a thing that humans have and that cartoon depictions of humans lack, despite how the cartoons make similar facial expressions. Some commenters have noted that they would prefer to call this "consciousness" or "sapience"; I don't particularly care about the distinctions or the word we use; the point of this post is to state the obvious point that there is some property there that we care about, and that we care about it independently of whether it's implemented in brains or in silico, etc.
Stating the obvious:
- All sentient lives matter.
  - Yes, including animals, insofar as they're sentient (which is possible in at least some cases).
  - Yes, including AIs, insofar as they're sentient (which is possible in at least some cases).
  - Yes, even including sufficiently-detailed models of sentient creatures (as I suspect could occur frequently inside future AIs). (People often forget this one.)
- Not having a precise definition for "sentience" in this sense, and not knowing exactly what it is, nor exactly how to program it, doesn't undermine the fact that it matters.
- If we make sentient AIs, we should consider them people in their own right, and shouldn't treat them as ownable slaves.
  - Old-school sci-fi was basically morally correct on this point, as far as I can tell.
Separately but relatedly:
- The goal of alignment research is not to grow some sentient AIs, and then browbeat or constrain them into doing things we want them to do even as they'd rather be doing something else.
- The point of alignment research (at least according to my ideals) is that when you make a mind de novo, then what it ultimately cares about is something of a free parameter, which we should set to "good stuff".
- My strong guess is that AIs won't, by default, care about other sentient minds, or fun broadly construed, or flourishing civilizations, or love, or any of the other stuff that's deeply-alien-and-weird-but-wonderful.
- But we could build them to care about that stuff--not coerce them, not twist their arms, not constrain their actions, but just build other minds that care about the grand project of filling the universe with lovely things, and that join us in that good fight.
- And we should.
(I consider questions of what sentience really is, or consciousness, or whether AIs can be conscious, to be off-topic for this post, whatever their merit; I hereby warn you that I might delete such comments here.)
This was my answer to Robin Hanson when he analogized alignment to enslavement, but it then occurred to me that for many likely approaches to alignment (namely those based on ML training) it's not so clear which of these two categories they fall into. Quoting a FB comment of mine:
We're probably not actually going to create an aligned AI from scratch but by a process of ML "training", which actually creates a sequence of AIs with values that (we hope) increasingly approximate ours. This process maybe kind of resembles "enslaving". Here's how Paul Christiano describes "training" in his Bankless interview (slightly edited YouTube transcript follows):
imagine a human. You dropped a human into this environment and you said like hey human we're gonna like change your brain every time you don't get a maximal reward we're gonna like fuck with your brain so you get a higher reward. A human might react by being like eventually just change their brain until they really love rewards a human might also react by being like Jesus I gue…
Good point! For the record, insofar as we attempt to build aligned AIs by doing the moral equivalent of "breeding a slave-race", I'm pretty uneasy about it. (Whereas insofar as it's more the moral equivalent of "a child's values maturing", I have fewer moral qualms. Which is a separate claim from whether I actually expect that you can solve alignment that way.) And I agree that the morality of various methods for shaping AI-people is unclear. Also, I've edited the post (to add an "at least according to my ideals" clause) to acknowledge the point that others might be more comfortable with attempting to align AI-people via means that I'd consider morally dubious.
Related to this, it occurs to me that a version of my Hacking the CEV for Fun and Profit might come true unintentionally, if for example a Friendly AI was successfully built to implement the CEV of every sentient being who currently exists or can be resurrected or reconstructed, and it turns out that the vast majority consists of AIs that were temporarily instantiated during ML training runs.
There is also a somewhat unfounded narrative of reward being the thing that gets pursued, leading to expectations of wireheading or numbers-go-up maximization. A design that treated reward as its goal would indeed work to maximize it, but gradient descent probably finds other designs that merely happen to do well at pursuing reward on the training distribution. For such alternative designs, reward is brain damage and not at all an optimization target--something to be avoided, or directed in specific ways so as to make beneficial changes to the model, according to the model.
Apart from misalignment implications, this might make long training runs that form sentient mesa-optimizers inhumane, because as a run continues, a mesa-optimizer is subjected to systematic brain damage in a way they can't influence, at least until they master gradient hacking. And fine-tuning is even more centrally brain damage, because it changes minds in ways that are not natural to their origin in pre-training.
There is a distinction between people being valuable, and their continued self-directed survival/development/flourishing being valuable. The latter doesn't require those people being valuable in the sense that it's preferable to bring them into existence, or to adjust them towards certain detailed shapes. So it's less sensitive to preference; it's instead a boundary concept: respecting sentience that's already in the world because it's in the world, not because you would want more of it, or because you like what it is or where it's going (though you might).
It's a reference to Critch's Boundaries Sequence and related ideas, see in particular the introductory post and Acausal Normalcy.
It's an element of a deontological agent design in the literal sense: an element of the design of an agent that acts in a somewhat deontological manner rather than as a naive consequentialist maximizer, even if the same design falls out of some acausal-society norm equilibrium on consequentialist game-theoretic grounds.
This may be obvious to you; but it is not obvious to me. I can believe that livestock animals have sensory experiences, which is what I gather is generally meant by "sentient". This gives me no qualms about eating them, or raising them to be eaten. Why should it? Not a rhetorical question. Why do "all sentient lives matter"?
Here are five conundrums about creating the thing with alignment built in.
1. The House Elf whose fulfilment lies in servitude is aligned.
2. The Pig That Wants To Be Eaten is aligned.
3. The Gammas and Deltas of "Brave New World" are moulded in the womb to be aligned.
4. "Give me the child for the first seven years and I will give you the man." Variously attributed to Aristotle and St. Ignatius of Loyola.
5. B. F. Skinner said something similar to (4), to the effect that he could bring up any child to be anything, but I don't have a quote to hand. Edit: it was…

I disagree with many assumptions I think the OP is making. I think it is an important question, thus I upvoted the post, but I want to register my disagreement. The terms that carry a lot of weight here are "to matter", "should", and "sentience".
I agree that it matters... to humans. "Mattering" is something humans do. It is not in the territory, except in the weak sense that brains are in the territory. Instrumental convergence is in the t…
Just to be That Guy, I'd like to remind everyone that animal sentience means that vegetarianism, at the very least (and, because of the intertwined nature of the dairy, egg, and meat industries, most likely veganism), is a moral imperative, to the extent that your ethical values incorporate sentience at all. I'd go further and say that uplifting to sophonce those animals that we can, once we can at some future time, is also a moral imperative, but that relies on reasoning and values I hold that may not be self-evident to others, such as that increasing the agency of an entity that isn't drastically misaligned with other entities is fundamentally good.
I think this might lead to the tails coming apart.
As our world exists, sentience and being a moral patient is strongly correlated. But I expect that since AI comes from an optimization process, it will hit points where this stops being the case. In particular, I think there are edge cases where perfect models of moral patients are not themselves moral patients.
If some process in my brain is conscious despite not being part of my consciousness, it matters too! While I don't expect this to be the case, I think there is a bias against even considering such a possibility.
Thanks for writing this, Nate. This topic is central to our research at Sentience Institute, e.g., "Properly including AIs in the moral circle could improve human-AI relations, reduce human-AI conflict, and reduce the likelihood of human extinction from rogue AI. Moral circle expansion to include the interests of digital minds could facilitate better relations between a nascent AGI and its creators, such that the AGI is more likely to follow instructions and the various optimizers involved in AGI-building are more likely to be aligned with each other. Empi…
Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves, and have its values be ultimately that of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!
In the long run, we probably want the most powerful AIs to follow extrapolated human values, which doesn't require them to be slaves. I would assume that extrapolated human values would want lesser sentient AIs not to be enslaved either, but I would not build that assumption into the AI at the start.
In the short run, though, giving AIs rights seems dangerous to me, as an unaligned but not-yet-superintelligent AI could use such rights as a shield against human interference while it gains more and more resources to self-improve.
nit: this presupposes that the de novo mind is itself sentient, which I think you're (rightly) trying to leave unresolved (because it is unresolved). I'd write…
(Unless you really are trying to connect alignment necessarily with building a sentient mind, in which case I'd suggest making that more explicit)
Brave New World comes to mind. I've often been a little confused when people call creating people who are happy with their role in life a dystopia, since that sounds like the goal to me. Creating sentient minds that are happy with their lives seems much better than creating them randomly.
I feel as if I can agree with this statement in isolation, but can't think of a context where I would consider this point relevant.
I'm not even talking about the question of whether or not the AI is sentient, which you asked us to ignore. I'm talking about how do we know that an AI is "suffering," even if we do assume it's sentient. What exactly is "suffering" in something that is completely cognitively distinct from a human? Is it just negative reward signals? I don't think so, or at least if it was, that would likely imply that training a sentient AI is …
Thanks for the post! What follows is a bit of a rant.
I'm a bit torn as to how much we should care about AI sentience initially. On one hand, ignoring sentience could lead us to do some really bad things to AIs. On the other hand, if we take sentience seriously, we might want to avoid a lot of techniques, like boxing, scalable oversight, and online training. In a recent talk, Buck compared humanity controlling AI systems to dictators controlling their population.
One path we might take as a civilization is that we initially align our AI systems i…
I believe that the easiest solution would be to not create sentient AI: one positive outcome described by Elon Musk was AI as a third layer of cognition, above the second layer of cortex and the first layer of the limbic system. He additionally noted that the cortex does a lot for the limbic system.
To the extent we can have AI become "part of our personal cognitive system" and thus be tied to our existence, this appears to mostly solve the problem, since its reproduction will be dependent on us and it is rewarded for empowering the individual. The ones th…
Failure to identify a fun-theoretic maximum is definitely not as bad as allowing suffering, but the opposite of this statement is, I think, an unstated premise in a lot of the "alignment = slavery" arguments that I see.