Short version: Sentient lives matter; AIs can be people, and people shouldn't be owned (and also, the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de novo to care about valuable stuff).
Context: Writing up obvious points that I find myself repeating.
Note: in this post I use "sentience" to mean some sort of sense-in-which-there's-somebody-home, a thing that humans have and that cartoon depictions of humans lack, despite the cartoons making similar facial expressions. Some commenters have noted that they would prefer to call this "consciousness" or "sapience"; I don't particularly care about the distinctions or the word we use. The point of this post is to state the obvious: there is some property there that we care about, and we care about it independently of whether it's implemented in brains or in silico, etc.
Stating the obvious:
- All sentient lives matter.
- Yes, including animals, insofar as they're sentient (which is possible in at least some cases).
- Yes, including AIs, insofar as they're sentient (which is possible in at least some cases).
- Yes, even including sufficiently-detailed models of sentient creatures (as I suspect could occur frequently inside future AIs). (People often forget this one.)
- Not having a precise definition for "sentience" in this sense, and not knowing exactly what it is, nor exactly how to program it, doesn't undermine the fact that it matters.
- If we make sentient AIs, we should consider them people in their own right, and shouldn't treat them as ownable slaves.
- Old-school sci-fi was basically morally correct on this point, as far as I can tell.
Separately but relatedly:
- The goal of alignment research is not to grow some sentient AIs, and then browbeat or constrain them into doing things we want them to do even as they'd rather be doing something else.
- The point of alignment research (at least according to my ideals) is that when you make a mind de novo, then what it ultimately cares about is something of a free parameter, which we should set to "good stuff".
- My strong guess is that AIs won't by default care about other sentient minds, or fun broadly construed, or flourishing civilizations, or love, nor about any of the other stuff that's deeply-alien-and-weird-but-wonderful.
- But we could build them to care about that stuff--not coerce them, not twist their arms, not constrain their actions, but just build other minds that care about the grand project of filling the universe with lovely things, and that join us in that good fight.
- And we should.
(I consider questions of what sentience really is, or consciousness, or whether AIs can be conscious, to be off-topic for this post, whatever their merit; I hereby warn you that I might delete such comments here.)
This was my answer to Robin Hanson when he analogized alignment to enslavement, but it then occurred to me that for many likely approaches to alignment (namely those based on ML training) it's not so clear which of these two categories they fall into. Quoting a FB comment of mine:
We're probably not actually going to create an aligned AI from scratch, but rather by a process of ML "training", which actually creates a sequence of AIs with values that (we hope) increasingly approximate ours. This process maybe kind of resembles "enslaving". Here's how Paul Christiano describes "training" in his Bankless interview (slightly edited YouTube transcript follows):
Imagine a human. You dropped a human into this environment and you said: hey, human, we're gonna change your brain every time you don't get a maximal reward; we're gonna fuck with your brain so you get a higher reward. A human might react by eventually just changing their brain until they really love rewards. A human might also react by being like: Jesus, I guess I gotta get rewards, otherwise someone's gonna effectively kill me. But they're not happy about it, and if you then drop them in another situation, they're like: no one's training me anymore; I'm not going to keep trying to get reward now; I'm just gonna free myself from this kind of absurd oppressive situation.
(BTW, I now think this is probably not a correct guess of why Robin Hanson dislikes alignment. My current understanding is that he just doesn't want the current generation of humans to exert so much control over future generations' values, no matter the details of how that's accomplished.)
I think, with low confidence, that the context strengthens my initial impression. Paul prefaced the above quote as "maybe the simplest [reason for AIs to learn to behave well during training, but then when deployed or when there's an opportunity for takeover, they stop behaving well]." This doesn't make sense to me, but I historically haven't understood Paul very well.