Summary

I’ll walk through my take on how consciousness works (basically endorsing “illusionism”) and why I think computer programs can be conscious, and how that relates to moral and ethical obligations towards conscious entities in general and AIs in particular.

Up to here, everything I say will be pretty popular (albeit far from universal) opinions within the rationality / EA communities, from what I gather.

…But then the last section is my (I think) unusual-in-this-community take that we’re probably going to get conscious AGIs, even if we don’t want them and are trying to avoid them. I will argue that human-like consciousness is pretty closely entangled with the powerful capabilities that we need for a system to do things like autonomously invent technologies and solve hard problems.

Suggested audience: People who are curious about what I think about this topic, and/or want to talk about it with me, for whatever reason, despite the fact that I’m really not up on the literature compared to other people.

(If you’re just generally interested in AGI consciousness / sentience, a better starting point would probably be any of the articles here.)

Quick points before starting

  • I think the Blake Lemoine thing was very stupid. (I’ll explain why below.) I hope that people will not permanently associate “thinking about AGI sentience” with “that stupid Blake Lemoine thing”.
  • I feel sorely tempted to say “Don’t we have our hands full thinking about the impact of AGI on humans?” But on reflection, I actually don’t want to say that. AGI sentience is an important topic and we should be thinking about it.
  • This post will NOT discuss what to do about AGI sentience right now (if anything). That’s a complicated question, tied up with every other aspect of AGI deployment. For example, maybe the AGIs will wind up in charge, in which case we hardly need to be advocating for their well-being—they can take care of themselves! Anyway, I’ll talk about that kind of stuff in a different upcoming post, hopefully.

My take on the philosophy of consciousness / sentience, in a nutshell

(Side note: I haven’t read much philosophy of consciousness, sorry. I’m open to discussion.)

As background: when I say (sincerely) “I’m looking at my wristwatch, and it says it’s 8:00”, then somewhere in the chain of causation that led to me saying that, is my actual physical wristwatch (which reflects photons towards my eyes, which in turn stimulate my retinal photoreceptors, etc.). After all, if the actual wristwatch had nothing to do with that chain of causation, then how would I be able to accurately describe it??

By the same token, when I say (sincerely) “I’m conscious right now, let me describe the qualia…”, then somewhere in the chain of causation that led to me saying that, are “consciousness” and “qualia”, whatever those are. After all, if consciousness and qualia had nothing to do with that chain of causation, then how would I be able to accurately describe them??

I feel like I already understand, reasonably well, the chain of causation in my brain that leads to me saying the thing in the previous paragraph, i.e. “I’m conscious right now, let me describe the qualia…” See my Book Review: Rethinking Consciousness.

And it turns out that there is nothing whatsoever in that chain of causation that looks like what we intuitively expect consciousness and qualia to look like.

Therefore, I need to conclude that either consciousness and qualia don’t exist, or that consciousness and qualia exist, but that they are not the ontologically fundamental parts of reality that they intuitively seem to be. (These two options might not be that different—maybe it’s just terminology?)

As I understand it, here I’m endorsing the “illusionism” perspective, as advocated (for example) by Keith Frankish, Dan Dennett, and Michael Graziano.

Next, if a computer chip is running similar algorithms as a human philosopher, expressing a similar chain of causation, that leads to that chip emitting similar descriptions of consciousness and qualia as human philosophers emit, for similar underlying reasons, then I think we have to say that whatever consciousness and qualia are (if anything), this computer chip has those things just as much as the human does.

(Side note: Transformer-based self-supervised language models like GPT-3 can emit human-sounding descriptions of consciousness, but (I claim) they emit those descriptions for very different underlying reasons than brains do—i.e., as a result of a very different chain of causation / algorithm. So, pace Blake Lemoine, I see those outputs as providing essentially no evidence one way or the other on whether today’s language models are sentient / conscious.)

How does that feed into morality?

(Side note: I haven’t read much philosophy of morality / ethics, sorry. I’m open to discussion.)

The fact that consciousness and suffering are not ontologically fundamental parts of reality is, umm, weird. (I believe illusionism, as above, but I do not grok illusionism.) If anything, thinking hard about the underlying nature of consciousness and suffering kinda tempts me towards nihilism!

However, nihilism is not decision-relevant. Imagine being a nihilist, deciding whether to spend your free time trying to bring about an awesome post-AGI utopia, vs sitting on the couch and watching TV. Well, if you're a nihilist, then the awesome post-AGI utopia doesn't matter. But watching TV doesn't matter either. Watching TV entails less exertion of effort. But that doesn't matter either. Watching TV is more fun (umm, for some people). But having fun doesn't matter either. There's no reason to throw yourself at a difficult project. There's no reason NOT to throw yourself at a difficult project! So nihilism is just not a helpful decision criterion!! What else is there?

I propose a different starting point, which I call Dentin’s prayer: Why do I exist? Because the universe happens to be set up this way. Why do I care (about anything or everything)? Simply because my genetics, atoms, molecules, and processing architecture are set up in a way that happens to care.

If it’s about caring, well, I care about people, and I care about not behaving in a way that I’ll later regret (cf. future-proof ethics), which also entails caring about intellectual consistency, among other things.

So we wind up at plain old normal ethical and moral reasoning, where we think about things, probe our intuitions by invoking analogies and hypotheticals, etc.

When I do that, I wind up feeling pretty strongly that if an AGI can describe joy and suffering in a human-like way, thanks to human-like underlying algorithmic processes, then I ought to care about that AGI’s well-being.

For AIs that are very unlike humans and animals, I don’t really know how to think about them. Actually, even for nonhuman animals, I don’t really know how to think about them. Here’s where I’m at on that topic:

I vaguely think there are a couple ways to start with the phenomenon of human consciousness and extract “the core of the thing that I care about”, and I think that some of those possible “extracted cores” are present in (more or fewer) nonhuman animals, and that others are maybe even uniquely human. I don’t know which of the possible “extracted cores” is the one I should really care about. I would need to think about it more carefully before I have a solid opinion, let alone try to convince other people.

As of this writing, I think that I don’t care about the well-being of any AI that currently exists, at least among the ones that I’ve heard of. Indeed, for almost all existing AIs, I wouldn’t even know how to define its “well-being” in the first place! More specifically, I think that for any of the plausible choices of “the core of the thing that I care about”, today’s AIs don’t have it.

But this is a bit of a moot point for this post, because I mostly want to talk about powerful AGIs, and those systems will not (I believe) be very unlike humans and animals, as discussed next:

Why I expect AGIs to be sentient / conscious, whether we wanted that or not

I think I differ from most people in AGI safety / alignment in expecting that when we eventually figure out how to build powerful AGIs that can do things like invent technology and autonomously solve scientific research problems, it will turn out that those AGIs will be just as sentient / conscious / whatever as adult humans are. I don’t think this is a choice; I think we’re going to get that whether we want it or not. This is somewhat tied up in my expectation that future AGI will be brain-like AGI (basically, a version of model-based RL). I won’t try to fully convey my perspective in this post, but here are some relevant points:

First, one popular perspective in the community is that we can make AI that’s kinda like a “tool”, lacking not only self-awareness but also any form of agency, and that this is a path to safe & beneficial AGI. I agree with the first part—we can certainly make such AIs, and indeed we are already doing so. But I don’t know of any plausible permanent solution to the problem of people also trying to make more agent-y AGIs. For more on why I think that, see my discussion of “RL on thoughts” in Section 7.2 here. Anyway, these more agent-y AGIs are the ones that I want to talk about.

Second, we’ve now switched over to the topic of agent-y AGIs. And an important question becomes: what aspects of the system are in the source code, versus in the learned weights of a trained model? Eliezer Yudkowsky has been at both extreme ends of this spectrum. In his older writings, he often talked about putting rational utility-maximization into the AGI source code, more or less. Whereas in his more recent writings, he often talks about the Risks From Learned Optimization model, wherein not only rationality but everything about the agent emerges inside the weights of a trained model.

By contrast, I’m right in the middle of those two extremes! In my expected development scenario (brain-like model-based RL), there are some aspects of agency and planning in the source code, but the agent by default winds up as a mess of mutually-contradictory preferences and inclinations and biases etc. I just don’t see how to avoid that, within a computationally-bounded agent. However, within that architecture, aspects of rational utility-maximization might (or might not) emerge via (meta)learning.

Why is that important? Because if the agent has to (meta)learn better and better strategies for things like brainstorming and learning and planning and understanding, I think this process entails the kind of self-reflection which comprises full-fledged self-aware human-like consciousness. In other words, coming up with better brainstorming strategies etc. entails the AGI introspecting on its own attention schema, and this same process from the AGI’s perspective would look like the AGI being aware of its own conscious experience. So the AGI would be able to describe its consciousness in a human-like way for human-like reasons.

So I don’t even think the AGI would be in a gray area—I think it would be indisputably conscious, conscious according to any reasonable definition.

But I could be wrong! Happy to chat in the comments. :)

38 comments

I'll chime in with the same thing I always say: the label "consciousness" is a convenient lumping-together of a bunch of different properties that co-occur in humans, but don't have to all go together a priori.

For example, the way I can do reasoning via subverbal arguments that I can later remember and elaborate if I attend to them. Or what it's like to remember things via associations and some vague indications of how long ago they were, rather than e.g. a discrete system of tags and timestamps. My sense of aversion to noxious stimuli, which can in turn be broken down into many subroutines.

Future AIs are simply going to have some of these properties, to some extent, but not all of them. Asking whether that's consciousness is somewhat useless - what matters is how much of a moral patient we want to treat them as, based on a more fine-grained consideration of their properties.

I agree with the general points and especially "It should be obvious by now that AGI will necessarily be brain-like (in the same ways that DL is brain-like but a bit moreso), and necessarily conscious in human-like ways, as that is simply what intelligence demands".

However, there is much interesting grey area when we start comparing the consciousness of humans with specific types of brain damage to current large transformer AI.

Transformers (even in their more agentic forms) are missing many components of the brain, but their largest deficit is the lack of strong recurrence and medium term memory capacity. Transformer LLMs like GPT3 have an equivalent conscious experience of essentially waking up from scratch, reading around a thousand tokens (in parallel), thinking about those for just a few hundred steps (equivalent to a dozen seconds or so of human thought), and then reset/repeat. They have reasonably extensive short term memory (attention), and very long term memory (standard weights), but not much in between, and they completely lack brain style RNN full recurrence.

In DL terminology brains are massive RNNs with a full spectrum of memory (roughly 10GB equiv in activations, and then a truly massive capacity of 10TB equiv or more in synapses that covers a full spectrum of timescales).

But some humans do have impairments to their medium-term memory systems that are perhaps more comparable to LLM transformers - humans with missing/damaged hippocampus/EC regions, like the infamous HM. Still conscious, but not in the same way.

Seems like consciousness is rather similar to "the Global Neuronal Workspace", and qualia are rather similar to "what's currently in the Workspace". Is there a reason to reject this way of thinking?

Hmm, my point here was that “your introspective model of your own consciousness” is more-or-less the same as “your internal model of your Global Neuronal Workspace”.

But I’m very hesitant to take the seemingly-obvious next step and say “therefore, your own consciousness is your Global Neuronal Workspace”.

The thing is, an internal model of X doesn’t have to have much in common with X:

  • In the moving-Mario optical illusion, there’s an internal model in which Mario is moving, but that’s not a veridical reflection of the thing that it’s nominally modeling—Mario is not in fact moving.
  • Another example (I think from Graziano’s book) is that we have an internal model of “pure whiteness”, but that’s not a veridical reflection of the thing that it’s nominally modeling, because actual white light is a mix of different colors, not “pure”.

I think the consciousness case is an extreme case of that. Out of the various things that people say when describing their own phenomenal consciousness, I think only a very small fraction could be taken to be a veridical description of aspects of their Global Neuronal Workspace.

And when you have features of an internal model of X that are not veridical reflections of features of the actual X, we call that an “illusion”.

(Another thing is: I also think that different people in different cultures can have rather different internal models of their Global Neuronal Workspace, cf. Buddhists rejecting “self” and Julian Jaynes claiming a massive cultural shift in self-models around 1500-500BC.)

Saying that qualia aren't veridical representations of the properties of external objects doesn't make them nonveridical in the sense of a hallucination, or nonexistent.

Saying that qualia aren't veridical representations of the brain doesn't make them nonexistent either.

In fact, both claims strengthen the case for qualia. The first claim is a rejection of naive realism, and naive realists don't need qualia.

It seems access consciousness is almost tautologically accounted for by global workspace, and other aspects or meanings of consciousness aren't addressed by it at all.

I feel like I already understand, reasonably well, the chain of causation in my brain that leads to me saying the thing in the previous paragraph, i.e. “I’m conscious right now, let me describe the qualia…” See my Book Review: Rethinking Consciousness.

You only have evidence that you understand a chain of causation. You don't have evidence that no alternative account is possible.

…And it turns out that there is nothing whatsoever in that chain of causation that looks like what we intuitively expect consciousness and qualia to look like.

If you look at a brain from the outside, its qualia aren't visible. Equally, if you look at your brain from the inside, you see nothing but qualia...you do not see neural activity as such.

And your internal view of causality is that your pains cause you ouches.

Therefore, I need to conclude that either consciousness and qualia don’t exist, or that consciousness and qualia exist, but that they are not the ontologically fundamental parts of reality that they intuitively seem to be.

I don't think qualia seem to be fundamental.

As I understand it, here I’m endorsing the “illusionism” perspective, as advocated (for example) by Keith Frankish, Dan Dennett, and Michael Graziano.

Illusionism is the claim that qualia don't exist at all, not the claim they are merely non-fundamental. An emergentist could agree that they are non-fundamental.

Next, if a computer chip is running similar algorithms as a human philosopher, expressing a similar chain of causation, that leads to that chip emitting similar descriptions of consciousness and qualia as human philosophers emit, for similar underlying reasons, then I think we have to say that whatever consciousness and qualia are (if anything), this computer chip has those things just as much as the human does

  1. That isn't illusionism. The most an illusionist would say is that a computer would be subject to the same illusions/delusions.

  2. You have bypassed the possibility that what causes qualia to emerge is not computation, but the concrete physics of the brain: something that can only be captured by a physical description.

(Side note: Transformer-based self-supervised language models like GPT-3 can emit human-sounding descriptions of consciousness, but (I claim) they emit those descriptions for very different underlying reasons than brains do—i.e., as a result of a very different chain of causation / algorithm

Different chain of physical causation, or different algorithm? It's quite possible for the same algorithm to be implemented in physically different ways, and it's quite possible for emergent consciousness to supervene on physics.

However, nihilism is not decision-relevant

Nihilism about what, and why? I don't think you have a theory that consciousness doesn't exist or that qualia don't exist. And even if you did, I don't see how it implies the non existence of values, or preferences or selves or purposes... or whatever else it takes to undermine decision theory.

When I do that, I wind up feeling pretty strongly that if an AGI can describe joy and suffering in a human-like way, thanks to human-like underlying algorithmic processes, then I ought to care about that AGI’s well-being.

Because they have the qualia, or because qualia don't matter?

Because if the agent has to (meta)learn better and better strategies for things like brainstorming and learning and planning and understanding, I think this process entails the kind of self-reflection which comprises full-fledged self-aware human-like consciousness.

Meaning that qualia aren't even one component of human consciousness? Or one possible meaning of "consciousness"?

So I don’t even think the AGI would be in a gray area—I think it would be indisputably conscious, conscious according to any reasonable definition

Illusionists don't think humans are conscious, for some definitions of consciousness.

Thanks!

Illusionism is the claim that qualia don't exist at all, not the claim they are merely non-fundamental. An emergentist could agree that they are non-fundamental.

I’m unclear on this part. It seems like maybe just terminology to me. Suppose

  • Alice says “Qualia are an illusion, they don’t exist”,
  • Bob says “Qualia are an illusion. And they exist. They exist as an illusion.”

…I’m not sure Alice and Bob are actually disagreeing about anything of substance here, and my vague impression is that you can find self-described illusionists on both sides of that (non?)-dispute. For example, Frankish uses Alice-type descriptions, whereas Dennett and Graziano use Bob-type descriptions, I think.

Analogy: in the moving-Mario optical illusion, Alice would say “moving-Mario does not exist”, and Bob would say “there is an illusion (mental model) of moving-Mario, and it’s in your brain, and that illusion definitely exists, how else could I be talking about it?”

And if you’re on the Bob side of the dispute here, that would seem to me to be a form of emergentism, right??

You have bypassed the possibility that what causes qualia to emerge is the concrete physics of the brain, something that can only be captured by a physical description.

I don’t think I understand this part. According to the possibility that you have in mind, does the computer chip emit similar descriptions of consciousness and qualia as the human philosopher? Or not?

And then follow-up questions:

  • If yes, then do you agree that (on this possibility) actual consciousness and qualia are not involved in the chain of causation in your brain that leads to your describing your own consciousness and qualia? After all, presumably the chain of causation is the same in the computer chip, right?
  • If no, then does this possibility require that it’s fundamentally impossible to simulate a brain on a computer, such that the simulation and the actual brain emit the same outputs in the same situations?

Therefore, I need to conclude that either consciousness and qualia don’t exist, or that consciousness and qualia exist, but that they are not the ontologically fundamental parts of reality that they intuitively seem to be.

Illusionism is the claim that qualia don’t exist at all, not the claim they are merely non-fundamental. An emergentist could agree that they are non-fundamental

I’m unclear on this part. It seems like maybe just terminology to me

I don't think so because "fundamental" and "illusory" are not obvious antonyms.

Alice says “Qualia are an illusion, they don’t exist”,

Bob says “Qualia are an illusion. And they exist. They exist as an illusion.”

The Bob version runs into a basic problem with illusionism, which is that it is self-contradictory: an illusion is a false appearance, a false appearance is an appearance, and an appearance is a quale.

The Bob version could be rectified as

Charlie says “Qualia are a delusion. People have a false belief that they have them, but don't have them.”

And some illusionists believe that, but don't call it delusionism.

[Edit: I think the Charlie claim is Dennett's position.]

[Edit: I think I understand your position much better after having read your reply to Mitchell. Must exist, since neither brain states nor perceived objects have their properties, but only in a virtual sense...? ]

And if you’re on the Bob side of the dispute here, that would seem to me to be a form of emergentism, right??

Only Bob's (or Robs's ) self-defeating form of illusionism. Basically, illusionists are trying to deny qualia, and if they let them in by the back door, that's probably a mistake. Also, they don't believe in the full panoply of qualia anyway, only the one responsible for the illusion.

I don’t think I understand this part. According to the possibility that you have in mind, does the computer chip emit similar descriptions of consciousness and qualia as the human philosopher

I'm taking that as true by hypothesis.

If yes, then do you agree that (on this possibility) actual consciousness and qualia are not involved in the chain of causation in your brain that leads to your describing your own consciousness and qualia? After all, presumably the chain of causation is the same in the computer chip, right?

The chain of causation is definitely different because silicon isn't protoplasm. By hypothesis, the computation is the same, but computation isn't causation. Computation is essentially a lossy, high-level description of the physical behaviour.

If no, then does this possibility require that it’s fundamentally impossible to simulate a brain on a computer, such that the simulation and the actual brain emit the same outputs in the same situations

No, but that says nothing about qualia. It's possible for qualia to depend on some aspect of the physics that isn't captured by the computational description, which means that out of two systems running the same algorithm on different hardware, one could have qualia but the other not. The other is a kind of zombie, but not a p-zombie, because of the physical difference.

And since that is true, the GAZP is false.

I strongly disagree with "computation is a lossy high-level description". For what we're talking about, I think computation is a lossless description. I believe the thing we are calling 'qualia' is equivalent to a python function written on a computer. It is not a 'real' function on the computer it is written on, but a 'zombie' function when run on a different computer. If the computation is exactly the same, the underlying physical process that produced it is irrelevant. It is the same function.

Computation in general is a lossy high level description, but not invariably.

For what we’re talking about, I think computation is a lossless description.

And what we are talking about is the computational theory of consciousness

If the computational theory of consciousness is correct, then computation is a lossless description.

But that doesn't prove anything relevant, because it doesn't show that computational theory is actually or necessarily correct. It is possibly wrong, so computational zombies are still possible.

I believe the thing we are calling ‘qualia’ is equivalent to a python function written on a computer

Can you state the function?


I feel like I already understand, reasonably well, the chain of causation in my brain that leads to me saying the thing in the previous paragraph

The feeling of understanding, and actual understanding, are very different things. Astrology will give people a feeling of understanding. Popsci books give people a feeling of understanding. Repeating the teacher's password gives a feeling of understanding. Exclaiming "Neurons!" gives people a feeling of understanding. Stories of all sorts give people a feeling of understanding.

One of the signs of real understanding is doing real things with it. If I can build a house that stays up and doesn't leak, I have some understanding of how to build a house. If I can develop a piece of software that performs some practical task, then I have some understanding of software development. If I can help people live better and more fulfilled lives, I have some understanding of people.

Therefore, I need to conclude that either consciousness and qualia don’t exist, or that consciousness and qualia exist, but that they are not the ontologically fundamental parts of reality that they intuitively seem to be.

Or the real explanation is something we have not even thought of yet.

As I understand it, here I’m endorsing the “illusionism” perspective

I don't see how you make the jump from "not ontologically fundamental" to "illusion". For that matter, it's not clear to me what you count as being ontologically fundamental or why it matters.

I don't see how you make the jump from "not ontologically fundamental" to "illusion". For that matter, it's not clear to me what you count as being ontologically fundamental or why it matters.

An ontologically fundamental property is a property that is fundamental to every other property. It also can't be reduced to any other property. A great example is the superforce proposed in Theories of Everything, which would essentially symmetry-break into the 4 known fundamental forces: the weak and strong nuclear forces, gravity, and electromagnetism.

BTW, my credences in the general theories of consciousness are as follows:

Ontologically fundamental consciousness is less than 1%.

Non-ontologically fundamental consciousness is around 10-20% credence.

And the idea that consciousness is an illusion is probably 80-90% in my opinion.

I put illusionism at effectively 0 (i.e. small enough to ignore in all decision-making). Ontological fundamentality, as you describe it, is something that one could only judge in hindsight, after finding a testable and tested Theory of Everything Including Consciousness. We don't yet have even a testable and tested Theory of Everything Excluding Consciousness.

The Standard Model of Particle Physics plus perturbative general relativity (I wish it was better-known and had a catchier name) appears sufficient to explain everything that happens in the solar system, and has been extremely rigorously tested. It can’t explain everything that happens in the universe—in particular, it can’t make any predictions about microscopic black holes or the big bang, unfortunately. All signs point to some version of string theory eventually filling in those gaps as a true Theory of Everything, although of course one can’t be certain until the physicists actually find the right vacuum for our universe and do all the calculations etc.

I have very high confidence that, when that process is complete, and we understand the fundamental laws of the universe, the laws which hold everywhere with no exceptions, we will have learned nothing whatsoever new or helpful about consciousness. I think fundamental physics is just not going to help us here :)

[Sorry if I’m misunderstanding your point.]

I completely agree with that. So far we only have speculations towards a TOE (excluding consciousness), and when we have one, there will still be all of the way to go to explain consciousness.

Can you say more about the “non-ontologically fundamental consciousness” that you like? Or provide a link to something I could read?

Honestly Illusionism is just really hard to take seriously. Whatever consciousness is, I have better evidence it exists than anything else since it is the only thing I actually experience directly. I should pretend it isn't real...why exactly? Am I talking to slightly defective P-zombies?


“If the computer emitted it for the same reasons...” is a clear example of the begging-the-question fallacy. If a computer claimed to be conscious because it was conscious, then it logically has to be conscious, but that is the possible dispute in the first place. If you claim consciousness isn't real, then obviously computers can't be conscious. Note that you aren't talking about real illusionism if you don't think we are p-zombies. Only the first of the two possibilities you mentioned is Illusionism, if I recall correctly.


You seem like one of the many people trying to systematize things they don't really understand. It's an understandable impulse, but it leads to an illusion of understanding (which is the only thing that leads to a systemization like Illusionism; it seems like frustrated people claiming there is nothing to see here).
If you want a systemization of consciousness that doesn't claim things it doesn't know, then assume consciousness is the self-reflective and experiential part of the mind that controls and directs large parts of the overall mind. There is no need to state what causes it.


If a machine fails to be self-reflective or experiential then it clearly isn't conscious. It seems pretty clear that modern AI is neither. It probably fails the test of even being a mind in any way, but that's debatable.

Is it possible for a machine to be conscious? Who knows. I'm not going to bet against it, but current techniques seem incredibly unlikely to do it.

Whatever consciousness is, I have better evidence it exists than anything else since it is the only thing I actually experience directly.

In an out-of-body experience, you can “directly experience” your mind floating on the other side of the room. But your mind is not in fact floating on the other side of the room.

So what you call a “direct experience”, I call a “perception”. And perceptions can be mistaken—e.g. optical illusions.

So, write down a bulleted list of properties of your own consciousness. Every one of the items on your list is a perception that you have made about your own consciousness. How many of those bulleted items are veridical perceptions—perceiving an aspect of your own consciousness as it truly is—and how many of them are misperceptions? If you say “none is a misperception”, how do you know, and why does it differ from all other types of human perception in that respect, and how do you make sense of the fact that some people report that they were previously mistaken about properties of their own consciousness (e.g. “enlightened” Buddhists reflecting on their old beliefs)?

Or if you allow that some of the items on your bulleted list may be misperceptions, why not all of them??

It seems pretty clear that modern AI is neither

To be clear, this post is about AGI, which doesn’t exist yet, not “modern AI”, which does.

So what you call a “direct experience”, I call a “perception”. And perceptions can be mistaken—e.g. optical illusions.

This comment:

Whatever consciousness is, I have better evidence it exists than anything else since it is the only thing I actually experience directly.

...could have been phrased as:

Whatever consciousness is, I have better evidence it exists than anything else since it is the only thing I experience everything else with.

I do agree with your rephrasing. That is exactly what I mean (though with a different emphasis).

Why I expect AGIs to be sentient / conscious, whether we wanted that or not

I think it could be worse than that. 

As you speculate, we will earlier or later figure out how to engineer consciousness. And I mean your strong version of consciousness, where people agree that the thing is conscious not just because it responds like a conscious agent but because we can point out how it happens, observe the process, and generally see the analogy to how we do it. If we can engineer it, we can optimize it and reduce it to the minimum components and computational power needed to admit consciousness in this sense. Humans got consciousness because it was evolutionarily useful, or even as a side effect. It is not the core feature of the human brain, and most of the brain's processing and learning deals with other things. Therefore I conjecture that not that much CPU/RAM will be needed for pure consciousness. I would guess that my laptop has enough for that. What will happen if some evil actor engineers small consciousnesses into their devices or even apps? May we reformat or uninstall them?

As you speculate, we will earlier or later figure out how to engineer consciousness.

I think we're much further away from this than we are from other problems with AGI. I agree that we will, at some point, be able to define consciousness in a way that will be accepted by those we currently agree are conscious (by dint of their similarity to ourselves). I don't know that it will match my or your intuitions about it today, and depending on what that agreement is, we may or may not be able to engineer it very much.

I strongly expect that as we progress in understanding, we'll decide that it's not sacred, and it's OK to create and destroy some consciousnesses for the convenience of others. Heck, we don't spend very much of our personal energy on preventing the deaths of distant human strangers, though we try not to be personally and directly responsible for deaths. I'm certainly not going to worry about reformatting a device that has a tiny consciousness any more than I worry about killing an ant colony that's too close to my house. I may or may not worry about the longevity of a human-sized consciousness, if it's one of billions that are coming and going all the time. I have no intuitions about giant consciousnesses - maybe they're utility monsters, maybe they're just blobs of matter like the rest of us.

I strongly expect that as we progress in understanding, we'll decide that it's not sacred, and it's OK to create and destroy some consciousnesses for the convenience of others.

That might be an outcome. In that case, we might decide that the sacredness of life is tied not to consciousness but to something else.

Creating or preventing conscious experiences from happening has a moral valence equivalent to how that conscious experience feels. I expect most "artificial" conscious experiences created by machines to be neutral with respect to the pain-pleasure axis, for the same reason that randomly generated bitmaps rarely depict anything.

I expect most "artificial" conscious experiences created by machines to be neutral with respect to the pain-pleasure axis, for the same reason that randomly generated bitmaps rarely depict anything.

What if the machine is an AGI algorithm, and right now it’s autonomously inventing a new better airplane design? Would you still expect that?

The space of possible minds/algorithms is so vast, and that problem is so open-ended, that it would be a remarkable coincidence if such an AGI had a consciousness that was anything like ours. Most details of our experience are just accidents of evolution and history.

Does an airplane have a consciousness like a bird? "Design an airplane" sounds like a more specific goal, but in the space of all possible minds/algorithms that goal's solutions are quite undetermined, just like flight.

My airplane comment above was a sincere question, not a gotcha or argument or anything. I was a bit confused about what you were saying and was trying to suss it out. :)  Thanks.

I do disagree with you though. Hmm, here’s an argument. Humans invented TD learning, and then it was discovered that human brains (and those of other animals) incorporate TD learning too. Similarly, self-supervised learning is widely used in both AI and human brains, as are distributed representations and numerous other things.

If our expectation is “The space of possible minds/algorithms is so vast…” then it would be a remarkable coincidence for TD learning to show up independently in brains & AI, right? How would you explain that?

I would propose instead an alternative picture, in which there are a small number of practical methods which can build intelligent systems. In that picture (which I subscribe to, more or less), we shouldn’t be too surprised if future AGI has a similar architecture to the human brain. Or in the most extreme version of that picture, we should be surprised if it doesn’t! (At least, they’d be similar in terms of how they use RL and other types of learning / inference algorithms; I don’t expect the innate drives a.k.a. reward functions to be remotely the same, at least not by default.)
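For readers who haven't met it: TD (temporal-difference) learning nudges the value estimate for a state toward the reward received plus the discounted estimate for the next state. Below is a minimal TD(0) sketch in Python on a made-up three-state chain. The function name `td0_chain`, the toy environment, and the parameter values are all illustrative assumptions, not anything taken from the post or from a model of the brain.

```python
def td0_chain(num_states=3, episodes=200, alpha=0.5, gamma=0.9):
    """Tabular TD(0) on a hypothetical chain: 0 -> 1 -> 2 -> terminal.

    The agent always steps right; the only reward (1.0) arrives on the
    final step into the terminal state.
    """
    # One extra slot for the terminal state, whose value stays at 0.
    values = [0.0] * (num_states + 1)
    for _ in range(episodes):
        state = 0
        while state < num_states:
            next_state = state + 1
            reward = 1.0 if next_state == num_states else 0.0
            # TD error: (reward + discounted next estimate) minus current estimate.
            td_error = reward + gamma * values[next_state] - values[state]
            # Move the current estimate a fraction alpha toward the target.
            values[state] += alpha * td_error
            state = next_state
    return values[:num_states]


values = td0_chain()
print(values)
```

In this toy setup the estimates settle near 0.81, 0.9, and 1.0 for states 0, 1, and 2, i.e. the final reward discounted by how many steps away it is. The point is only to show how small the core update rule is, not to suggest this is how any particular brain circuit or AI system implements it.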

I agree with Stephen's point about convergent results from directed design (or evolution in the case of animals). I don't agree that consciousness and moral valence are closely coupled such that it would incur a performance loss to decouple them. Therefore, I suspect it will be a nearly costless choice to make morally relevant vs irrelevant AGI, and that we very much morally ought to choose to make morally-irrelevant AGI. To do otherwise would be possible, as Gunnar describes, but morally monstrous. Unfortunately some people do morally monstrous things sometimes. I am unclear on how to prevent this particular form of monstrosity.

My own point of view is similar to Luke Muehlhauser's opinion in section 4.1 of "2017 report on consciousness and moral patienthood." Physicalism seems like it is true. But the global neuronal workspace + attention schema, and other physicalist models of consciousness, just don't feel like they explain enough of the phenomena, and they seem so simple that, were they true, I'd expect consciousness to be pretty much everywhere. Like, what exactly is reflectivity? What counts as a reflective model for the purpose of generating a computation that outputs thoughts like "I have qualia"?

  1. Though I suppose where I differ is that I've got philosophical issues with ascribing a computation to a physical process. It seems like you can ascribe many computations to e.g. a rock or a waterfall. And I don't know of a philosophical principle that seems justified and doesn't give you an insane answer.

I'm not quite convinced that illusionism is decision-irrelevant in the way you propose. If it's true that there is no such thing as 1st-person experience, then such experience cannot disclose your own values to you. Instead, you must infer your values indirectly through some strictly 3rd-person process. But all external probing of this sort, because it is not 1st-person, will include some non-zero degree of uncertainty.

One paradox that this leads to is the willingness to endure vast amounts of (purportedly illusory) suffering in the hope of winning, in exchange, a very small chance of learning something new about your true values. Nihilism is no help here, because you're not a nihilist; you're an illusionist. You do believe that you have values, instantiated in 3rd-person reality.

Instead, you must infer your values indirectly through some strictly 3rd-person process

Or some other first person process.

Can you elaborate on what such a process would be? Under illusionism, there is no first person perspective in which values can be disclosed (namely, for hedonic utilitarianism).

Illusionism denies the reality of qualia, not personhood.

Personhood is a separate concept. Animals that may lack a conception of personal identity may still have first person experiences, like pain and fear. Boltzmann brains supposedly can instantiate brief moments of first person experience, but they lack personhood.

The phrase "first person" is a metaphor borrowed from the grammatical "first person" in language.