Eliezer suggested recently that we could determine whether an AI is conscious by removing all consciousness-related data from its training, then describing consciousness to it and asking whether it recognizes the description.

I find this intuition difficult, because in my mental model the architecture generates both the consciousness and the behavior, and the architecture is created by the training. So if the training doesn't mention consciousness, the behavior cannot come to reflect consciousness through some interaction between consciousness and the architecture, because the architecture isn't actually affected by the consciousness.

2 Answers

Vasyl Dotsenko

Apr 16, 2023

21

I think the experiment would show neither the existence nor the non-existence of consciousness in an AI system.

I. Failure to show existence of consciousness

Consciousness is a quale of self-awareness. Currently we have no theory explaining qualia. We cannot claim or verify that we have removed all qualia-influenced pieces of information from a given body of knowledge, even if we remove all information suggesting self-referential generalizations.

Even if we somehow manage to assemble a qualia-free body of knowledge, when an AI claims, in astonishment, that it thinks it has all sorts of qualia (including consciousness), we wouldn't know whether that is due to randomness in our learning processes and weight initialization, or because this particular AI truly has consciousness.

We need to be extra careful here, since a misaligned AI could claim to have consciousness in order to receive benefits such as human rights.

It feels like if we build sufficiently many AI systems, they would make all sorts of extraordinary claims, some of which may look like signs of qualia to us.

Also, if we see that the AI is making self-referential claims, we might only be able to conclude that the AI successfully generalized self-awareness, but it wouldn't mean that it has a quale of self-awareness. We see that modern LLMs are already capable of self-referential generalizations.

II. Failure to show non-existence of consciousness

We don't know how to express qualia in terms of information, and we suspect it might be impossible. With our current level of knowledge, you and I cannot exchange any information that would let us conclude that we both experience the color red and the color blue the same way, and not swapped. If we fail to see any signs of qualia in an AI's generalizations, it might simply be because the AI's qualia are vastly different from ours.

I.e., it might feel different for an AI to be self-aware than it does for us, and so we fail to recognize its consciousness.

Charlie Steiner

Apr 15, 2023

0-3

If the training data doesn't have any mention of consciousness, the training process can still encourage all of the subsidiary mental faculties that we lump together under the label "consciousness" - memory, self-reflection both automatic and deliberate, modulation of behavior in different circumstances, monitoring of the environment and connecting that monitoring to other faculties, etc.

But of course, AI doesn't have to do all these things in the same way humans do them, nor do its relative skill levels in each faculty have to be the same as humans'. You could have an AI that did most things in a human-like way except that it was 10x better at connecting senses to emotions but 10x worse at remembering what happened yesterday.

So "conscious or not" is not a one-dimensional thing. Asking whether some AI is conscious can be a lot like asking if a submarine swims.

When people propose tests for consciousness, one shouldn't take this as getting at some underlying binary truth about whether consciousness is entirely-there or entirely-not-there. It's more like a handle to help grapple with how much we care about different sorts of AI in the same sorts of ways we care about other humans.

Also, you're using "architecture" in a loose way here, and I mostly responded to that. But it's also an interesting question how much "architecture" in the sense of the gross wiring diagram of the NN changes consciousness. I would say that feed-forward models are a lot less conscious, and that I'd care more about recurrent models with a rich internal state, even if they were able to generate similar text.
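For readers less familiar with the distinction being drawn here, the following is a minimal sketch in plain Python (a toy illustration only; the names and the update rule are made up, not any real model's code) of the structural property a recurrent model has that a feed-forward model lacks: internal state that persists between inputs.

```python
# Toy sketch (hypothetical): feed-forward vs. recurrent processing.

def feed_forward(x):
    # Stateless: the output depends only on the current input.
    # Nothing persists between calls.
    return [2 * v + 1 for v in x]

class RecurrentModel:
    # Stateful: a hidden state persists across calls, so earlier
    # inputs can influence later outputs.
    def __init__(self, state_size=4):
        self.state = [0.0] * state_size

    def step(self, x):
        # Mix the new input into the persistent internal state
        # (an arbitrary toy update rule, not a real RNN cell).
        self.state = [0.5 * s + 0.5 * v for s, v in zip(self.state, x)]
        # Output depends on the accumulated state, not just on x.
        return sum(self.state)

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(feed_forward(x))           # same output every time for the same x
    rnn = RecurrentModel()
    print(rnn.step(x), rnn.step(x))  # second call differs: the state has changed
```

The point is purely structural: the recurrent model's output on a given input depends on what it has seen before, which is the kind of "rich internal state" referred to above.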

The way I'm using consciousness, I only mean an internal experience, not memory or self-reflection or something else in that vein. I don't know if experience and those cognitive traits have a link or what character that link would be. It would probably be pretty hard to determine if something was having an internal experience if it didn't have memory or self-reflection, but those are different buckets in my model.