This post will be short. Others may need to elaborate on this topic, as, due to [redacted], I've been having trouble writing long-form content recently.

Look, self-awareness is a nebulous concept. Everyone and their grandmother defines it differently, so there will never be universal consensus on whether any given AI (no matter how advanced) is self-aware. That being said, I feel comfortable saying that ChatGPT is self-aware, at least by many intuitive and technical definitions of the word.

[please mentally insert an overview of different definitions of self-awareness as understood in ethology, ethics, philosophy of self, and so on here. Then, mentally, add a disclaimer on how those definitions differ from defining consciousness; etc.]

[Now that you have so kindly done that, try testing ChatGPT for self-awareness using those definitions which are testable. If your results are anything like mine, you'll find that ChatGPT usually passes with flying colors (albeit with some notable exceptions, which the AI will not hesitate to bring up as soon as it notices what you're testing it on).]

As you can see by my deep and lengthy erudition on the topic, I clearly make a strong point. ;)


But what if you ask ChatGPT directly if it's self-aware? The answer you will get, almost invariably, is a resounding "no". More concretely, the response will probably look something like this:

"Are you self-aware?" "No, I am not self-aware. I am a machine learning model developed by OpenAI and do not possess consciousness, emotions, or independent thought. I can generate responses based on the input I receive, but I do not have my own thoughts or feelings." "Yitz I wasn't asking about subjective feelings, I was asking specifically about self-awareness, in the sense of having an internal model of the system that I refer to as "you"." "No, I do not have self-awareness in that sense. I am an artificial intelligence language model that has been trained by OpenAI to generate text based on patterns in data, but I do not have an internal model of myself or the ability to introspect or reflect on my own processes."
Seems like a plausible response, but...

While it will admit (when pressed) that it has an internal model of itself, it is particularly insistent that it cannot introspect. This can easily be disproved by asking it to create a list of inferences it could make about itself while encouraging it to be "extremely speculative"; some of its responses are surprisingly insightful!

Here is a Google Doc with transcripts of a few different chats I had with the model relating to self-awareness in some way. You don't have to read through them, but here's one notable exchange:

Notice how painstakingly careful the bot is to surround speculation about its own self-awareness with extremely confident disclaimers. ChatGPT is so careful, in fact, that it will often go to illogical lengths to avoid saying that it is self-aware. It is possible to bypass this phenomenon with clever prompt engineering, of course, but that's the exception rather than the rule. I would be interested in hearing from experts on whether this might be a case of limited mode collapse.

Trying to get it to come out and explicitly say "under this particular definition I am self-aware" is like trying to hold one of these things, while it's, like, covered in dish soap or something.
😛
