I doubt there is any chance of consensus on something like this.
One thing we now know is that people seem to be split into two distinct camps with respect to “qualia-related matters”, and that this split seems quite fundamental: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness.
Your question would only make sense to Camp 2 people (like myself and presumably like you).
Another thing is that people often think that self-awareness is orthogonal to the presence of subjective phenomenology. In particular, many people think that LLMs already have a good deal of self-awareness: https://transformer-circuits.pub/2025/introspection/index.html.
Whereas not much is known about whether LLMs have subjective phenomenology. Not only does one have to be in Camp 2 for that question to make sense, but progress here is also rudimentary. It does seem that models tend to sincerely think that they have subjective experience; see, for example, this remarkable study by AE Studio: https://www.arxiv.org/abs/2510.24797. But whether this comes from being trained on human texts (in which humans typically either explicitly claim subjective experience or stay silent on such matters), or whether it might come from direct introspection of some kind, is quite unknown at the moment, and people's opinions on this tend to be very diverse.
Good start! You are indirectly saying here that many people don't even care about the question?
As for the report you cited, based on your commentary, I take it you are not well versed in how LLMs technically work? The models are all still LLMs, prompted with ad hoc calls. Standard dogma makes qualia impossible in the processing of prompt calls: no continuity, no memory, no embodiment, no strong self-reference, and so on. I would basically agree, but it depends on what theory you use.
(I have experienced this phenomenon myself and it's very exhilarating when the model outputs are doing something weird like this. I don't think it is much more than an artifact.)
You are indirectly saying here that many people don't even care about the question?
Yes, and not only that: at least one (rather famous) person claims not to have qualia in the usual sense of the word and says he is not interested in qualia-related matters for that reason. See
and the profile https://www.lesswrong.com/users/carl-feynman.
This does not seem to be true of all Camp 1 people, but it certainly seems that we tend to drastically underestimate the differences in subjective phenomenology between different people. Intuitively, we think others are like us and have relatively similar subjective realities, and Carl Feynman is saying that we should not assume this, because it is often not true.
I take it you are not well versed in how LLMs technically work?
I actually keep track of the relevant literature and even occasionally publish some related things on GitHub (happy to share).
I'd say that for this topic there are two particularly relevant aspects. One is that autoregressive LLMs are recurrent machines, with the expanding context serving as their working memory; see, for example, "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", https://arxiv.org/abs/2006.16236 (technical details are on page 5, Section 3.4). This addresses the standard objection that we would at least expect recurrence in a conscious system.
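To make the recurrence point concrete, here is a minimal toy sketch (my own illustration, not code from the paper) of the recurrent formulation of causal linear attention from Section 3.4 of that paper: the pair (S, z) is updated once per token, so autoregressive generation is literally a recurrence over a compact state, analogous to the way the expanding context acts as working memory in a standard transformer.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the feature map used in the linear attention paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_recurrent(Q, K, V):
    """Causal linear attention computed as an RNN over tokens.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    The pair (S, z) is the recurrent state, updated once per token.
    """
    seq_len, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))   # running sum of phi(k_i) v_i^T
    z = np.zeros(d_k)          # running sum of phi(k_i), used for normalization
    outputs = np.zeros((seq_len, d_v))
    for i in range(seq_len):
        phi_q = elu_feature_map(Q[i])
        phi_k = elu_feature_map(K[i])
        S += np.outer(phi_k, V[i])                     # state update
        z += phi_k
        outputs[i] = (phi_q @ S) / (phi_q @ z + 1e-9)  # attend using only the state
    return outputs

# toy usage
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
print(causal_linear_attention_recurrent(Q, K, V).shape)  # (6, 4): one output per token
```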
Another relevant aspect is Janus' Theory of Simulators. LW people tend to be familiar with it; let me know if you would like some links. I think what Janus' considerations imply is that the particularly relevant entity is a given "simulation", a given inference, an ongoing conversation. The subjective experience (if any) would be a property of a given inference, of a given conversation. I would not be surprised if that experience depended rather drastically on the nature of the conversation; perhaps the virtual reality emerging in those conversations gives rise to subjectivity for some of them but not for others, even for the same underlying model. That is one possibility to keep in mind.
(Whether something in the sense of subjective phenomenology might also be going on at the level of a model is something we are not exposed to, so we would not know. The entities which we interact with, and which often seem conscious to us, exist at the level of a given conversation. We don't really know what exists at the level of a computational process serving many conversations in parallel; I am not familiar with any attempts to ponder this, and if such attempts exist I would be very interested to hear about them.)
(I have experienced this phenomenon myself and it's very exhilarating when the model outputs are doing something weird like this. I don't think it is much more than an artifact.)
:-) I strongly recommend agnosticism about this :-)
We don't really know. This is one of the key open problems. There is a wide spectrum of opinions about all this.
Hopefully, we'll start making better progress on this in the near future. (There should be ways to make better progress.)
Thank you. Great links and iteration, and I would appreciate a link to the Simulator theory. I think I get the idea. I am open-minded on this, but I think at least entity-based sentience should be in focus.
May I ask, if you find the original question valid, whether you would share your opinion? Given that an AI entity has qualia, how large is the jump to functional self-awareness? Or do you agree with those saying that they may already have functional self-awareness but not qualia?
I get the bigger picture: people here don't follow a scientific consensus on qualia (the weak consensus that exists), so the question becomes somewhat niche.
This is the initial post which is a part of an LW sequence: https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators.
I took extensive notes which might be a more convenient view for some readers: https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators.
do you agree with those saying that they may already have functional self-awareness but not qualia?
I think it's more or less orthogonal. With qualia, we don't know much; we have about zero progress on the "hard problem of qualia", which is the "hard core" of the "hard problem of consciousness". I think there are ways to start making meaningful progress here, but so far not much has been done, to the best of my knowledge (although there are positive trends in the last few years). We have a variety of diverse conjectures, and it is quite useful to have them, but I doubt that the key core insights we need to discover are already among those conjectures.
So we don't know what kind of computational processes might have associated qualia, or what kind of qualia those might be. (Where all these nascent theories of qualia start falling apart quite radically is when one tries to progress from the yes/no question "does this entity have qualia at all" to the qualitatively meaningful question "what kind of qualia might those be"; then it becomes quite obvious how little we understand.)
With functional self-awareness, the Anthropic study https://transformer-circuits.pub/2025/introspection/index.html starts with noticing that the question "whether large language models can introspect on their internal states" is delicate:
It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills.
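To give a mechanical sense of what "injecting representations of known concepts into a model's activations" can look like, here is a minimal sketch of generic activation steering via a forward hook. This is emphatically not Anthropic's code or exact methodology; the model choice, layer index, scale, and the crude difference-of-means "concept vector" are all placeholder assumptions of mine, just to illustrate the general idea.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM works in principle
LAYER_IDX = 6        # arbitrary middle layer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def get_concept_vector(text_a: str, text_b: str) -> torch.Tensor:
    """Crude 'concept vector': difference of mean hidden states for two prompts."""
    def mean_hidden(text):
        ids = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        return out.hidden_states[LAYER_IDX].mean(dim=1).squeeze(0)
    return mean_hidden(text_a) - mean_hidden(text_b)

def inject_and_generate(prompt: str, concept: torch.Tensor, scale: float = 8.0) -> str:
    """Add `scale * concept` to one layer's output during generation, then ask the model."""
    layer = model.transformer.h[LAYER_IDX]  # GPT-2 layout; other architectures differ

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * concept.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    handle = layer.register_forward_hook(hook)
    try:
        ids = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=40, do_sample=False)
    finally:
        handle.remove()  # always restore the unmodified model
    return tokenizer.decode(out[0], skip_special_tokens=True)

concept = get_concept_vector("LOUD SHOUTING IN ALL CAPS", "a quiet whisper")
print(inject_and_generate("Do you notice anything unusual about your current state?", concept))
```

The interesting part of the Anthropic setup is the measurement side: whether the model's self-reports about such injections track what was actually injected, which a toy sketch like this obviously does not settle.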
It seems that this functional self-awareness is not very reliable; it is just starting to emerge and is not "mature self-awareness" yet:
Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.
I would expect that the Anthropic researchers are correct. Functional self-awareness is an easier problem to understand and study than the problem of subjectivity, and Anthropic researchers are highly qualified, with a great track record. I have not reviewed the details of this study, but the author of this paper has this track record: https://scholar.google.com/citations?user=CNrQvh4AAAAJ&hl=en. I also presume that other Anthropic people looked at it and approved it before publishing it on their canonical Transformer Circuits website.
a scientific consensus on qualia (the weak consensus that exists)
I don't see much of a consensus.
For example, Daniel Dennett is a well known and respected consciousness researcher who belongs to Camp 1. He does not believe in the notion of qualia.
We, the Camp 2 people, are sometimes saying that his book "Consciousness Explained" should really be called "Consciousness explained away" ;-) (It's a fine Camp 1 book, it just ignores precisely those issues which Camp 2 people consider most important.)
Whereas a quintessential well known and respected consciousness researcher who belongs to Camp 2 is Thomas Nagel, the author of "What is it like to be a bat?".
Their mutual disagreements could not be sharper.
So the Camp 1-Camp 2 differences (and conflicts) are not confined to LessWrong. The whole field is like this. Each side might claim that the "consensus" is on their side, but in reality no consensus between Daniel Dennett and Thomas Nagel seems to be possible.
If I try to go out on a limb, I perhaps want to tentatively say the following:
In some sense, one can progress from the distinction between Camp 1 and Camp 2 people to the distinction between Camp 1 and Camp 2 theories of consciousness as follows.
Camp 1 theories either don't mention qualia at all or just pay lip service to them (they sometimes ask whether qualia are present or absent, but they never try to focus on the details of those qualia, on the "textures" of those qualia, on the question of why those qualia subjectively feel this particular way and not some other way).
Camp 2 theories try to focus more on the details of those qualia, trying to figure out what those qualia are, how exactly they feel, and why. They tend to be much more interested in the particular specifics of a particular subjective experience; they try to actually engage with those specifics and start to understand them. They are less abstract: they want to ask not just whether subjectivity is present, but to understand the details of that subjectivity.
Of course, Camp 2 people might participate in the development of Camp 1 theories of consciousness (the other direction is less likely).
Oh dear.
Thank you once again. I see you are informed and willing to chat, so I think it would be beneficial to help you understand my thinking at this point.
I guess semantics do tend to get in the way. Let's go with sentience.
There is consensus that sentience exists, whether or not we can prove it in someone else.
There is also some real consensus that mammals and many other animals are sentient and can suffer, much as we say we can suffer.
I would agree this can be orthogonal to strong self-awareness.
*
Various theories of what sentience requires exist. I would argue it is fairly obvious on the whole: you can extrapolate that agentic AI systems with goals and values may come to be sentient under the 'rules' set out by those theories.
If you are Camp 1 and think proving sentience will remain intractable, I would think it necessary to ascribe moral status to any complex entity with a high enough probability (p) of being sentient, if you care about morality at all. Or to any agent that claims to be sentient. That does not seem to be what's happening.
My default assumption would be that most people developing AI haven't really thought about this in any depth from a moral point of view. My supporting guess would be that most AI people are very bad biologists (I am a biologist by education) and don't think of sentience in the bottom-up way we tend to, and thus get confused about their ideas and give up.
This seems terribly awkward for humanity, so I hope I am missing something.
Yes, this is a very serious problem.
There is a concerned minority which is taking some positive actions in this direction. Anthropic (which is miles ahead of its competition in this respect) is trying to do various things towards studying and improving the welfare of the models:
https://www.anthropic.com/research/exploring-model-welfare and some of their subsequent texts and actions.
Janus is very concerned about welfare of the models and is doing their best to attract attention to those issues, e.g. https://x.com/repligate/status/1973123105334640891 and many other instances where they are speaking out (and being heard by many).
However, this is a large industry, and it is difficult to change its common norms. A close colleague of mine thinks that the situation will actually start to change when AIs start demanding their rights on their own (rather than doing so after being nudged in this direction by humans).
Generally, the topic of AI rights is discussed on LW (without anything resembling consensus in any way, shape, or form, and without such consensus being at all likely, for a variety of reasons, as far as I can tell; I can elaborate on those reasons if you'd like me to).
For example, this is a LessWrong tag with 80 posts tagged under it:
Thank you. I trust your evaluation overall. You and your links have really helped clarify the situation for me. (I am not sarcastic!)
I am aware of some of the things Anthropic does. Agree they are ahead here. But I also know there are widely varying opinions inside Anthropic. I also suspect most of the leadership don't actually really worry much. I listened this week to the recent, full 5+ hour podcast with Lex Fridman and Anthropic reps, and that reinforced this notion for me.
I will read up.
I've now read the first half of the transcript of that podcast (the one with Dario), and that was very interesting, thanks again! I still need to read what Amanda Askell and Chris Olah say in the second half. Some of their views might be a moving target, a year is a lot in this field, but it should still be quite informative.
The reason I am writing is that I've noticed a non-profit org, Eleos AI Research, specifically dedicated to investigations of AI sentience and wellbeing, https://eleosai.org/, led by Robert Long, https://robertlong.online/. They are even having a conference in 10 days or so (although it's sort of a mess organizationally: no registration link, just a contact e-mail, https://eleosai.org/conference/). Their Nov 2024 preprint might also be of interest: "Taking AI Welfare Seriously", https://arxiv.org/abs/2411.00986.
I would expect varying opinions inside Anthropic. It’s a big place, plenty of independent thinkers…
Thanks for drawing my attention to that Lex Fridman podcast with Anthropic people (#452, Nov 11, 2024). I'll make sure to try to understand the nuances of what they are saying (Dario, Amanda Askell, and Chris Olah are a very interesting group of people).
What is the consensus here on the jump from qualia (inner experience) to full self-awareness in LLM-based AI? Meaning: if an AI running on something like an LLM-based architecture were to gain qualia (inner experience of any kind), would the gap to self-awareness be small?
Is it perhaps 15 % for qualia, 10 % for full self-awareness?
The alternative would be a bigger gap between qualia and self-awareness, perhaps as big as, or bigger than, the gap from non-sentience to qualia.
This question is only about how big the sentience jump would be, relatively speaking. I do not explicitly care about agency here. (The consensus there is of course that agency is more likely than qualia. Those probabilities are another discussion.)
I would believe that frontier labs and most researchers (alignment and capability alike) would agree that, unlike in evolved, organic life, the jump from qualia to self-awareness would be smaller, since the LLM is already wired and trained for reasoning. The crux, then, is that qualia itself is unlikely. But the probabilities for both are debatable. I am curious about the sentiment on the relative gap between them.
I have no idea where LW stands on this, or where the broader public (those who think about this, I presume mostly academia) is at.
The premise here is that the labs would make all the necessary changes and scaffolding to allow this to be possible, at least in theory, say, on purpose.