mishka

Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).

Comments
What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 21h · 20

I would expect varying opinions inside Anthropic. It’s a big place, plenty of independent thinkers…

Thanks for drawing my attention to that Lex Fridman podcast with Anthropic people (#452, Nov 11, 2024). I'll make sure to try to understand the nuances of what they are saying (Dario, Amanda Askell, and Chris Olah are a very interesting group of people).

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 21h · 30

Yes, this is a very serious problem.

There is a concerned minority which is taking some positive actions in this regard. Anthropic (which is miles ahead of its competition in this respect) is doing various things to study and improve the welfare of its models:

https://www.anthropic.com/research/exploring-model-welfare and some of their subsequent texts and actions.

Janus is very concerned about the welfare of the models and is doing their best to draw attention to these issues, e.g. https://x.com/repligate/status/1973123105334640891 and many other instances where they are speaking out (and being heard by many).

However, this is a large industry, and it is difficult to change its common norms. A close colleague of mine thinks that the situation will actually start to change when AIs start demanding their rights on their own (rather than after being nudged in this direction by humans).

Generally, the topic of AI rights is discussed on LW (without anything resembling consensus in any way, shape, or form, and without such consensus being at all likely, for a variety of reasons as far as I can tell; I can elaborate on those reasons if you'd like me to).

For example, here is a LessWrong tag with 80 posts under it:

https://www.greaterwrong.com/tag/ai-rights-welfare

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 1d* · 30

This is the initial post, which is part of an LW sequence: https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators.

I took extensive notes which might be a more convenient view for some readers: https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators.

do you agree with those saying that they already may have functional self-awareness but not qualia?

I think it's more or less orthogonal. With qualia, we don't know much: we have about zero progress on the "hard problem of qualia", which is the "hard core" of the "hard problem of consciousness". I think there are ways to start making meaningful progress here, but so far not much has been done, to the best of my knowledge (although there have been positive trends in the last few years). We have a variety of diverse conjectures, and it is quite useful to have them, but I doubt that the key insights we need to discover are already among those conjectures.

So we don't know what kinds of computational processes might have associated qualia, or what kind of qualia those might be. (Where all these nascent theories of qualia start falling apart quite radically is when one tries to progress from the yes/no question "does this entity have qualia at all?" to the qualitatively meaningful question "what kind of qualia might those be?"; at that point it becomes quite obvious how little we understand.)

With functional self-awareness, the Anthropic study https://transformer-circuits.pub/2025/introspection/index.html starts by noting that the question of "whether large language models can introspect on their internal states" is delicate:

It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills.

It seems that this functional self-awareness is not very reliable; it is just starting to emerge, and it's not a "mature self-awareness" yet:

Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.

I would expect that the Anthropic researchers are correct. Functional self-awareness is an easier problem to understand and study than the problem of subjectivity, and the Anthropic researchers are highly qualified, with a great track record. I have not reviewed the details of this study, but the author of the paper has this track record: https://scholar.google.com/citations?user=CNrQvh4AAAAJ&hl=en. I also presume that other Anthropic people looked at it and approved it before publishing it on their canonical Transformer Circuits website.
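(For readers who want a concrete feel for what "injecting representations of known concepts into a model's activations" can look like in practice, here is a minimal sketch of concept injection via activation steering. This is my own toy illustration, not Anthropic's methodology or code: the model (GPT-2 via Hugging Face transformers), the layer index, the contrastive prompts, and the scaling factor are all arbitrary choices, and a model this small is not expected to reproduce the self-report effects described in the paper; the point is only the mechanics of deriving a concept vector and adding it to the residual stream during generation.)

```python
# A toy sketch of "concept injection" via activation steering (not Anthropic's code).
# We derive a crude concept vector from a contrastive pair of prompts, add it to the
# output of one transformer block during generation, and compare the outputs.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 8  # which transformer block to perturb (arbitrary illustrative choice)

def mean_hidden_state(prompt: str) -> torch.Tensor:
    """Mean activation at the output of block LAYER, averaged over the prompt's tokens."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding output; hidden_states[LAYER + 1] is block LAYER's output.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)  # shape: (hidden_dim,)

# Crude "concept vector": difference of mean activations for a contrastive pair of prompts.
concept = (mean_hidden_state("The ocean: waves, salt water, tides, the deep blue sea.")
           - mean_hidden_state("A plain sentence about nothing in particular."))

def make_hook(vector: torch.Tensor, alpha: float):
    def hook(module, inputs, output):
        # GPT-2 blocks return a tuple whose first element is the hidden states;
        # add the scaled concept vector at every position.
        return (output[0] + alpha * vector,) + output[1:]
    return hook

prompt = "Describe what, if anything, you are currently thinking about."
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    baseline = model.generate(**inputs, max_new_tokens=40, do_sample=False)

handle = model.transformer.h[LAYER].register_forward_hook(make_hook(concept, alpha=4.0))
with torch.no_grad():
    injected = model.generate(**inputs, max_new_tokens=40, do_sample=False)
handle.remove()

print("baseline:", tok.decode(baseline[0], skip_special_tokens=True))
print("injected:", tok.decode(injected[0], skip_special_tokens=True))
```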


a scientific consensus on qualia (the weak consensus that exists)

I don't see much of a consensus.

For example, Daniel Dennett is a well-known and respected consciousness researcher who belongs to Camp 1. He does not believe in the notion of qualia.

We, the Camp 2 people, are sometimes saying that his book "Consciousness Explained" should really be called "Consciousness explained away" ;-) (It's a fine Camp 1 book, it just ignores precisely those issues which Camp 2 people consider most important.)

Whereas a quintessential well-known and respected consciousness researcher who belongs to Camp 2 is Thomas Nagel, the author of "What Is It Like to Be a Bat?".

Their mutual disagreements could hardly be sharper.

So the Camp 1-Camp 2 differences (and conflicts) are not confined to LessWrong. The whole field is like this. Each side might claim that the "consensus" is on their side, but in reality no consensus between Daniel Dennett and Thomas Nagel seems to be possible.


If I try to go out on a limb, I perhaps want to tentatively say the following:

In some sense, one can progress from the distinction between Camp 1 and Camp 2 people to the distinction between Camp 1 and Camp 2 theories of consciousness as follows.

Camp 1 theories either don't mention qualia at all or just pay lip service to them (they sometimes ask the question of whether qualia are present or absent, but they never focus on the details of those qualia, on the "textures" of those qualia, on the question of why those qualia subjectively feel this particular way and not some other way).

Camp 2 theories try to focus more on the details of those qualia, trying to figure out what those qualia are, how exactly they feel, and why. They tend to be much more interested in the specifics of a particular subjective experience; they try to actually engage with those specifics and to start to understand them. They are less abstract: they ask not just whether subjectivity is present, but also want to understand the details of that subjectivity.

Of course, Camp 2 people might participate in the development of Camp 1 theories of consciousness (the other direction is less likely).

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 1d · 30

You are indirectly saying here many people don't even care about the question?

Yes, and not only that: at least one (rather famous) person claims not to have qualia in the usual sense of the word and says he is not interested in qualia-related matters for that reason. See

https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness?commentId=q64Wz6SpLfhxrmxFH

and the profile https://www.lesswrong.com/users/carl-feynman.

This does not seem to be true of all Camp 1 people, but it certainly seems that we tend to drastically underestimate the differences in subjective phenomenology between different people. Intuitively we think others are like us and have relatively similar subjective realities, and Carl Feynman is saying that we should not assume that, because it is often not true.

I take it you are not well versed in how LLMs technically work?

I actually keep track of the relevant literature and even occasionally publish some related things on GitHub (happy to share).

I'd say that for this topic there are two particularly relevant aspects. One is that autoregressive LLMs are recurrent machines, with the expanding context serving as their working memory; see, for example, "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", https://arxiv.org/abs/2006.16236 (technical details are on page 5, Section 3.4). This addresses the standard objection that we would at least expect recurrence in a conscious system.
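To make the "Transformers are RNNs" point concrete, here is a minimal NumPy sketch (my own toy, not the paper's code) of the equivalence from Section 3.4: causal linear attention computed by re-reading the whole expanding context at every step produces exactly the same outputs as an RNN-style computation that only carries a small fixed-size recurrent state.

```python
# Toy demonstration that causal *linear* attention (Katharopoulos et al., 2020)
# can be computed either over the full expanding context or as a recurrence
# carrying only a fixed-size state (S, z) -- the two views give identical outputs.

import numpy as np

def phi(x):
    # Feature map used by the paper: elu(x) + 1 (keeps values positive).
    return np.where(x > 0, x + 1.0, np.exp(x))

def full_context_view(Q, K, V):
    # "Expanding context as working memory": each step attends over the whole prefix.
    T = Q.shape[0]
    out = np.zeros_like(V)
    for t in range(T):
        q = phi(Q[t])                      # (d,)
        ks = phi(K[: t + 1])               # (t+1, d)
        weights = ks @ q                   # unnormalized attention weights, (t+1,)
        out[t] = (weights @ V[: t + 1]) / weights.sum()
    return out

def recurrent_view(Q, K, V):
    # RNN view: carry only S (d x d_v) and z (d), updated once per token.
    d, d_v = Q.shape[1], V.shape[1]
    S = np.zeros((d, d_v))
    z = np.zeros(d)
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        k, v, q = phi(K[t]), V[t], phi(Q[t])
        S += np.outer(k, v)                # accumulate key-value summaries
        z += k                             # accumulate the normalizer
        out[t] = (q @ S) / (q @ z)
    return out

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 6, 4))       # toy queries, keys, values: 6 steps, dim 4
assert np.allclose(full_context_view(Q, K, V), recurrent_view(Q, K, V))
print("full-context and recurrent computations agree")
```

The full-context view is how we usually picture an autoregressive transformer (the context keeps growing), while the recurrent view carries only a fixed-size state; that is what makes the "recurrent machine with the context as working memory" reading more than a metaphor, at least for this attention variant (for standard softmax attention, the growing KV cache plays the role of the recurrent state).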

Another relevant aspect is Janus' Theory of Simulators. LW people tend to be familiar with it; let me know if you would like some links. I think what Janus' considerations imply is that the particularly relevant entity is a given "simulation": a given inference, an ongoing conversation. The subjective experience (if any) would be a property of a given inference, of a given conversation (and I would not be surprised if that experience depended rather drastically on the nature of the conversation; perhaps the virtual reality emerging in those conversations gives rise to subjectivity for some of those conversations but not for others, even for the same underlying model; that's one possibility to keep in mind).

(Whether something like subjective phenomenology might also be going on at the level of a model is something we are not exposed to, so we would not know. The entities which we interact with, and which often seem conscious to us, exist at the level of a given conversation. We don't really know what exists at the level of a computational process serving many conversations in parallel; I am not familiar with any attempts to ponder this, and if such attempts exist, I would be very interested to hear about them.)

(I have experienced this phenomenon myself and it's very exhilarating when the model outputs are doing something weird like this. I don't think it is much more than an artifact.)

:-) I strongly recommend agnosticism about this :-)

We don't really know. This is one of the key open problems. There is a wide spectrum of opinions about all this.

Hopefully, we'll start making better progress on this in the near future. (There should be ways to make better progress.)

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 2d · 65

I doubt there is any chance of consensus on something like this.

One thing we now know is that people seem to be split into two distinct camps with respect to “qualia-related matters”, and that this split seems quite fundamental: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness.

Your question would only make sense to Camp 2 people (like myself and presumably like you).

Another thing is that people often think that self-awareness is orthogonal to the presence of subjective phenomenology. In particular, many people think that LLMs already have a good deal of self-awareness: https://transformer-circuits.pub/2025/introspection/index.html.

Whereas not much is known about whether LLMs have subjective phenomenology. Not only does one have to be in Camp 2 for that question to make sense, but the progress here is also rudimentary. It does seem that models tend to sincerely think that they have subjective experience; see, for example, this remarkable study by AE Studio: https://www.arxiv.org/abs/2510.24797. But whether this comes from being trained on human texts, in which humans typically either explicitly claim subjective experience or stay silent on those matters, or whether it might come from direct introspection of some kind, is quite unknown at the moment, and people's opinions on this tend to be very diverse.

AI #141: Give Us The Money
mishka · 2d · 30

My suggestion would be to allow them to go on ArXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of ArXiv to get around this.

We already have viXra, with its own can of worms, to say the least: https://en.wikipedia.org/wiki/ViXra.

And if I go to https://vixra.org/ right now, I see that they have the same problem, and this is how they are dealing with it:

Notice: viXra.org only accepts scholarly articles written without AI assistance. Please go to ai.viXra.org to submit new scholarly article written with AI assistance or rxiVerse.org to submit new research article written with or without AI assistance.

Going to viXra for this might not be the solution, given the accumulated baggage and controversy, but we have all kinds of preprint archives these days, https://www.biorxiv.org/, https://osf.io/preprints/psyarxiv, and so on, so it's not a problem to have more of them.

It's just that at some point arXiv preprints started to confer some status and credit, and when people compete for status and credit, there will be some trade-offs. In this sense, a separate server might be better (discoverability is pragmatically useful, and if we don't want "arXiv-level" status and credit for these texts, then it's not clear why they should be on arXiv).

Anthropic Commits To Model Weight Preservation
mishka · 3d · 30

Could one package it together with the OS and everything else in some sort of container and have it work indefinitely (if perhaps not very efficiently) without any support?

Could we solve the efficiency problem by creating a system where one files a request in advance to load a model onto GPUs (and, perhaps, by charging for the time the GPUs are occupied in this fashion)?
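To make the second question slightly more concrete, here is a tiny Python sketch of what such a reservation request and its billing might look like; the names, the model identifier, the hourly rate, and the structure are purely hypothetical illustrations of the idea, not any existing API.

```python
# Purely hypothetical sketch of an advance "load this preserved model onto GPUs"
# reservation, with billing proportional to GPU-hours occupied.

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class GPUReservation:
    model_id: str            # which preserved model to load (archived weights)
    start: datetime          # when the model should become resident on GPUs
    duration: timedelta      # how long it stays loaded
    num_gpus: int            # how many GPUs are needed to serve it

    def cost(self, usd_per_gpu_hour: float) -> float:
        # Charge for the whole time the GPUs are occupied, not just for tokens served.
        hours = self.duration.total_seconds() / 3600.0
        return self.num_gpus * hours * usd_per_gpu_hour

# Example: request a deprecated model for a two-hour session at a chosen date.
req = GPUReservation(
    model_id="some-archived-model",   # hypothetical identifier
    start=datetime(2026, 3, 1, 18, 0),
    duration=timedelta(hours=2),
    num_gpus=8,
)
print(f"Estimated cost: ${req.cost(usd_per_gpu_hour=4.0):.2f}")
```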

Crime and Punishment #1
mishka · 4d · 62

Similarly British Columbia is recriminalizing marijuana after a three year trial run amid growing reports of public drug use.

Why are you saying this? The linked article has the title (boldface mine):

A Canadian province decriminalized hard drugs. Now it’s reversing course.

Cannabis has been legal in all of Canada since 2018: https://en.wikipedia.org/wiki/Cannabis_in_Canada

(Some restrictions do apply, including a ban on transporting it across international borders and the usual restrictions on quantity, age, distribution, and DUI.)

anaguma's Shortform
mishka · 5d · 20

I think we do tend to underestimate differences between people.

We know theoretically that people differ a lot, but we usually don’t viscerally feel how strong those differences are. One of the most remarkable examples of that is described here:

https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness

With AI existential safety, I think our progress is so slow because people mostly pursue anthropocentric approaches. Just like with astronomy, one needs a more invariant point of view to make progress.

I've done a bit of scribbling along those lines: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential

But that’s just a starting point, a seed of what needs to be done in order to make progress…

anaguma's Shortform
mishka · 5d · 20

Same here :-)

I do see feasible scenarios where these things are sustainably nice.

But whether we end up reaching those scenarios... who knows...

Posts (score · age · comments)

19 · Some of the ways the IABIED plan can backfire · 2mo · 16
5 · mishka's Shortform · 1y · 10
21 · Digital humans vs merge with AI? Same or different? · 2y · 11
9 · What is known about invariants in self-modifying systems? [Question] · 2y · 2
2 · Some Intuitions for the Ethicophysics · 2y · 4
26 · Impressions from base-GPT-4? [Question] · 2y · 25
22 · Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments · 2y · 3
13 · What to read on the "informal multi-world model"? [Question] · 2y · 23
10 · RecurrentGPT: a loom-type tool with a twist · 2y · 0
22 · Five Worlds of AI (by Scott Aaronson and Boaz Barak) · 3y · 6