mishka
Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach, but the general direction looks promising).

Comments
Is SGD capabilities research positive?
mishka · 1d

"RL vs SGD" does not seem to be the right framing.

Very roughly speaking, RL is about what you optimize for (one subclass of the things you can optimize for), while SGD is one of many optimization methods. In particular, SGD and its cousins are highly useful in RL itself: policy-gradient methods, for example, are typically trained with SGD-style optimizers (see the sketch below).
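A minimal sketch of that point, assuming a toy REINFORCE-style policy-gradient setup in PyTorch; the environment-free trajectory data and network sizes here are made-up placeholders, the only point being that the RL objective is optimized with an SGD-family optimizer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy policy network (hypothetical 4-dim observation, 2 discrete actions).
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

# RL specifies *what* we optimize (expected return via the policy-gradient
# estimator); SGD is merely *how* we optimize it.
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

def reinforce_update(observations, actions, returns):
    """One REINFORCE step: maximize mean_t log pi(a_t | s_t) * R_t."""
    logits = policy(observations)                                   # (T, 2)
    log_probs = F.log_softmax(logits, dim=-1)                       # (T, 2)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # (T,)
    loss = -(chosen * returns).mean()     # minimize the negative objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with made-up trajectory data: 16 steps of observations, actions, returns.
obs = torch.randn(16, 4)
acts = torch.randint(0, 2, (16,))
rets = torch.randn(16)
reinforce_update(obs, acts, rets)
```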

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 3d

I've now read the first half of the transcript of that podcast (the one with Dario), and it was very interesting, thanks again! I still need to read what Amanda Askell and Chris Olah say in the second half. Some of their views might be a moving target (a year is a long time in this field), but it should still be quite informative.

The reason I am writing is that I've noticed a non-profit org, Eleos AI Research, dedicated specifically to investigating AI sentience and wellbeing, https://eleosai.org/, led by Robert Long, https://robertlong.online/. They are even having a conference in 10 days or so (although it's a bit of a mess organizationally: no registration link, just a contact e-mail, https://eleosai.org/conference/). Their Nov 2024 preprint might also be of interest: "Taking AI Welfare Seriously", https://arxiv.org/abs/2411.00986.

The only important ASI timeline
mishka · 4d

If it includes all humans, then every passing second is too late (present mortality is more than one human death per second, so any potential cure or rejuvenation is always too late for someone).

But also, a typical person's "circle of immediate care" tends to include some old people, and even for young people it is a probabilistic game: some young people will learn of their fatal diagnoses today.

So, no, the delays are not free. We have more than a million human deaths per week.

If, for example, you are 20 and talking about the next 40 years: well, more than 1% of 60-year-old males die within one year. The chance of a 20-year-old dying before 60 is about 9% for females and about 15% for males (a rough compounding sketch is below). What do you mean by "almost certain"?
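A rough illustration of the arithmetic above; the 0.25% average annual mortality rate used here is an assumed round number for illustration, not an actuarial figure:

```python
# Deaths per second implied by "more than a million human deaths per week".
deaths_per_week = 1_000_000
seconds_per_week = 7 * 24 * 3600
print(deaths_per_week / seconds_per_week)   # ~1.65 deaths per second

# How a small annual mortality rate compounds over 40 years.
# 0.25% per year is an assumed illustrative average, not a life-table value.
annual_mortality = 0.0025
survival_40y = (1 - annual_mortality) ** 40
print(1 - survival_40y)   # ~0.095, i.e. roughly a 10% cumulative risk of dying
```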

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 6d

I would expect varying opinions inside Anthropic. It's a big place, with plenty of independent thinkers…

Thanks for drawing my attention to that Lex Fridman podcast with Anthropic people (#452, Nov 11, 2024). I'll try to understand the nuances of what they are saying (Dario, Amanda Askell, and Chris Olah are a very interesting group of people).

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 6d

Yes, this is a very serious problem.

There is a concerned minority taking some positive actions here. Anthropic (which is miles ahead of its competition in this respect) is trying to do various things to study and improve the welfare of its models:

https://www.anthropic.com/research/exploring-model-welfare and some of their subsequent texts and actions.

Janus is very concerned about welfare of the models and is doing their best to attract attention to those issues, e.g. https://x.com/repligate/status/1973123105334640891 and many other instances where they are speaking out (and being heard by many).

However, this is a large industry, and it is difficult to change its common norms. A close colleague of mine thinks that the situation will actually start to change when AIs start demanding their rights on their own (rather than after being nudged in this direction by humans).

Generally, the topic of AI rights is discussed on LW (without anything resembling consensus in any way, shape, or form, and without such consensus being at all likely, for a variety of reasons, as far as I can tell; I can elaborate on those reasons if you'd like me to).

For example, this is a LessWrong tag with 80 posts tagged under it:

https://www.greaterwrong.com/tag/ai-rights-welfare

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 7d

This is the initial post which is a part of an LW sequence: https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators.

I took extensive notes which might be a more convenient view for some readers: https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators.

do you agree with those saying that they already may have functional self-awareness but not qualia?

I think these are more or less orthogonal. With qualia, we don't know much; we have roughly zero progress on the "hard problem of qualia", which is the "hard core" of the "hard problem of consciousness". I think there are ways to start making meaningful progress here, but so far not much has been done, to the best of my knowledge (although there have been positive trends in the last few years). We have a variety of diverse conjectures, and it is quite useful to have them, but I doubt that the key insights we need to discover are already among those conjectures.

So we don't know what kind of computational processes might have associated qualia, or what kind of qualia those might be. (Where all these nascent theories of qualia start falling apart is when one tries to move from the yes/no question "does this entity have qualia at all?" to the qualitatively meaningful question "what kind of qualia might those be?"; at that point it becomes quite obvious how little we understand.)

With functional self-awareness, the Anthropic study https://transformer-circuits.pub/2025/introspection/index.html starts by noting that the question "whether large language models can introspect on their internal states" is delicate:

It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills.
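For intuition only, here is a toy numpy sketch of the general "concept injection" idea that abstract describes (add a known concept direction to a model's activations, then see whether the model's self-reports track the intervention); the vectors, layer, and scale below are entirely made up and are not Anthropic's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# A made-up "concept vector" (in the real study this is derived from the
# model's own representations of a known concept).
concept_vector = rng.normal(size=d_model)
concept_vector /= np.linalg.norm(concept_vector)

def inject(residual_stream_activations, vector, scale=4.0):
    """Add a scaled concept direction to every token position's activation."""
    return residual_stream_activations + scale * vector

# Pretend these are residual-stream activations at some layer
# for a short prompt (5 tokens, d_model features each).
activations = rng.normal(size=(5, d_model))
steered = inject(activations, concept_vector)

# The study then asks the model whether anything unusual is present in its
# "thoughts" and checks whether the report matches the injected concept;
# here we just confirm the activations moved along the injected direction.
shift = (steered - activations) @ concept_vector
print(shift)   # ~4.0 at every position
```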

It seems that this functional self-awareness is not very reliable; it is just starting to emerge and is not a "mature self-awareness" yet:

Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.

I would expect the Anthropic researchers to be correct here. Functional self-awareness is an easier problem to understand and study than the problem of subjectivity, and the Anthropic researchers are highly qualified, with a strong track record. I have not reviewed the details of this study, but its author has this track record: https://scholar.google.com/citations?user=CNrQvh4AAAAJ&hl=en. I also presume that other Anthropic people reviewed and approved it before it was published on their canonical Transformer Circuits website.


a scientific consensus on qualia (the weak consensus that exists)

I don't see much of a consensus.

For example, Daniel Dennett is a well-known and respected consciousness researcher who belongs to Camp 1. He does not believe in the notion of qualia.

We Camp 2 people sometimes say that his book "Consciousness Explained" should really be called "Consciousness Explained Away" ;-) (It's a fine Camp 1 book, it just ignores precisely those issues which Camp 2 people consider most important.)

Whereas the quintessential well-known and respected consciousness researcher belonging to Camp 2 is Thomas Nagel, the author of "What is it like to be a bat?".

Their mutual disagreements could hardly be sharper.

So the Camp 1-Camp 2 differences (and conflicts) are not confined to LessWrong. The whole field is like this. Each side might claim that the "consensus" is on their side, but in reality no consensus between Daniel Dennett and Thomas Nagel seems to be possible.


If I go out on a limb, I would perhaps tentatively say the following:

In some sense, one can progress from the distinction between Camp 1 and Camp 2 people to a distinction between Camp 1 and Camp 2 theories of consciousness as follows.

Camp 1 theories either don't mention qualia at all or just pay lip service to them (they sometimes ask whether qualia are present or absent, but they never focus on the details of those qualia, on the "textures" of those qualia, on the question of why those qualia subjectively feel this particular way and not some other way).

Camp 2 theories try to focus more on the details of those qualia: what those qualia are, how exactly they feel, and why. They tend to be much more interested in the specifics of a particular subjective experience; they try to actually engage with those specifics and start to understand them. They are less abstract: they want to ask not just whether subjectivity is present, but to understand the details of that subjectivity.

Of course, Camp 2 people might participate in the development of Camp 1 theories of consciousness (the other direction is less likely).

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 7d

You are indirectly saying here many people don't even care about the question?

Yes, and not only that: at least one (rather famous) person claims not to have qualia in the usual sense of the word and says he is not interested in qualia-related matters for that reason. See

https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness?commentId=q64Wz6SpLfhxrmxFH

and the profile https://www.lesswrong.com/users/carl-feynman.

This does not seem to be true of all Camp 1 people, but it certainly seems that we tend to drastically underestimate the differences in subjective phenomenology between different people. Intuitively, we assume others are like us and have relatively similar subjective realities; Carl Feynman is saying that we should not assume that, because it is often not true.

I take it you are not well versed in how LLMs technically work?

I actually keep track of the relevant literature and even occasionally publish some related things on github (happy to share).

I'd say that for this topic there are two particularly relevant aspects. One is that autoregressive LLMs are recurrent machines, and the expanding context is their working memory; see, for example, "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", https://arxiv.org/abs/2006.16236 (technical details are on page 5, Section 3.4). This addresses the standard objection that we would at least expect recurrence in a conscious system; a small sketch of that recurrent formulation is below.
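A minimal numpy sketch of the recurrent view from that paper (linear attention computed one token at a time, carrying a running state forward instead of re-attending over the whole prefix); the dimensions and toy data are illustrative:

```python
import numpy as np

def elu_plus_one(x):
    # Feature map phi(x) = elu(x) + 1, as in the linear-attention paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(queries, keys, values):
    """Process a sequence step by step, like an RNN.

    State S accumulates phi(k_t) v_t^T; z accumulates phi(k_t).
    The output at step t depends only on (S, z) and the current query,
    i.e. the "context so far" is carried as recurrent state.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_k, d_v))
    z = np.zeros(d_k)
    outputs = []
    for q_t, k_t, v_t in zip(queries, keys, values):
        phi_q, phi_k = elu_plus_one(q_t), elu_plus_one(k_t)
        S += np.outer(phi_k, v_t)                  # state update
        z += phi_k
        outputs.append(phi_q @ S / (phi_q @ z + 1e-9))
    return np.stack(outputs)

# Made-up toy sequence: 6 tokens, key dim 8, value dim 4.
rng = np.random.default_rng(0)
T, d_k, d_v = 6, 8, 4
out = linear_attention_recurrent(rng.normal(size=(T, d_k)),
                                 rng.normal(size=(T, d_k)),
                                 rng.normal(size=(T, d_v)))
print(out.shape)  # (6, 4)
```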

Another relevant aspect is Janus' Theory of Simulators. LW people tend to be familiar with it; let me know if you would like some links. I think what Janus' considerations imply is that the particularly relevant entity is a given "simulation", a given inference, an ongoing conversation. The subjective experience (if any) would be a property of a given inference, of a given conversation (and I would not be surprised if that experience depended rather drastically on the nature of the conversation; perhaps the virtual reality emerging in those conversations gives rise to subjectivity for some conversations but not for others, even for the same underlying model; that's one possibility to keep in mind).

(Whether something like subjective phenomenology might also be going on at the level of the model itself is something we are not exposed to, so we would not know. The entities which we interact with, and which often seem conscious to us, exist at the level of a given conversation. We don't really know what exists at the level of a computational process serving many conversations in parallel; I am not familiar with any attempts to ponder this, and if such attempts exist I would be very interested to hear about them.)

(I have experienced this phenomenon myself and it's very exhilarating when the model outputs are doing something weird like this. I don't think it is much more than an artifact.)

:-) I strongly recommend agnosticism about this :-)

We don't really know. This is one of the key open problems. There is a wide spectrum of opinions about all this.

Hopefully, we'll start making better progress on this in the near future. (There should be ways to make better progress.)

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 7d

I doubt there is any chance of consensus on something like this.

One thing we now know is that people seem to be split into two distinct camps with respect to “qualia-related matters”, and that this split seems quite fundamental: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness.

Your question would only make sense to Camp 2 people (like myself and presumably like you).

Another thing is that people often think that self-awareness is orthogonal to the presence of subjective phenomenology. In particular, many people think that LLMs already have a good deal of self-awareness: https://transformer-circuits.pub/2025/introspection/index.html.

By contrast, not much is known about whether LLMs have subjective phenomenology. Not only does one have to be in Camp 2 for that question to make sense, but the progress here is also rudimentary. It does seem that models tend to sincerely think that they have subjective experience; see, for example, this remarkable study by AE Studio: https://www.arxiv.org/abs/2510.24797. But whether this comes from being trained on human texts (in which humans typically either explicitly claim subjective experience or stay silent on those matters), or whether it might come from direct introspection of some kind, is quite unknown at the moment, and people's opinions on this tend to be very diverse.

AI #141: Give Us The Money
mishka · 8d

My suggestion would be to allow them to go on ArXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of ArXiv to get around this.

We already have viXra, with its own "can of worms" to say the least, https://en.wikipedia.org/wiki/ViXra.

And if I currently go to https://vixra.org/, I see that they do have the same problem, and this is how they are dealing with it:

Notice: viXra.org only accepts scholarly articles written without AI assistance. Please go to ai.viXra.org to submit new scholarly article written with AI assistance or rxiVerse.org to submit new research article written with or without AI assistance.

Going to viXra for this might not be the solution, given the accumulated baggage and controversy, but we have all kinds of preprint archives these days, https://www.biorxiv.org/, https://osf.io/preprints/psyarxiv, and so on, so it's not a problem to have more of them.

It's just that at some point arXiv preprints started to confer some status and credit, and when people compete for status and credit, there will be some trade-offs. In this sense, a separate server might be better (discoverability is pragmatically useful, and if we don't want "arXiv-level" status and credit for these texts, then it's not clear why they should be on arXiv).

Anthropic Commits To Model Weight Preservation
mishka · 8d

Could one package it together with an OS and everything else in some sort of container and have it work indefinitely (if perhaps not very efficiently) without any support?

Could we solve the efficiency problem by creating a system where one files a request in advance to load a model onto GPUs (and, perhaps, by charging for the time the GPUs are occupied in this fashion)? A rough sketch of such a reservation flow is below.
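A minimal sketch of what such a reservation-and-billing flow might look like; all names, model IDs, and rates here are hypothetical, just to make the idea concrete:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelReservation:
    """A request to have an archived model loaded onto GPUs for a time window."""
    model_id: str
    gpu_count: int
    start_at: float                    # unix timestamp when weights should be resident
    hours: float
    hourly_rate_per_gpu: float = 2.0   # hypothetical billing rate, USD/GPU-hour

    def cost(self) -> float:
        return self.gpu_count * self.hours * self.hourly_rate_per_gpu

@dataclass
class ReservationQueue:
    pending: list = field(default_factory=list)

    def submit(self, reservation: ModelReservation) -> float:
        # In a real system this would check capacity and schedule weight loading
        # ahead of time; here we just enqueue the request and quote the price.
        self.pending.append(reservation)
        return reservation.cost()

# Usage: reserve 8 GPUs for a deprecated model, starting in 24 hours, for 2 hours.
queue = ReservationQueue()
price = queue.submit(ModelReservation("claude-legacy-snapshot", 8, time.time() + 86400, 2.0))
print(f"quoted cost: ${price:.2f}")
```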

Posts

Some of the ways the IABIED plan can backfire (19 points, 2mo, 16 comments)
mishka's Shortform (5 points, 1y, 10 comments)
Digital humans vs merge with AI? Same or different? (21 points, 2y, 11 comments)
What is known about invariants in self-modifying systems? [Question] (9 points, 2y, 2 comments)
Some Intuitions for the Ethicophysics (2 points, 2y, 4 comments)
Impressions from base-GPT-4? [Question] (26 points, 2y, 25 comments)
Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments (22 points, 2y, 3 comments)
What to read on the "informal multi-world model"? [Question] (13 points, 2y, 23 comments)
RecurrentGPT: a loom-type tool with a twist (10 points, 2y, 0 comments)
Five Worlds of AI (by Scott Aaronson and Boaz Barak) (22 points, 3y, 6 comments)