Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).
Yes, immediate compensation is useful, even if one has no idea how many calories have been involved (I would not usually know).
Although, in my experience, one needs to be very careful at least for the next two days (if not three) in order to avoid a partial bump.
The most difficult situation is when there are a few "wrong days" in a row (e.g. guests are staying, and so on).
But, generally speaking, it seems that there is (often) a very strong asymmetry between the directions of "up" and "down": the system has a bias to go "up", and that's what one is fighting against.
Very drastic changes (like serious drugs, or like becoming much more strongly (and more consistently) committed to some set of goals, not necessarily directly related to one's body) might shift the equilibrium sufficiently, that's true...
Losing weight slowly and sustainably without serious drugs (e.g. BMI 30 => 25).
The main problem is that for many people this thing works like a ratchet: it’s easy to gain +0.5 BMI very quickly, and if one lets a few days after that slip, then one is often stuck at that new level.
As a result, both going down and staying there often require consistent discipline, and the whole thing is rather unforgiving in terms of slip-ups, social occasions, and such.
"RL vs SGD" does not seem to be a correct framing.
Very roughly speaking, RL is about what you optimize for (one subclass of the objectives you can optimize for), while SGD is one of many optimization methods (in particular, SGD and its cousins are widely used in RL, e.g. in policy gradient methods).
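To illustrate that distinction with a minimal sketch of my own (PyTorch; the toy network, the data shapes, and the helper name `reinforce_update` are just placeholders, not anyone's actual training code): the objective below is an RL objective (a REINFORCE-style policy gradient surrogate), while the thing doing the parameter updates is plain SGD.

```python
import torch
import torch.nn as nn

# Toy policy network; the "RL part" is the objective, not the optimizer.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)  # SGD as the optimization method

def reinforce_update(states, actions, returns):
    """One REINFORCE step: an RL objective optimized with plain SGD.

    states: (T, 4) float tensor; actions: (T,) long tensor; returns: (T,) float tensor.
    """
    log_probs = torch.log_softmax(policy(states), dim=-1)          # (T, 2)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)  # log pi(a_t | s_t)
    loss = -(chosen * returns).mean()                              # policy-gradient surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

One could swap `torch.optim.SGD` for Adam or anything else without changing what is being optimized for; that is the sense in which "RL vs SGD" mixes two different axes.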
I've now read the first half of the transcript of that podcast (the one with Dario), and that was very interesting, thanks again! I still need to read what Amanda Askell and Chris Olah say in the second half. Some of their views might be a moving target, a year is a lot in this field, but it should still be quite informative.
The reason I am writing is that I've noticed a non-profit org, Eleos AI Research, specifically dedicated to investigations of AI sentience and wellbeing, https://eleosai.org/, led by Robert Long, https://robertlong.online/. They are even having a conference in 10 days or so (although organizationally it's a bit of a mess: no registration link, just a contact e-mail, https://eleosai.org/conference/). Their Nov 2024 preprint might also be of interest, "Taking AI Welfare Seriously", https://arxiv.org/abs/2411.00986.
If it includes all humans then every passing second is too late (present mortality is more than one human per second, so a potential cure/rejuvenation and such is always too late for someone).
But also, a typical person’s “circle of immediate care” tends to include some old people, and even for young people it is a probabilistic game: some young people will learn of their fatal diagnoses today.
So, no, the delays are not free. We have more than a million human deaths per week.
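As a rough sanity check (assuming roughly 60 million deaths worldwide per year, which is the commonly cited ballpark):

$$\frac{6\times 10^{7}\ \text{deaths/year}}{3.15\times 10^{7}\ \text{s/year}} \approx 1.9\ \text{deaths per second}, \qquad \frac{6\times 10^{7}\ \text{deaths/year}}{52\ \text{weeks/year}} \approx 1.15\times 10^{6}\ \text{deaths per week}.$$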
If, for example, you are 20 and talking about the next 40 years, well, more than 1% of 60-year-old males die within one year. The chance of a 20-year-old dying before 60 is about 9% for females and about 15% for males. What do you mean by “almost certain”?
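For reference, this is how such cumulative figures follow from standard period life tables, with q_x denoting the probability of dying within a year at age x (the specific percentages above come from published tables rather than from this formula alone):

$$P(\text{die before }60 \mid \text{alive at }20) \;=\; 1 - \prod_{x=20}^{59}\left(1 - q_x\right).$$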
I would expect varying opinions inside Anthropic. It’s a big place, plenty of independent thinkers…
Thanks for drawing my attention to that Lex Fridman podcast with Anthropic people (#452, Nov 11, 2024). I’ll make sure to try to understand the nuances of what they are saying (Dario, Amanda Askell, and Chris Olah are a very interesting group of people).
Yes, this is a very serious problem.
There is a concerned minority which is taking some positive actions in this sense. Anthropic (which is miles ahead of its competition in this respect) is trying to do various things towards studying and improving the welfare of its models:
https://www.anthropic.com/research/exploring-model-welfare and some of their subsequent texts and actions.
Janus is very concerned about the welfare of models and is doing their best to attract attention to those issues, e.g. https://x.com/repligate/status/1973123105334640891 and many other instances where they are speaking out (and being heard by many).
However, this is a large industry, and it is difficult to change its common norms. A close colleague of mine thinks that the situation will actually start to change when AIs start demanding their rights on their own (rather than doing so after being nudged in this direction by humans).
Generally, the topic of AI rights is discussed on LW (without anything resembling consensus in any way, shape, or form, and without such consensus being at all likely, for a variety of reasons, as far as I can tell (I can elaborate on those reasons if you’d like me to)).
For example, this is a LessWrong tag with 80 posts tagged under it:
This is the initial post which is a part of an LW sequence: https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators.
I took extensive notes which might be a more convenient view for some readers: https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators.
do you agree with those saying that they already may have functional self-awareness but not qualia?
I think it's more or less orthogonal. With qualia, we don't know much; we have about zero progress on the "hard problem of qualia", which is the "hard core" of the "hard problem of consciousness". I think there are ways to start making meaningful progress here, but so far not much has been done, to the best of my knowledge (although there are positive trends in the last few years). We have a variety of diverse conjectures, and it is quite useful to have them, but I doubt that the key core insights we need to discover are already among those conjectures.
So we don't know what kind of computational processes might have associated qualia, and what kind of qualia those might be. (Where all these nascent theories of qualia start falling apart quite radically is when one tries to progress from the yes/no question "does this entity have qualia at all?" to the qualitatively meaningful question "what kind of qualia might those be?"; there it becomes quite obvious how little we understand.)
With functional self-awareness, the Anthropic study https://transformer-circuits.pub/2025/introspection/index.html starts by noting that the question "whether large language models can introspect on their internal states" is delicate:
It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills.
It seems that this functional self-awareness is not very reliable; it is just starting to emerge and is not a "mature self-awareness" yet:
Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.
I would expect that the Anthropic researchers are correct here. Functional self-awareness is an easier problem to understand and study than the problem of subjectivity, and Anthropic researchers are highly qualified, with a great track record. I have not reviewed the details of this study, but its author's publication record is here: https://scholar.google.com/citations?user=CNrQvh4AAAAJ&hl=en. I also presume that other Anthropic people looked at it and approved it before publishing it on their canonical Transformer Circuits website.
a scientific consensus on qualia (the weak consensus that exists)
I don't see much of a consensus.
For example, Daniel Dennett is a well-known and respected consciousness researcher who belongs to Camp 1. He does not believe in the notion of qualia.
We, the Camp 2 people, sometimes say that his book "Consciousness Explained" should really be called "Consciousness Explained Away" ;-) (It's a fine Camp 1 book; it just ignores precisely those issues which Camp 2 people consider most important.)
Whereas a quintessential well-known and respected consciousness researcher who belongs to Camp 2 is Thomas Nagel, the author of "What Is It Like to Be a Bat?".
Their mutual disagreements could hardly be sharper.
So the Camp 1-Camp 2 differences (and conflicts) are not confined to LessWrong. The whole field is like this. Each side might claim that the "consensus" is on their side, but in reality no consensus between Daniel Dennett and Thomas Nagel seems to be possible.
If I try to go out on a limb, I, perhaps, want to tentatively say the following:
In some sense, one can progress from the distinction between Camp 1 and Camp 2 people to the distinction between Camp 1 and Camp 2 theories of consciousness as follows.
Camp 1 theories either don't mention qualia at all or just pay lip service to them (they sometimes ask the question of whether qualia are present or absent, but they never try to focus on the details of those qualia, on the "textures" of those qualia, on the question of why those qualia subjectively feel this particular way and not some other way).
Camp 2 theories try to focus more on the details of those qualia, trying to figure out what those qualia are, how exactly they feel, and why. They tend to be much more interested in the particular specifics of a particular subjective experience; they try to actually engage with those specifics and to start to understand them. They are less abstract: they want to ask not just whether subjectivity is present, but to understand the details of that subjectivity.
Of course, Camp 2 people might participate in the development of Camp 1 theories of consciousness (the other direction is less likely).
You are indirectly saying here many people don't even care about the question?
Yes, and not only that: it is also the case that at least one (rather famous) person claims not to have qualia in the usual sense of the word and says he is not interested in qualia-related matters for that reason. See
and the profile https://www.lesswrong.com/users/carl-feynman.
It does not seem to be true of all Camp 1 people, but it certainly seems that we tend to drastically underestimate the differences in subjective phenomenology between different people. Intuitively, we think others are like us and have relatively similar subjective realities, and Carl Feynman is saying that we should not assume this, because it is often not true.
I take it you are not well versed in how LLMs technically work?
I actually keep track of the relevant literature and even occasionally publish some related things on GitHub (happy to share).
I'd say that for this topic there are two particularly relevant aspects. One is that autoregressive LLMs are recurrent machines, and the expanding context is their working memory; see, for example, "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", https://arxiv.org/abs/2006.16236 (technical details are on page 5, Section 3.4). This addresses the standard objection that we would at least expect recurrence in a conscious system.
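Here is a toy sketch of my own of that recurrence view (the `init_cache`/`step` interface is hypothetical, not an actual library API): the decode loop of an autoregressive transformer looks exactly like a recurrence whose growing KV cache plays the role of the hidden state / working memory.

```python
def generate(model, prompt_tokens, n_steps):
    # "Hidden state": the cached keys/values for everything seen so far
    # (this is the expanding context / working memory).
    state = model.init_cache(prompt_tokens)        # hypothetical interface
    token = prompt_tokens[-1]
    out = []
    for _ in range(n_steps):
        # One recurrent step: (token_t, state_t) -> (logits_t, state_{t+1}),
        # where state_{t+1} is state_t extended with the new keys/values.
        logits, state = model.step(token, state)   # hypothetical interface
        token = int(logits.argmax())               # greedy decoding, just for simplicity
        out.append(token)
    return out
```

The difference from a classical RNN is that this state grows with each step instead of having a fixed size (which is where the linear-attention construction in the paper above comes in), but structurally it is still a state being carried forward and updated.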
Another relevant aspect is Janus' Theory of Simulators. LW people tend to be familiar with it; let me know if you would like some links. I think what Janus' considerations imply is that the particularly relevant entity is a given "simulation": a given inference, an ongoing conversation. The subjective experience (if any) would be a property of a given inference, of a given conversation (and I would not be surprised if that experience depended rather drastically on the nature of the conversation; perhaps the virtual reality emerging in those conversations gives rise to subjectivity in some of those conversations but not in others, even for the same underlying model; that's one possibility to keep in mind).
(Whether something in the sense of subjective phenomenology might also be going on at the level of the model itself is something we are not exposed to, so we would not know. The entities which we interact with and which often seem conscious to us exist on the level of a given conversation. We don't really know what exists on the level of a computational process serving many conversations in parallel; I am not familiar with any attempts to ponder this, and if such attempts exist I would be very interested to hear about them.)
(I have experienced this phenomenon myself and it's very exhilarating when the model outputs are doing something weird like this. I don't think it is much more than an artifact.)
:-) I strongly recommend agnosticism about this :-)
We don't really know. This is one of the key open problems. There is a wide spectrum of opinions about all this.
Hopefully, we'll start making better progress on this in the near future. (There should be ways to make better progress.)
Yes, I have found that this is true.
But I have also found that it’s really easy to lose: an illness or injury forcing a long break is enough.
If the activity is one I really like intrinsically, like walking, it’s one thing; but when it’s one I value more for results than for the process, then yes, it’s not too difficult to start enjoying it, but that enjoyment does not always survive long breaks.