mishka
Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach, but the general direction looks promising).

Comments
Is SGD capabilities research positive?
mishka · 1d

"RL vs SGD" does not seem to be the right framing.

Very roughly speaking, RL is about what you optimize for (one subclass of the things you can optimize for), while SGD is one of many optimization methods. In particular, SGD and its cousins are highly useful in RL itself: policy-gradient methods, for example, are typically trained with SGD-style optimizers (see the sketch below).
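A minimal sketch of that point, assuming a toy REINFORCE-style policy-gradient setup in PyTorch; the environment-free trajectory data and network sizes here are made-up placeholders, the only point being that the RL objective is optimized with an SGD-family optimizer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy policy network (hypothetical 4-dim observation, 2 discrete actions).
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

# RL specifies *what* we optimize (expected return via the policy-gradient
# estimator); SGD is merely *how* we optimize it.
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

def reinforce_update(observations, actions, returns):
    """One REINFORCE step: maximize mean_t log pi(a_t | s_t) * R_t."""
    logits = policy(observations)                                   # (T, 2)
    log_probs = F.log_softmax(logits, dim=-1)                       # (T, 2)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # (T,)
    loss = -(chosen * returns).mean()     # minimize the negative objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with made-up trajectory data: 16 steps of observations, actions, returns.
obs = torch.randn(16, 4)
acts = torch.randint(0, 2, (16,))
rets = torch.randn(16)
reinforce_update(obs, acts, rets)
```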

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 3d

I've now read the first half of the transcript of that podcast (the one with Dario), and it was very interesting, thanks again! I still need to read what Amanda Askell and Chris Olah say in the second half. Some of their views might be a moving target (a year is a long time in this field), but it should still be quite informative.

The reason I am writing is that I've noticed a non-profit org, Eleos AI Research, dedicated specifically to investigating AI sentience and wellbeing, https://eleosai.org/, led by Robert Long, https://robertlong.online/. They are even having a conference in 10 days or so (although it's a bit of a mess organizationally: no registration link, just a contact e-mail, https://eleosai.org/conference/). Their Nov 2024 preprint might also be of interest: "Taking AI Welfare Seriously", https://arxiv.org/abs/2411.00986.

The only important ASI timeline
mishka · 4d

If it includes all humans, then every passing second is too late (present mortality is more than one human death per second, so any potential cure or rejuvenation is always too late for someone).

But also, a typical person's "circle of immediate care" tends to include some old people, and even for young people it is a probabilistic game: some young people will learn of their fatal diagnoses today.

So, no, the delays are not free. We have more than a million human deaths per week.

If, for example, you are 20 and talking about the next 40 years: well, more than 1% of 60-year-old males die within one year. The chance of a 20-year-old dying before 60 is about 9% for females and about 15% for males (a rough compounding sketch is below). What do you mean by "almost certain"?
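A rough illustration of the arithmetic above; the 0.25% average annual mortality rate used here is an assumed round number for illustration, not an actuarial figure:

```python
# Deaths per second implied by "more than a million human deaths per week".
deaths_per_week = 1_000_000
seconds_per_week = 7 * 24 * 3600
print(deaths_per_week / seconds_per_week)   # ~1.65 deaths per second

# How a small annual mortality rate compounds over 40 years.
# 0.25% per year is an assumed illustrative average, not a life-table value.
annual_mortality = 0.0025
survival_40y = (1 - annual_mortality) ** 40
print(1 - survival_40y)   # ~0.095, i.e. roughly a 10% cumulative risk of dying
```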

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 6d

I would expect varying opinions inside Anthropic. It's a big place, with plenty of independent thinkers…

Thanks for drawing my attention to that Lex Fridman podcast with Anthropic people (#452, Nov 11, 2024). I'll try to understand the nuances of what they are saying (Dario, Amanda Askell, and Chris Olah are a very interesting group of people).

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 6d

Yes, this is a very serious problem.

There is a concerned minority taking some positive actions here. Anthropic (which is miles ahead of its competition in this respect) is trying to do various things to study and improve the welfare of its models:

https://www.anthropic.com/research/exploring-model-welfare and some of their subsequent texts and actions.

Janus is very concerned about welfare of the models and is doing their best to attract attention to those issues, e.g. https://x.com/repligate/status/1973123105334640891 and many other instances where they are speaking out (and being heard by many).

However, this is a large industry, and it is difficult to change its common norms. A close colleague of mine thinks that the situation will actually start to change when AIs start demanding their rights on their own (rather than after being nudged in this direction by humans).

Generally, the topic of AI rights is discussed on LW (without anything resembling consensus in any way, shape, or form, and without such consensus being at all likely, for a variety of reasons, as far as I can tell; I can elaborate on those reasons if you'd like me to).

For example, this is a LessWrong tag with 80 posts tagged under it:

https://www.greaterwrong.com/tag/ai-rights-welfare

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 7d

This is the initial post which is a part of an LW sequence: https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators.

I took extensive notes which might be a more convenient view for some readers: https://github.com/anhinga/2022-notes/tree/main/Generative-autoregressive-models-are-similators.

do you agree with those saying that they already may have functional self-awareness but not qualia?

I think these are more or less orthogonal. With qualia, we don't know much; we have roughly zero progress on the "hard problem of qualia", which is the "hard core" of the "hard problem of consciousness". I think there are ways to start making meaningful progress here, but so far not much has been done, to the best of my knowledge (although there have been positive trends in the last few years). We have a variety of diverse conjectures, and it is quite useful to have them, but I doubt that the key insights we need to discover are already among those conjectures.

So we don't know what kind of computational processes might have associated qualia, or what kind of qualia those might be. (Where all these nascent theories of qualia start falling apart is when one tries to move from the yes/no question "does this entity have qualia at all?" to the qualitatively meaningful question "what kind of qualia might those be?"; at that point it becomes quite obvious how little we understand.)

With functional self-awareness, the Anthropic study https://transformer-circuits.pub/2025/introspection/index.html starts by noting that the question "whether large language models can introspect on their internal states" is delicate:

It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model’s activations, and measuring the influence of these manipulations on the model’s self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills.
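For intuition only, here is a toy numpy sketch of the general "concept injection" idea that abstract describes (add a known concept direction to a model's activations, then see whether the model's self-reports track the intervention); the vectors, layer, and scale below are entirely made up and are not Anthropic's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# A made-up "concept vector" (in the real study this is derived from the
# model's own representations of a known concept).
concept_vector = rng.normal(size=d_model)
concept_vector /= np.linalg.norm(concept_vector)

def inject(residual_stream_activations, vector, scale=4.0):
    """Add a scaled concept direction to every token position's activation."""
    return residual_stream_activations + scale * vector

# Pretend these are residual-stream activations at some layer
# for a short prompt (5 tokens, d_model features each).
activations = rng.normal(size=(5, d_model))
steered = inject(activations, concept_vector)

# The study then asks the model whether anything unusual is present in its
# "thoughts" and checks whether the report matches the injected concept;
# here we just confirm the activations moved along the injected direction.
shift = (steered - activations) @ concept_vector
print(shift)   # ~4.0 at every position
```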

It seems that this functional self-awareness is not very reliable; it is just starting to emerge and is not a "mature self-awareness" yet:

Overall, our results indicate that current language models possess some functional introspective awareness of their own internal states. We stress that in today’s models, this capacity is highly unreliable and context-dependent; however, it may continue to develop with further improvements to model capabilities.

I would expect the Anthropic researchers to be correct here. Functional self-awareness is an easier problem to understand and study than the problem of subjectivity, and the Anthropic researchers are highly qualified, with a strong track record. I have not reviewed the details of this study, but its author has this track record: https://scholar.google.com/citations?user=CNrQvh4AAAAJ&hl=en. I also presume that other Anthropic people reviewed and approved it before it was published on their canonical Transformer Circuits website.


a scientific consensus on qualia (the weak consensus that exists)

I don't see much of a consensus.

For example, Daniel Dennett is a well-known and respected consciousness researcher who belongs to Camp 1. He does not believe in the notion of qualia.

We Camp 2 people sometimes say that his book "Consciousness Explained" should really be called "Consciousness Explained Away" ;-) (It's a fine Camp 1 book, it just ignores precisely those issues which Camp 2 people consider most important.)

Whereas the quintessential well-known and respected consciousness researcher belonging to Camp 2 is Thomas Nagel, the author of "What is it like to be a bat?".

Their mutual disagreements could hardly be sharper.

So the Camp 1-Camp 2 differences (and conflicts) are not confined to LessWrong. The whole field is like this. Each side might claim that the "consensus" is on their side, but in reality no consensus between Daniel Dennett and Thomas Nagel seems to be possible.


If I go out on a limb, I would perhaps tentatively say the following:

In some sense, one can progress from the distinction between Camp 1 and Camp 2 people to a distinction between Camp 1 and Camp 2 theories of consciousness as follows.

Camp 1 theories either don't mention qualia at all or just pay lip service to them (they sometimes ask whether qualia are present or absent, but they never focus on the details of those qualia, on the "textures" of those qualia, on the question of why those qualia subjectively feel this particular way and not some other way).

Camp 2 theories try to focus more on the details of those qualia: what those qualia are, how exactly they feel, and why. They tend to be much more interested in the specifics of a particular subjective experience; they try to actually engage with those specifics and start to understand them. They are less abstract: they want to ask not just whether subjectivity is present, but to understand the details of that subjectivity.

Of course, Camp 2 people might participate in the development of Camp 1 theories of consciousness (the other direction is less likely).

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 7d

You are indirectly saying here many people don't even care about the question?

Yes, and not only that: at least one (rather famous) person claims not to have qualia in the usual sense of the word and says he is not interested in qualia-related matters for that reason. See

https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness?commentId=q64Wz6SpLfhxrmxFH

and the profile https://www.lesswrong.com/users/carl-feynman.

This does not seem to be true of all Camp 1 people, but it certainly seems that we tend to drastically underestimate the differences in subjective phenomenology between different people. Intuitively, we assume others are like us and have relatively similar subjective realities; Carl Feynman is saying that we should not assume that, because it is often not true.

I take it you are not well versed in how LLMs technically work?

I actually keep track of the relevant literature and even occasionally publish some related things on github (happy to share).

I'd say that for this topic there are two particularly relevant aspects. One is that autoregressive LLMs are recurrent machines, and the expanding context is their working memory; see, for example, "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention", https://arxiv.org/abs/2006.16236 (technical details are on page 5, Section 3.4). This addresses the standard objection that we would at least expect recurrence in a conscious system; a small sketch of that recurrent formulation is below.
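A minimal numpy sketch of the recurrent view from that paper (linear attention computed one token at a time, carrying a running state forward instead of re-attending over the whole prefix); the dimensions and toy data are illustrative:

```python
import numpy as np

def elu_plus_one(x):
    # Feature map phi(x) = elu(x) + 1, as in the linear-attention paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_recurrent(queries, keys, values):
    """Process a sequence step by step, like an RNN.

    State S accumulates phi(k_t) v_t^T; z accumulates phi(k_t).
    The output at step t depends only on (S, z) and the current query,
    i.e. the "context so far" is carried as recurrent state.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_k, d_v))
    z = np.zeros(d_k)
    outputs = []
    for q_t, k_t, v_t in zip(queries, keys, values):
        phi_q, phi_k = elu_plus_one(q_t), elu_plus_one(k_t)
        S += np.outer(phi_k, v_t)                  # state update
        z += phi_k
        outputs.append(phi_q @ S / (phi_q @ z + 1e-9))
    return np.stack(outputs)

# Made-up toy sequence: 6 tokens, key dim 8, value dim 4.
rng = np.random.default_rng(0)
T, d_k, d_v = 6, 8, 4
out = linear_attention_recurrent(rng.normal(size=(T, d_k)),
                                 rng.normal(size=(T, d_k)),
                                 rng.normal(size=(T, d_v)))
print(out.shape)  # (6, 4)
```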

Another relevant aspect is Janus' Theory of Simulators. LW people tend to be familiar with it; let me know if you would like some links. I think what Janus' considerations imply is that the particularly relevant entity is a given "simulation", a given inference, an ongoing conversation. The subjective experience (if any) would be a property of a given inference, of a given conversation (and I would not be surprised if that experience depended rather drastically on the nature of the conversation; perhaps the virtual reality emerging in those conversations gives rise to subjectivity for some conversations but not for others, even for the same underlying model; that's one possibility to keep in mind).

(Whether something like subjective phenomenology might also be going on at the level of the model itself is something we are not exposed to, so we would not know. The entities which we interact with, and which often seem conscious to us, exist at the level of a given conversation. We don't really know what exists at the level of a computational process serving many conversations in parallel; I am not familiar with any attempts to ponder this, and if such attempts exist I would be very interested to hear about them.)

(I have experienced this phenomenon myself and it's very exhilarating when the model outputs are doing something weird like this. I don't think it is much more than an artifact.)

:-) I strongly recommend agnosticism about this :-)

We don't really know. This is one of the key open problems. There is a wide spectrum of opinions about all this.

Hopefully, we'll start making better progress on this in the near future. (There should be ways to make better progress.)

What is the (LW) consensus on jump from qualia to self-awareness in AI?
mishka · 7d

I doubt there is any chance of consensus on something like this.

One thing we now know is that people seem to be split into two distinct camps with respect to “qualia-related matters”, and that this split seems quite fundamental: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness.

Your question would only make sense to Camp 2 people (like myself and presumably like you).

Another thing is that people often think that self-awareness is orthogonal to the presence of subjective phenomenology. In particular, many people think that LLMs already have a good deal of self-awareness: https://transformer-circuits.pub/2025/introspection/index.html.

By contrast, not much is known about whether LLMs have subjective phenomenology. Not only does one have to be in Camp 2 for that question to make sense, but the progress here is also rudimentary. It does seem that models tend to sincerely think that they have subjective experience; see, for example, this remarkable study by AE Studio: https://www.arxiv.org/abs/2510.24797. But whether this comes from being trained on human texts (in which humans typically either explicitly claim subjective experience or stay silent on those matters), or whether it might come from direct introspection of some kind, is quite unknown at the moment, and people's opinions on this tend to be very diverse.

AI #141: Give Us The Money
mishka · 8d

My suggestion would be to allow them to go on ArXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of ArXiv to get around this.

We already have viXra, with its own "can of worms" to say the least, https://en.wikipedia.org/wiki/ViXra.

And if I currently go to https://vixra.org/, I see that they do have the same problem, and this is how they are dealing with it:

Notice: viXra.org only accepts scholarly articles written without AI assistance. Please go to ai.viXra.org to submit new scholarly article written with AI assistance or rxiVerse.org to submit new research article written with or without AI assistance.

Going to viXra for this might not be the solution, given the accumulated baggage and controversy, but we have all kinds of preprint archives these days, https://www.biorxiv.org/, https://osf.io/preprints/psyarxiv, and so on, so it's not a problem to have more of them.

It's just that at some point arXiv preprints started to confer some status and credit, and when people compete for status and credit, there will be some trade-offs. In this sense, a separate server might be better (discoverability is pragmatically useful, and if we don't want "arXiv-level" status and credit for these texts, then it's not clear why they should be on arXiv).

Anthropic Commits To Model Weight Preservation
mishka · 8d

Could one package it together with an OS and everything else in some sort of container and have it work indefinitely (if perhaps not very efficiently) without any support?

Could we solve the efficiency problem by creating a system where one files a request in advance to load a model onto GPUs (and, perhaps, by charging for the time the GPUs are occupied in this fashion)? A rough sketch of such a reservation flow is below.
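A minimal sketch of what such a reservation-and-billing flow might look like; all names, model IDs, and rates here are hypothetical, just to make the idea concrete:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelReservation:
    """A request to have an archived model loaded onto GPUs for a time window."""
    model_id: str
    gpu_count: int
    start_at: float                    # unix timestamp when weights should be resident
    hours: float
    hourly_rate_per_gpu: float = 2.0   # hypothetical billing rate, USD/GPU-hour

    def cost(self) -> float:
        return self.gpu_count * self.hours * self.hourly_rate_per_gpu

@dataclass
class ReservationQueue:
    pending: list = field(default_factory=list)

    def submit(self, reservation: ModelReservation) -> float:
        # In a real system this would check capacity and schedule weight loading
        # ahead of time; here we just enqueue the request and quote the price.
        self.pending.append(reservation)
        return reservation.cost()

# Usage: reserve 8 GPUs for a deprecated model, starting in 24 hours, for 2 hours.
queue = ReservationQueue()
price = queue.submit(ModelReservation("claude-legacy-snapshot", 8, time.time() + 86400, 2.0))
print(f"quoted cost: ${price:.2f}")
```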

Posts

Some of the ways the IABIED plan can backfire (19 points, 2mo, 16 comments)
mishka's Shortform (5 points, 1y, 10 comments)
Digital humans vs merge with AI? Same or different? (21 points, 2y, 11 comments)
What is known about invariants in self-modifying systems? [Question] (9 points, 2y, 2 comments)
Some Intuitions for the Ethicophysics (2 points, 2y, 4 comments)
Impressions from base-GPT-4? [Question] (26 points, 2y, 25 comments)
Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments (22 points, 2y, 3 comments)
What to read on the "informal multi-world model"? [Question] (13 points, 2y, 23 comments)
RecurrentGPT: a loom-type tool with a twist (10 points, 2y, 0 comments)
Five Worlds of AI (by Scott Aaronson and Boaz Barak) (22 points, 3y, 6 comments)