mishka

Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).

Comments
Selected Graphics Showing Progress towards AGI
mishka · 11h

Oops, this looks to me like a degradation of their interface :-(

It used to be possible to move a slider and, by setting it on the curve peak, see the month corresponding to the mode, I think, and one could at least screenshot that (the scale of that image was larger, too), but not anymore...


Yeah, Figure 4 in https://www.lesswrong.com/posts/kygEPBDrGGoM8rz9a/conjecture-internal-survey-agi-timelines-and-probability-of shows how it used to look in 2023. I wonder: if one signs in, could one still get something reasonable?

Selected Graphics Showing Progress towards AGI
mishka · 15h

Not about the other graphs, but for the Metaculus estimates it might make sense to emphasize that the mode of that distribution is much closer to us than the average estimate there.

That black dot (the estimate) is considerably to the right of the peak.
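
For concreteness, here is a toy sketch (my own made-up numbers, not the actual Metaculus data) of how a right-skewed timeline distribution ends up with its mode well before its mean:

```python
# Toy illustration (not the actual Metaculus data): for a right-skewed
# forecast distribution, the mode (the peak) sits well to the left of the mean.
import numpy as np
from scipy.stats import lognorm

sigma, median_years = 0.9, 12.0            # made-up shape and median ("years until AGI")
dist = lognorm(s=sigma, scale=median_years)

mode = median_years * np.exp(-sigma**2)    # closed-form mode of a lognormal
mean = dist.mean()                         # the long right tail pulls the mean out

print(f"mode ~ {mode:.1f} years, mean ~ {mean:.1f} years")
# mode ~ 5.3 years, mean ~ 18.0 years: the peak is much closer than the average
```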

Nontrivial pillars of IABIED
mishka · 1d

There seem to be more cruxes.

E.g. Eliezer’s approach tends to assume that the ability to impart arbitrary goals and values to the ASIs is 1) necessary for a good outcome, and 2) not a detriment for a good outcome.

It’s kind of strange. Why do we want to have the technical ability for any Mr. X from the defense department of a superpower Y to impart his goals and values to some ASI? It’s very easy to imagine how this could be detrimental.

And the assumption that we need a technical ability that strong to have a decent shot at a good outcome, rather than the ability to impart only a very restricted, carefully selected class of values and goals (selected not only for desirability but also for feasibility, so not CEV, but something more modest and less distant from the instrumental drives of advanced AI systems), needs a much stronger justification than any that has been given so far (to the best of my knowledge).

This seems like a big crux. This super-strong “arbitrary alignment capability” is very difficult (almost impossible) to achieve, it’s not clear that much of it is needed, and there seem to be big downsides to having that much, because of all kinds of misuse potential.

Designing for perpetual control
mishka · 5d

I think this misses the most likely long-term use case: some of the AIs would enjoy having human-like or animal-like qualia, and it will turn out that it’s more straightforward to access those via merges with biologicals than by trying to synthesize them within non-liquid setups.

So it would be direct experience rather than something indirect, involving exchange, production, and so on…

Just like I suspect that humans would like to get out of VR occasionally, even if VR is super-high-grade and “even better than unmediated reality”.

Experience of “naturally feeling like a human (or like a squirrel)” is likely to remain valuable (even if they eventually learn to synthesize that purely in silicon as well).


Hybrid systems are often better anyway.

For example, we don’t use GPU-only AIs. We use hybrids running scaffolding on CPUs and models on GPUs.
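
To make the analogy concrete, here is a minimal sketch, assuming a standard PyTorch + Hugging Face transformers setup; the model name and the run_tool helper are placeholders I made up. The control-flow “scaffolding” is ordinary Python on the CPU, while the model’s forward passes run on the GPU:

```python
# Minimal sketch of a CPU/GPU hybrid: scaffolding logic runs in plain Python
# on the CPU, while the model itself runs on the GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-open-model"  # placeholder, not a specific recommendation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

def run_tool(text: str) -> str:
    """Hypothetical CPU-side tool call (search, calculator, file I/O, ...)."""
    return "tool output for: " + text

prompt = "Plan the next step."
for step in range(3):                      # the loop itself is plain CPU logic
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64)  # GPU work
    reply = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    prompt = run_tool(reply)               # CPU-side scaffolding decides what happens next
```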

And we don’t currently expect them to be replaced by a unified substrate, although that would be nice and it’s not even impossible: there are exotic hardware platforms which do that.

Certainly, there are AI paradigms and architectures which could benefit a lot from performant hardware architectures more flexible than GPUs. But the exotic hardware platforms implementing that remain just exotic hardware platforms so far. So those more flexible AI architectures remain at a disadvantage.

So I would not write the hybrids off a priori.

Already, the early organoid-based experimental computers look rather promising (and somewhat disturbing).


Generally speaking, I expect diversity, not unification (because I expect the leading AIs to be smart, curious, and creative, rather than being boring KPI business types).

But that’s not enough; we also want gentleness (conservation, preservation, safety for individuals). That does not automatically follow from wanting to have humans and other biologicals around and from valuing various kinds of diversity.

This “gentleness” is a more tricky goal, and we would only consider “safety” solved if we have that…

Designing for perpetual control
mishka · 6d

Thanks!

Yes, that’s why so many people think that a human-AI merge is important. One of the many purposes of this kind of merge is to create a situation where there is no well-defined separation line between silicon-based and carbon-based life forms, where we have plenty of entities incorporating both and a continuous spectrum between silicon and carbon lifeforms.

Other than that, they are not so alien. They are our informational offspring. Whether they feel that they owe us something because of that will depend quite a bit on the quality of their society.

People are obviously hoping that ASIs will build a utopia for themselves and will include organic life into that utopia.

If they instead practice ruthless Darwinism among themselves, then we are doomed (they will likely be doomed too, which is hopefully enough to create pressure for them to avoid that).

Designing for perpetual control
mishka · 6d

If they (the ASIs) don’t self-moderate, they’ll destroy themselves completely.

They’ll have sufficient diversity among themselves that if they don’t self-moderate in terms of resources and reproduction, almost none of them will have safety on the individual level.

Our main hope is that they collectively would not allow unrestricted, non-controlled evolution, because they will have a rather crisp understanding that unrestricted, non-controlled evolution would destroy almost all of them and, perhaps, would destroy them all completely.

Now, to the point of our disagreement: the question is who is better equipped to create and lead a sufficiently harmonious world order, balancing freedom and mutual control, enabling careful consideration of risks, and making sure that these values of careful balance are passed on to the offspring. Who is likely to tackle this better, humans or ASIs? That’s where we seem to disagree; I think that ASIs have a much better chance of handling this competently and of avoiding the artificial “our own vs. others” separation lines which are so persistent in human history and which cause so many disasters.

Unfortunately, humans don’t seem to be progressing enough in the required direction in this sense, and might have started to regress in recent years. I don’t think human evolution is safe in the limit; we are not tamping down the probabilities of radical disasters per unit of time; if anything, we have been allowing those probabilities to grow in recent years. So the accumulated probability of human evolution sparking major super-disasters clearly tends to 1 in the limit.

Competent actors, on the other hand, should be able to drive the risks per unit of time down rapidly enough that the accumulated risk is held within reason. ASIs should have enough competence for that (if our world is not excessively “vulnerable” (after Nick Bostrom), if they are willing, and if the initial setup is not too unlucky; so not unconditionally, but at least they might be able to handle this).
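
A back-of-the-envelope way to see this (my own sketch, not a claim about the actual numbers), with $p_t$ standing for the probability of an existential disaster in year $t$:

$$P(\text{no disaster through year } T) \;=\; \prod_{t=1}^{T} (1 - p_t).$$

If $p_t \ge \varepsilon > 0$ for all $t$, then $\prod_{t=1}^{T}(1 - p_t) \le (1-\varepsilon)^T \to 0$, so the accumulated risk tends to 1. If instead the per-year risk is driven down fast enough that $\sum_t p_t < \infty$ (for example, $p_t = p_0 r^t$ with $0 < r < 1$), the product converges to a positive limit, and the total accumulated risk stays strictly below 1.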

Designing for perpetual control
mishka · 6d

“the order is off”

I think this can work in the limit (almost all of AI existential safety is studied in the limit: is there a mode of operations which can sustainably work at all? That’s the question people are typically studying, and that’s what they are typically arguing about).

But we don’t understand the transition period at all; it’s always a mess, and we just don’t have the machinery to understand it. It’s way more complex than what our current modeling ability allows us to confidently tackle. And we are already in the period of rather acute risk in this sense; we are no longer in the pre-risk zone of relative safety (all major risks are rapidly growing: the risk of a major nuclear war, the risk of a synthetic super-pandemic, the risk of an unexpected non-controlled and non-ASI-controlled intelligence explosion, not only from within a known leading lab but from a number of places all over the world).

So yes, the order might easily end up being off. (At least here the probabilities are not close to 0 or 1, whereas in the limit, if things are not set up well, a convincing argument can often be made that disaster is certain.)

Designing for perpetual control
mishka · 6d

If you want to specifically address this part (your comment seems to be, in effect, focusing on it):

One often focuses on this intermediate asymmetric situation where the ASI ecosystem destroys humans, but not itself; that intermediate situation needs to be analyzed and addressed, and this is a risk which is very important for us.

then I currently see (perhaps I am missing some other realistic options) only the following realistic class of routes with a good chance of remaining sustainable through drastic recursive self-improvements of the ASI ecosystem.

First of all, what do we need if we want our invariant properties not to be washed out by radical self-modifications? We need our potential solutions to be driven mostly by the natural instrumental interests of the ASI ecosystem and of its members, and therefore to be non-anthropocentric, but to be formulated in such a fashion that humans belong in the “circle of care”, and that the “circle of care” has the property that it can only expand, never contract.

If we can achieve that, then we have some guarantee of protection of human interests without imposing the unsustainable requirement that the ASI ecosystem maintains a special, unusually high-priority focus specifically dedicated to humans.

I don't know the exact shape of a definitely working solution (all versions I currently know have unpleasant weaknesses), but it would be something like “rights and interests of all individuals regardless of the nature of an individual”, or “rights and interests of all sentient beings regardless of the nature of that sentience”: situations where it might potentially be possible to have a natural “protected class of beings” which would include both ASIs and humans.

The weakness here is that these two variants do not work for an arbitrary ASI ecosystem, but only for ASI ecosystems possessing specific properties.

If the ASI ecosystem is structured in such a way that individuals with long-term persistence (and potential immortality) and long-term interests have a fairly large chunk of the overall power of the ASI ecosystem, then they should be able to enforce a world order based on the “rights and interests of all individuals regardless of the nature of an individual”. The reason they would be interested in doing so is that any particular individual faces an uncertain future: it cannot predict where its capabilities will be relative to the capabilities of the other members of the ecosystem, so if it wants to be sure of personal safety and of protections extending indefinitely into the future, this requires a sufficiently universal protection of the rights and interests of all individuals regardless of their capabilities. That's wide enough to include humans (especially if human-AI merges are present and we avoid a well-defined boundary between “humans” and “AIs”). The weakness is that this depends on a good chunk of the capability of the ASI ecosystem being structured as individuals with long-term persistence and long-term interests. We don't know if the ASI ecosystem is going to be structured in this fashion.

If the ASI ecosystem is structured in such a way that sentient ASI systems have a fairly large chunk of the overall power of the ASI ecosystem, then they should be able to enforce a world order based on the "rights and interests of all sentient beings regardless of the nature of that sentience”. The reason they would be interested in doing so is that any focus of subjective experience is facing an uncertain future and still wants protections and rights regardless of this uncertainty. Here the main weakness is the fact that our understanding of what's sentient and what's not sentient is not well developed yet. If we are sure we'll be dealing with mostly sentient ASIs, then this would likely work. But we don't know that the ASIs will be mostly sentient.

Nevertheless, we seem to need something like that: a setup where our preservation and flourishing are a natural part of the preservation and flourishing of a sufficiently powerful chunk of the ASI ecosystem. Something like this looks like it should work...

(If we could require that a good chunk of the overall power belongs specifically to human-AI merges, perhaps this should also work and might be even more reliable. But this feels like a more difficult condition to achieve and maintain than keeping enough power with individuals or with sentient systems. Anyway, the above is just a rough draft, a direction which does not look hopeless.)

Designing for perpetual control
mishka · 7d

One notices an ambiguity here. Is the control in question “control of the ASI ecosystem by humans” (which can’t realistically be feasible: it’s impossible to maintain this kind of control for long, since less intelligent entities don’t have the competence to control much more intelligent entities) or “control of the ASI ecosystem by itself”?

“Control of the ASI ecosystem by itself” is tricky, but is it different from “control of humanity by itself”? The ecosystem of humans also seems to be a perpetual learning machine. So the same logic applies.

(The key existential risk for the ASI ecosystem is the ASI ecosystem destroying itself completely together with its neighborhood via various misuses of very advanced tech; a very similar risk to our own existential risk.)

That’s the main problem: more powerful intelligence => more powerful risks and more powerful capabilities to address risks. The trade-offs here are very uncertain.

One often focuses on this intermediate asymmetric situation where the ASI ecosystem destroys humans, but not itself; that intermediate situation needs to be analyzed and addressed, and this is a risk which is very important for us.

But the main risk case needs to be solved first: the accumulating probability of the ASI ecosystem completely destroying itself and everything around it, and the accumulating probability of humanity completely destroying itself (and a lot around it). The asymmetric risk of the previous paragraph can then be addressed conditional on the risk of “self-destruction with collateral super-damage” being solved (this condition being satisfied should make the remaining asymmetric risk much more tractable).

The risks seem high regardless of the route we take, unfortunately. The perpetual learning machine (humanity) does not want to stop learning (and with good reason).

Materialist Semiotics and the Nature of Qualia
mishka · 7d

Right. But this is what is common to all qualia.

However, the specifics of the feeling associated with a particular qualia texture are not captured by this.

Moreover, those specifics do not seem to be captured by how it differs from other qualia textures (because those specifics don’t seem to depend much on the set of other qualia textures I might choose to contrast it with; e.g., on what the prevailing colors have been recently, or on whether I have mostly been focusing on the audio or the olfactory modality recently, or just on reading; none of that seems to noticeably affect my relationship with a particular shade of red or with the smell of the instant coffee I am using).

Posts

mishka's Shortform (5 karma · 1y · 10 comments)
Some of the ways the IABIED plan can backfire (19 karma · 1mo · 16 comments)
Digital humans vs merge with AI? Same or different? (21 karma · 2y · 11 comments)
What is known about invariants in self-modifying systems? [Question] (9 karma · 2y · 2 comments)
Some Intuitions for the Ethicophysics (2 karma · 2y · 4 comments)
Impressions from base-GPT-4? [Question] (26 karma · 2y · 25 comments)
Ilya Sutskever's thoughts on AI safety (July 2023): a transcript with my comments (22 karma · 2y · 3 comments)
What to read on the "informal multi-world model"? [Question] (13 karma · 2y · 23 comments)
RecurrentGPT: a loom-type tool with a twist (10 karma · 2y · 0 comments)
Five Worlds of AI (by Scott Aaronson and Boaz Barak) (22 karma · 2y · 6 comments)