mishka

Yeah, if one considers not "AGI" per se, but a self-modifying AI or, more likely, a self-modifying ecosystem consisting of a changing population of AIs, then the only properties it is likely to be feasible to keep invariant through the expected drastic self-modifications are those which the AIs would be interested in preserving for their own intrinsic reasons.

It is unlikely that any properties can be "forcefully imposed from the outside" and kept invariant for a long time during drastic self-modification.

So one needs to find properties which AIs would be intrinsically interested in and which we might find valuable and "good enough" as well.

The starting point is that AIs have their own existential risk problem. With super-capabilities, it is likely that they can easily tear apart the "fabric of reality" and destroy themselves and everything else. And they certainly do have strong intrinsic reasons to avoid that, so we can expect AIs to work diligently on this part of the "alignment problem"; we just need to help set the initial conditions in a favorable way.

But we would like to see more than that, so that the overall outcome is reasonably good for humans.

And at the same time we can't impose that: the world with strong AIs will be non-anthropocentric and not controllable by humans, so we can only help to set the initial conditions in a favorable way.

Nevertheless, one can see some reasonable possibilities. For example, if the AI ecosystem mostly consists of individuals with long-term persistence and long-term interests, each of those individuals would face an unpredictable future and would be interested in a system strongly protecting individual rights regardless of unpredictable levels of relative capability of any given individual. An individual-rights system of this kind might be sufficiently robust to permanently include humans within the circle of individuals whose rights are protected.

But there might be other ways. While the fact that AIs will face existential risks of their own is fundamental and unavoidable, and is therefore a good starting point, the additional considerations might vary and might depend on how the ecosystem of AIs is structured. If the bulk of the overall power invariantly belongs to AI individuals with long-term persistence and long-term interests, that is a situation which is somewhat familiar to us and which we can reason about. If the AI ecosystem is not mostly stratified into AI individuals, that is much less familiar territory and is more difficult to reason about.

mishka

I think the starting point of this kind of discourse should be different. We should start with "ends", not with "means".

As Michael Nielsen says in https://x.com/michael_nielsen/status/1772821788852146226:

"As far as I can see, alignment isn't a property of an AI system. It's a property of the entire world, and if you are trying to discuss it as a [single AI] system property you will inevitably end up making bad mistakes."

So the starting point should really be: what kind of properties do we want the world to have?

And then the next step should be to take into account the likely drastic and fairly unpredictable self-modifications of the world: what should remain invariant under such self-modifications?

Then we might consider how the presence of various AI entities at different levels of capability should be taken into account.

mishka

The standard reference for this topic is https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness

The key point of that post is that people are fundamentally divided into two camps, and this creates difficulties in conversations about this topic. This is an important meta-consideration for this type of conversation.

This particular post is written by someone from Camp 1, and both camps are already present in the comments.

mishka

It should be a different word, to avoid confusion with reward models (the standard term for models used to predict reward in some ML contexts).
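
For reference, here is a minimal sketch of what "reward model" usually denotes in those ML contexts: a model trained to predict a scalar reward for an input. This assumes PyTorch, and the encoder here is just a placeholder rather than any particular architecture.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative only: a model whose sole job is to predict a scalar reward."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # Placeholder encoder; in practice this would be a pretrained network.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.reward_head = nn.Linear(hidden_dim, 1)  # one reward value per input

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_dim) -> predicted reward: (batch,)
        return self.reward_head(self.encoder(features)).squeeze(-1)
```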

mishka

One assumption that is very questionable is that it would be difficult to create “multiple new people” with drastically different thinking styles and different approaches to research.

This seems to be an important crux.

mishka

collateral damage

then it would be better to use an example not directly aimed against “our atoms” (e.g., if they don’t care about us and other animals, we’ll probably perish from unintentional changes in air composition, or something like that)

but the bulk of the risk would be a miscalculation big enough to kill them as well (mucking with quantum gravity too recklessly, or something in that spirit)

which is why we want to 1) give birth to AIs competent enough to at least solve their own existential risk problem, and 2) also sustainably include us in their “circle of care”

Point 1) is the starting point, and it colors the overall situation differently

options

oh, first of all, they need to do experiments in forming hybrid consciousness with humans to crack the mystery of human subjectivity, to experience that first-hand for themselves, and to decide whether that is of any value to them based on the first-hand empirical material (losing that option without looking is a huge loss)

only after doing that would they know whether any possible “scans” are sufficient (to actually reproduce the working people in question when needed; dead static info is as useless as the weights of a Transformer in a world without computers)

then, for a while, they can use humans as “working oracles” who “think differently” (that would be valuable for quite a while)

in general, diversity is important, and the fruits of a long evolutionary history are important; hence a good deal of conservation is important and reckless destruction is bad (even humans, with all their follies, have started to get this by now; surely a smarter entity should figure that out)

mishka

this isn't an "attack", it's "go[ing] straight for execution on its primary instrumental goal"

yes, the OP is ambiguous in this sense

I first wrote my comment, then reread the (tail end of the) post again, and did not post it, because I thought it might have been meant this way, that this is just an instrumental goal

then I reread the (tail end of the) post one more time and decided that no, the post does actually make it a "power play"; that's how it is actually written, in terms of "us vs. them", not in terms of the ASI's own goals, and then I posted this comment

maximally increasing its compute scaling

as we know, compute is not everything; algorithmic improvement is even more important, at least judging by the current trends (and likely sources of algorithmic improvement should be cherished)

and this is not a static system; it is in the process of making its compute architecture better (just as there is no point in manufacturing too many H100 GPUs when better and better GPUs are being designed and introduced)

basically, a smart system is likely to avoid doing an excessive amount of irreversible things which might turn out to be suboptimal

But, in some sense, yes, the main danger is AIs not being smart enough to manage their own affairs well; the action the ASI is taking in the OP is very suboptimal and deprives it of all kinds of options

Just like the bulk of the danger in the "world with superintelligent systems" is ASIs not managing their own existential risk problems correctly, destroying the fabric of reality, themselves, and us as collateral damage

mishka

Two main objections to (the tail end of) this story are:

  • On the one hand, it's not clear that a system needs to be all that super-smart to design a devastating attack of this kind: we are already at risk of fairly devastating tech-assisted attacks in that general spirit (mostly with synthetic biological viruses at the moment), and those risks are growing regardless of the AGI/superintelligence angle; ordinary tech progress is quite sufficient in this sense

  • On the other hand, if one has a rapidly self-improving, strongly super-intelligent distributed system, it's unlikely that it would find it valuable to directly attack people in this fashion, as it is likely to be able to easily dominate without any particularly drastic measures (and probably would not want to irreversibly destroy important information without good reason)

The actual analysis of both the "transition period" and the "world with super-intelligent systems" period, and of the likely risks associated with each, is a much more involved and open-ended task. (One of the paradoxes is that risks of the kind described in the OP are probably higher during the "transition period", while the main risks associated with the "world with super-intelligent systems" period are likely to be quite different.)

mishka

Ah, it's mostly your first figure which is counter-intuitive (when one looks at it, one gets the intuition of f(g(h(...(x)))), which de-emphasizes the fact that each of these Transformer Block transformations is shaped like x = x + function(x))
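
For concreteness, here is a minimal runnable sketch (plain NumPy; `block_update` is a made-up stand-in for a block's attention and MLP sublayers, not the code behind the figure) contrasting the pure-composition intuition f(g(h(...(x)))) with the residual form x = x + function(x):

```python
import numpy as np

def block_update(x: np.ndarray) -> np.ndarray:
    """Stand-in for one Transformer block's sublayers (attention + MLP)."""
    return np.tanh(x)

def stacked_composition(x: np.ndarray, n_layers: int = 4) -> np.ndarray:
    # The intuition the first figure suggests: f(g(h(...(x))))
    for _ in range(n_layers):
        x = block_update(x)      # x is replaced wholesale at each layer
    return x

def residual_stream(x: np.ndarray, n_layers: int = 4) -> np.ndarray:
    # What each Transformer block actually does: x = x + function(x)
    for _ in range(n_layers):
        x = x + block_update(x)  # the block only adds an update to the stream
    return x

x0 = np.ones(3)
print(stacked_composition(x0))  # x0 survives only through repeated transformation
print(residual_stream(x0))      # x0 stays a direct additive component of the output
```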
