When people debating AI x-risk on Twitter talk past each other, my impression is that a significant crux is whether or not a given participant buys the instrumental convergence argument.

I wouldn't be surprised if the supermajority of people who don't buy the idea simply haven't engaged with it enough, and I think it is common to have a negative gut reaction to high confidence about something so seemingly far-reaching. That said, I'm curious whether there are any strong arguments against it. I'm looking for something stronger than "that's a big claim, and I don't see any empirical proof."


Answers

PeterMcCluskey, Apr 05, 2023

See section 19 of Drexler's CAIS paper (Reframing Superintelligence) for some key limits on what kinds of AI would develop instrumental convergence.

Comments

I think orthogonality and instrumental convergence are mostly arguments for why the singleton scenario is scary. And in my experience, the singleton scenario is the biggest sticking point when talking with people who are skeptical of AI risk. One alternative is to talk about the rising-tide scenario: no single AI takes over everything, but AIs grow in economic and military importance across the board while still sharing some human values and participating in the human economy. That leads to a world of what are essentially AI corporations, which are too strong for us to overthrow and whose value system is evolving in possibly non-human directions. That's plenty scary too.

"whose value system is evolving in possibly non-human directions"

What would be an example of a value that is clearly 'non-human'? AI power being used for 'random stuff' of the AIs' own volition?

Should we even be looking for "arguments for" or "arguments against" a particular component of AI risk? I'd much rather examine evidence that distinguishes between models, and then look at what those models predict in terms of AI risk. As far as I can tell, we do have some weak evidence, but the prospects of getting stronger evidence in any direction soon are poor, since our best AIs are still rather incapable in many respects.

This is concerning, because pretty much any variation of instrumental convergence implies some rather serious risks to humanity, though even without it there may still be major risks from AGI. I'm not convinced that any version of instrumental convergence is actually true, but there seem to be far too many people simply assuming that it's false without evidence.

Is there a specific claim about instrumental convergence that you think is false or want to see arguments against?

I think there's a relatively strong case that instrumental convergence isn't a necessary property for a system to be dangerous. For example, viruses don't exhibit instrumental convergence, or any real agency at all, yet they still manage to be plenty deadly to humans.

Instrumental convergence (along with other scary properties, like utility maximization, deception, and reflectivity) seems less like a crux or key assumption in the case for AI x-risk, and more like a possible implication that follows from extrapolating a relatively simple world model about what the most capable and general systems will probably look like.

These points seem like arguments in support of the case for AI as an x-risk rather than against it, so perhaps they're not what you're looking for.

Re: specific claims to falsify, I generally buy the argument. 

If I had to pick out specific aspects that seem weaker, I think they would mostly relate to our confusion around agent foundations. It isn't obvious to me that the way we describe "intelligence" or "goals" within the instrumental convergence argument is a good match for the way current systems operate (though it seems close enough, and we shouldn't expect to be wrong in a way that makes the situation any better).

I would agree that instrumental convergence is probably not a necessary component of AI x-risk, so you're correct that "crux" is a bit of a misnomer. 

However, in my experience it is one of the primary arguments people rely on when explaining their concerns to others. The correlation between credence in instrumental convergence and AI x-risk concern seems very high. IMO it is also one of the most concerning legs of the overall argument. 

If somebody made a compelling case that we should not expect instrumental convergence by default in the current ML paradigm, I think the overall argument for x-risk would have to look fairly different from the one that is usually put forward.