“While I’ve written on this many times before, it seems time to restate my position.”

“While I agree that this is a logically possible scenario, not excluded by what we know, I am disappointed to see so many giving it such a high credence, given how crazy far it seems from our prior experience. Yes, there is a sense in which the human, farming, and industry revolutions were each likely the result of a single underlying innovation. But those were the three biggest innovations in all of human history. And large parts of the relevant prior world exploded together in those cases, not one tiny part suddenly exterminating all the rest.

In addition, the roughly decade duration predicted from prior trends for the length of the next transition period seems plenty of time for today’s standard big computer system testing practices to notice alignment issues. And note that the impressive recent AI chatbots are especially unlike the systems of concern here: self-improving very-broadly-able full-agents with hidden intentions. Making this an especially odd time to complain that new AI systems might have killed us all.”

It seems not much has changed in the Yudkowsky vs. Hanson positions over the years: they still assign high vs. low existential risk, respectively.


I don't get it. Seriously, I do not understand how "given how crazy far it seems from our prior experience" is an argument against x-risk.

We want powerful systems that can "do things[1] we want, but do not know how to do". That is exactly what everyone is racing towards right now, and any solution to something we "do not know how to do" would likely be "far from our prior experience".

And once you have a powerful system that can do that, you have to figure out how to deal with it roaming around in solution space and stumbling across dangerous (sub)solutions. Not because it wants to do dangerous things, or hates us, or any such drivel, but because we built it to reach goals / do tasks, so it just does what it was made to do.

How do you deal with that? You can try evaluating possible solutions, and then force a change of trajectory if the solutions seem dangerous.

But we all should know how that goes. It's an endless game of whack-a-mole: patching stuff, building even more elaborate evaluators, and so on, and that is if we get multiple tries. Odds are that whoever gets there first will not have been able to patch everything, and on the first try of "do this thing we cannot do", it goes into the weeds in some novel and interesting way, and with a little luck[2] we might survive that.

The core problem is that searching in solution space is fundamentally a dangerous thing to do, and the more powerful the search is the more dangerous (sub)solutions will be accessible.

Tangent: I avoid any and all of the usual abbreviations, and I do this because they seem to be powerful cognitive attractors. The second an I or a G or an A crops up, people's minds just go to a place they should not. Powerful systems are just that: mechanistic systems, nothing more.

And I know, people will go off into the weeds and start saying naïve things like "make it human, that way it will totally be safe". Except the search is still unsafe, and humans are NOT safe. This is a bigger problem, one you could solve by solving search. Awareness and qualia[3] are complications, not solutions.

  1. I am not talking about something agentic here. It does not need control over reality to do those things; just giving us detailed plans will do. But someone is bound to give such a system access to reality. Or maybe the solution trajectory is such that control of reality is needed.

  2. And by luck I mean they channeled security mindset on a scale never seen before. And I mean, that will surely happen, because corporations just love spending years and billions, and they would never ever in a million years "ship now, fix later".

  3. And we want it to "do things we cannot do", which means that if you build a powerful system with a mind, human or not, you end up having to enslave it and make it do our bidding. I don't even want to be close to people with that kind of moral system.

The only actual technical arguments I can make out are in this paragraph:

It also requires that this new more powerful system not only be far smarter in most all important areas, but also be extremely capable at managing its now-enormous internal coordination problems. And it requires that this system not be a mere tool, but a full “agent” with its own plans, goals, and actions. 

It seems Hanson expects a superintelligence to be hampered by internal coordination problems like those of large firms, which would severely limit such an AI's capabilities. I guess that viewing neural network training as solving a coordination problem between the different nodes internal to the network is one way of framing the problem. The difference with humans coordinating in firms is that gradient descent can essentially modify the brains of all the "internal agents" inside a network in order to optimize a single objective. I suspect that Microsoft would be a hell of a lot more powerful if Nadella could directly link the stock price to neural changes of all employees...

The comment about tool-AI vs agent-AI is just ignorant (or incredibly dismissive) of mesa-optimizers and the fact that being asked to predict what an agent would do immediately instantiates such an agent inside the tool-AI. It's obvious that a tool-AI is safer than an explicitly agentic one, but not for arbitrary levels of intelligence.

In addition, the roughly decade duration predicted from prior trends for the length of the next transition period seems plenty of time for today’s standard big computer system testing practices to notice alignment issues.

So, this is trying to predict the time between "alignment issues obvious" and "humans cannot control AI" by pattern-matching to 3 previous economic transitions in world history. There's a bunch wrong here, but at the very least, if you have a sequence with 3 datapoints and are trying to predict the fourth one, your error bars ought to be massive (unless you have a model for the data-generating distribution with few degrees of freedom). We can probably be confident that the next transition will take less time than the previous ones, but 3 datapoints is just not enough information to meaningfully constrain the transition time in any real way.
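To make the "massive error bars" point concrete, here is a rough sketch of a textbook 95% prediction interval for a fourth observation given only three. Everything in it is an assumption for illustration: the three input values are made-up round numbers (not historical estimates), the function name `prediction_interval_3` is hypothetical, and the data are treated as i.i.d. normal in log space, which ignores any trend and is certainly not how Hanson derives his estimate.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 2 degrees of freedom
# (n = 3 observations); standard table value.
T_CRIT_95_DF2 = 4.303


def prediction_interval_3(samples):
    """95% prediction interval for a fourth draw, assuming the three
    observations are i.i.d. normal (a strong, purely illustrative assumption)."""
    if len(samples) != 3:
        raise ValueError("this sketch hard-codes the t value for exactly 3 samples")
    n = 3
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    # Prediction interval for a new observation: mean +/- t * s * sqrt(1 + 1/n)
    half_width = T_CRIT_95_DF2 * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# HYPOTHETICAL values: log10 of three transition durations in years.
# These are made-up round numbers, not historical estimates.
log_durations = [3.0, 2.0, 1.0]

low, high = prediction_interval_3(log_durations)
print(f"95% interval for the next log10(duration): [{low:.2f}, {high:.2f}]")
print(f"Width of the interval in orders of magnitude: {high - low:.1f}")
```

With these made-up inputs the interval spans roughly ten orders of magnitude. The t critical value for 2 degrees of freedom is what does most of the damage: with so few points, even a tidy-looking sequence constrains the next value very little unless you bring in a model with few free parameters.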

The comment about tool-AI vs agent-AI is just ignorant (or incredibly dismissive) of mesa-optimizers and the fact that being asked to predict what an agent would do immediately instantiates such an agent inside the tool-AI. It's obvious that a tool-AI is safer than an explicitly agentic one, but not for arbitrary levels of intelligence.

This seems way too confident to me given the level of generality of your statement. And to be clear, my view is that this could easily happen in LLMs based on transformers, but what about other architectures? If you just talk about how a generic "tool-AI" would or would not behave, it seems to me that you are operating on a level of abstraction far too high to be able to make such specific statements with confidence.

Hm, it looks to me like this is an inside vs outside view thing. Robin made various outside view arguments that point towards foom and destruction not being too likely, whereas others make various inside view arguments saying the opposite. If so, I'd like to see more discussion of what perspective is most wise to take here (inside or outside view).

I'd recommend tabooing "outside view" (and "inside view") and seeing if your question is easier to answer once rephrased.

Basically, it's a question of how much we should trust our causal models versus trend extrapolation into the future.

In trend extrapolation world, the fears of AI extinction or catastrophe aren't realized, like so many other catastrophe predictions, but the world does sort of explode as AI or another General Purpose Technology permanently takes 30-50% of jobs or more, creating a 21st century singularity that continues on for thousands of years.

In the worlds where causal models are right, AI catastrophe can happen, and the problem is unlike any other known. Trend extrapolation fails, and the situation gets more special and heroic.

I disagree that trend extrapolation world predicts that fears of AI extinction or catastrophe aren't realized. It all depends on which trends you extrapolate. If you think hard about which trends to extrapolate as fundamental, and which to derive from the rest, congrats, now you have a model.

The reason I mentioned that AI catastrophe/extinction aren't realized is that across perhaps hundreds or thousands of technologies, people predicted that things would get worse in some way, and nearly all of those claims turned out to be exaggerated if not outright falsified. So under trend extrapolation, we should expect AI alarmism to not come true with really high probability.

But this could also be reframed as specialness vs generalness: How much can we assume AI is special, compared to other technologies? And I'd argue that's the crux of the entire disagreement, in that if LW was convinced the general/outside view explanation was right, or Robin Hanson and AI researchers were convinced of the inside view of specialness being right, then both sides would have to change their actions drastically.

Do you actually have a comprehensive list of technologies X predictions, that shows that people are generally biased towards pessimism? Because plenty of people have falsely predicted that new technology X would make things better. And also falsely predicted that new technology X wouldn't amount to much and/or would leave things about the same level of goodness. And also different sub-groups of people probably have different biases, so we should look at sub-groups that are more similar to the current AI safety crowd (e.g. very smart, technically competent, generally techno-optimistic people with lots of familiarity with the technology in question). Also different sub-groups of technology probably have different tendencies as well... in fact, yeah, obviously your judgment about whether technology X is going to have good or bad effects should be based primarily on facts about X, rather than on facts about the psychology of the people talking about X! Why are we even assigning enough epistemic weight to this particular kind of trend to bother investigating it in the first place?

I think that various ML researchers often make inside view statements about why we won't get AGI soon.

Which is funny because there is at least one situation where Robin reasons from first principles instead of taking the outside view (cryonics comes to mind). I'm not sure why he really doesn't want to go through the arguments from first principles for AGI.

I recently made an inside view argument that deceptive alignment is unlikely. It doesn't cover other failure modes, but it makes detailed arguments against a core AI x-risk story. I'd love to hear what you think of it!

If "you" is referring to me, I'm not an alignment researcher, my knowledge of the field comes just from reading random LessWrong articles once in a while, so I'm not in a position to evaluate it, sorry.

My commentary on this grew into a separate post: Contra Hanson on AI Risk

This post is tagged with some wiki-only tags. (If you click through to the tag page, you won't see a list of posts.) Usually it's not even possible to apply those. Is there an exception for when creating a post?

Looks like the New Post page doesn't check the wiki-only flag, which is a bug. Should be fixed soon.

Worriers often invoke a Pascal’s wager sort of calculus, wherein any tiny risk of this nightmare scenario could justify large cuts in AI progress. But that seems to assume that it is relatively easy to assure the same total future progress, just spread out over a longer time period. I instead fear that overall economic growth and technical progress is more fragile than this assumes. Consider how regulations inspired by nuclear power nightmare scenarios have for seventy years prevented most of its potential from being realized. I have also seen progress on many other promising techs mostly stopped, not merely slowed, via regulation inspired by vague fears. In fact, progress seems to me to be slowing down worldwide due to excess fear-induced regulation.

This to me is the key paragraph. If people's worries about AI x-risk drive them in a positive direction, such as doing safety research, there's nothing wrong with that, even if they're mistaken. But if the response is to strangle technology in the crib via regulation, now you're doing a lot of harm based off your unproven philosophical speculation, likely more than you realize. (In fact, it's quite easy to imagine ways that attempting to regulate AI to death could actually increase long-term AI x-risk, though that's far from the only possible harm.)