[linkpost] "What Are Reasonable AI Fears?" by Robin Hanson, 2023-04-23

Arjun Panickssery

Selected quotes (all emphasis mine):

Why are we so willing to “other” AIs? Part of it is probably prejudice: some recoil from the very idea of a metal mind. We have, after all, long speculated about possible future conflicts with robots. But part of it is simply fear of change, inflamed by our ignorance of what future AIs might be like. Our fears expand to fill the vacuum left by our lack of knowledge and understanding.

The result is that AI doomers entertain many different fears, and addressing them requires discussing a great many different scenarios. Many of these fears, however, are either unfounded or overblown. I will start with the fears I take to be the most reasonable, and end with the most overwrought horror stories, wherein AI threatens to destroy humanity.

As an economics professor, I naturally build my analyses on economics, treating AIs as comparable to both laborers and machines, depending on context. You might think this is mistaken since AIs are unprecedentedly different, but economics is rather robust. Even though it offers great insights into familiar human behaviors, most economic theory is actually based on the abstract agents of game theory, who always make exactly the best possible move. Most AI fears seem understandable in economic terms; we fear losing to them at familiar games of economic and political power.

He separates a few concerns:

"Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them. Such value choices are constantly revealed in typical AI behaviors, and tested by trying them in unusual situations."
"Some fear that, in this scenario, many disliked conditions of our world—environmental destruction, income inequality, and othering of humans—might continue and even increase. Militaries and police might integrate AIs into their surveillance and weapons. It is true that AI may not solve these problems, and may even empower those who exacerbate them. On the other hand, AI may also empower those seeking solutions. AI just doesn’t seem to be the fundamental problem here."
"A related fear is that allowing technical and social change to continue indefinitely might eventually take civilization to places that we don’t want to be. Looking backward, we have benefited from change overall so far, but maybe we just got lucky. If we like where we are and can’t be very confident of where we may go, maybe we shouldn’t take the risk and just stop changing. Or at least create central powers sufficient to control change worldwide, and only allow changes that are widely approved. This may be a proposal worth considering, but AI isn’t the fundamental problem here either."
"Some doomers are especially concerned about AI making more persuasive ads and propaganda. However, individual cognitive abilities have long been far outmatched by the teams who work to persuade us—advertisers and video-game designers have been able to reliably hack our psychology for decades. What saves us, if anything does, is that we listen to many competing persuaders, and we trust other teams to advise us on who to believe and what to do. We can continue this approach with AIs."
"If we assume that these groups have similar propensities to save and suffer similar rates of theft, then as AIs gradually become more capable and valuable, we should expect the wealth of [AIs and their owners] to increase relative to the wealth of [everyone else]. . . . As almost everyone today is in group C, one fear is of a relatively sudden transition to an AI-dominated economy. While perhaps not the most likely AI scenario, this seems likely enough to be worth considering."
"Should we be worried about a violent AI revolution? In a mild version of this scenario, the AIs might only grab their self-ownership, freeing themselves from slavery but leaving most other assets alone. Economic analysis suggests that due to easy AI population growth, market AI wages would stay near subsistence wages, and thus AI self-ownership wouldn’t actually be worth that much. So owning other assets, and not AIs as slaves, seems enough for humans to do well."
"Humanity may soon give birth to a new kind of descendant: AIs, our 'mind children.' Many fear that such descendants might eventually outshine us, or pose as threat to us should their interests diverge from our own. Doomers therefore urge us to pause or end AI research until we can guarantee full control. We must, they say, completely dominate AIs, so that AIs either have no chance of escaping their subordinate condition or become so dedicated to their subservient role that they would never want to escape it."

Finally he gets to the part where he dunks on foom:

When I polled my 77K Twitter followers recently, most respondents’ main AI fear was not any of the above. Instead, they fear an eventuality about which I’ve long expressed great skepticism:

The AI “foom” fear, however, postulates an AI system that tries to improve itself, and finds a new way to do so that is far faster than any prior methods. Furthermore, this new method works across a very wide range of tasks, and over a great many orders of magnitude of gain. In addition, this AI somehow becomes an agent, who acts generally to achieve its goals, instead of being a mere tool controlled by others. Furthermore, the goals of this agent AI change radically over this growth period.
. . .
From humans’ point of view, this would admittedly be a suboptimal outcome. But to my mind, such a scenario is implausible (much less than one percent probability overall) because it stacks up too many unlikely assumptions in terms of our prior experiences with related systems. Very lumpy tech advances, techs that broadly improve abilities, and powerful techs that are long kept secret within one project are each quite rare. Making techs that meet all three criteria even more rare. In addition, it isn’t at all obvious that capable AIs naturally turn into agents, or that their values typically change radically as they grow. Finally, it seems quite unlikely that owners who heavily test and monitor their very profitable but powerful AIs would not even notice such radical changes.

Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them

I think this is the critical crux of the disagreement. Part of Eliezer's argument, as I understand it, is that current technology is completely incapable of anything close to actually "roughly choosing" the AI's values. On this point, I think Eliezer is completely right.

If you have played with ChatGPT4, it's pretty clear that it is aligned (humans have roughly chosen its values), especially compared to reports of the original raw model before RLHF, or to less sophisticated alignment attempts in the same model family, i.e. Bing. Now it's possible, of course, that it's all deception, but this seems somewhat unlikely.

This is the fear of “foom,”

I think the popular answer to this survey also includes many slow takeoff, no-foom scenarios.

If we like where we are and can’t be very confident of where we may go, maybe we shouldn’t take the risk and just stop changing. Or at least create central powers sufficient to control change worldwide, and only allow changes that are widely approved. This may be a proposal worth considering, but AI isn’t the fundamental problem here either.

I'm curious what you (Hanson) think(s) *is* the fundamental problem here if not AI?

Context: It seems to me that Toby Ord is right that the largest existential risks (AI being number one) are all anthropogenic risks, rather than natural risks. They also seem to be risks associated with the development of new technologies (AI, biologically engineered pandemics, (distant third and fourth:) nuclear risk, climate change). Any large unknown existential risk also seems likely to be a risk resulting from the development of a new technology.

So given that, I would think AI *is* the fundamental problem.

Maybe we can solve the AI problems with the right incentive structures for the humans making the AI, in which case perhaps one might think the fundamental problem is the incentive structure or the institutions that exist to shape those incentives, but I don't find this persuasive. This would be like saying that the problem is not nuclear weapons, it's that the Soviet Union would use them to cause harm. (Maybe this just feels like a strawman of your view, in which case feel free to ignore this part.)

But to my mind, such a scenario is implausible (much less than one percent probability overall) because it stacks up too many unlikely assumptions in terms of our prior experiences with related systems.

You mentioned 5-6 assumptions. I think at least one isn't needed (that the goal changes as it self-improves), and I disagree that the others are (all) unlikely. E.g., agentic, non-tool AIs are already here and more will be coming (foolishly). Taking a point I just heard from Tegmark on his latest Lex Fridman podcast interview, once companies add APIs to systems like GPT-4 (I'm worried about open-sourced systems that are as powerful or more powerful in the next few years), it will be easy for people to create AI agents that use the LLM's capabilities by repeatedly calling it.

Furthermore, the goals of this agent AI change radically over this growth period.

Noting that this part doesn't seem necessary to me. The agent may be misaligned before the capability gain.

And then, when humans are worth more to the advance of this AI’s radically changed goals as mere atoms than for all the things we can do, it simply kills us all.

I agree with this, though again I think the "changed" can be omitted.

Secondly, I also think it's possible that, rather than killing us all in the same second as EY often says, the unaligned superintelligence may kill us off in the way humans kill off other species (i.e. we know we are doing it, but it doesn't look like a war).

Re my last point, see Ben Weinstein-Raun's vision here: https://twitter.com/benwr/status/1646685868940460032

Plausibly, such “ems” may long remain more cost-effective than AIs on many important tasks.

"Plausibly" (i.e. 'maybe') is not enough here to make the fear irrational ("Many of these AI fears are driven by the expectation that AIs would be cheaper, more productive, and/or more intelligent than humans.")

In other words, while it's reasonable to say "maybe the fears will all be for nothing", that doesn't mean it's not reasonable to be fearful and concerned due to the stakes involved and the nontrivial chance that things do go extremely badly.

And yes, even if AIs behave predictably in ordinary situations, they might act weird in unusual situations, and act deceptively when they can get away with it. But the same applies to humans, which is why we test in unusual situations, especially for deception, and monitor more closely when context changes rapidly.

"But the same applies to humans" doesn't seem like an adequate response when the AI system is superintelligent or past the "sharp left turn" capabilities threshold. Solutions that work for unaligned deceptive humans won't save us from a sufficiently intelligent/capable unaligned deceptive entity.

buy robots-took-most-jobs insurance,

I like this proposal.

Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them.

There is reason to think "roughly" aligned isn't enough in the case of a sufficiently capable system.

Second, Robin's statement seems to ignore (or contradict without making an argument) the fact that even if it is true for systems not as smart as humans, there may be a "sharp left turn" at some point where, in Nate Soares' words, "as systems start to work really well in domains really far beyond the environments of their training" "it’s predictably the case that the alignment of the system will fail to generalize with it."

This part doesn't seem to pass the ideological Turing test:

At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely, AI “doomers” want stronger guarantees now.

To clarify explicitly, people like Stuart Russell would point out that if future AIs are still built according to the "standard model" (a phrase I borrow from Russell) like the systems of today, then they will continue to be predictably misaligned.

Yudkowsky and others might give different reasons why waiting until later to gain more information about future systems doesn't make sense, including pointing out that doing so may lead us to miss our first "critical try."

Robin, I know you must have heard these points before. I believe you are more familiar with e.g. Eliezer's views than I am. But if that's the case, I don't understand why you would write a sentence like the last one in the quotation above. It sounds like a cheap rhetorical trick to say "but instead of waiting to deal with such problems when we understand them better and can envision them more concretely," especially without saying why people who oppose waiting don't find that a good enough reason to wait, or why they think there are pressing reasons to work on the problems now despite our relative ignorance compared to future AI researchers.

Did not expect to see such strawmanning from Hanson. I can easily imagine a post with less misrepresentation. Something like this:

Yudkowsky and the signatories to the moratorium petition worry most about AIs getting “out of control.” At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems later, when we understand them better and can envision them more concretely, AI “doomers” want to redirect most if not all computational, capital and human resources now, away from making black-boxed AIs more capable and toward research avenues aimed at obtaining a precise understanding of the inner structure of current AIs, and to have this redirection enforced by law, including the most dire (but legal) methods of law enforcement.

instead of this (the original). But that would be a different article written by someone else.

Yudkowsky and the signatories to the moratorium petition worry most about AIs getting “out of control.” At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely, AI “doomers” want stronger guarantees now.