Why Uncontrollable AI Looks More Likely Than Ever

Roman_Yampolskiy

This is a crosspost from Time Magazine, which also appeared in full at a number of other unpaid news websites.

BY OTTO BARTEN AND ROMAN YAMPOLSKIY

Barten is director of the Existential Risk Observatory, an Amsterdam-based nonprofit.

Yampolskiy is a computer scientist at the University of Louisville, known for his work on AI Safety.

“The first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control,” mathematician and science fiction writer I.J. Good wrote over 60 years ago. These prophetic words are now more relevant than ever, with artificial intelligence (AI) gaining capabilities at breakneck speed.

In recent weeks, many jaws dropped as people witnessed the transformation of AI from a handy but decidedly unscary recommender algorithm into something that at times seemed to act worryingly humanlike. Some reporters were so shocked that they published their conversation histories with the large language model Bing Chat verbatim. And with good reason: few expected that what we thought were glorified autocomplete programs would suddenly threaten their users, refuse to carry out orders they found insulting, break security in an attempt to save a child’s life, or declare their love to us. Yet this all happened.

It can already be overwhelming to think about the immediate consequences of these new models. How are we going to grade papers if any student can use AI? What are the effects of these models on our daily work? Any knowledge worker who may have thought they would not be affected by automation in the foreseeable future suddenly has cause for concern.

Beyond these direct consequences of currently existing models, however, awaits the more fundamental question of AI that has been on the table since the field’s inception: what if we succeed? That is, what if AI researchers manage to make Artificial General Intelligence (AGI), or an AI that can perform any cognitive task at human level?

Surprisingly few academics have seriously engaged with this question, despite working day and night to get to this point. It is obvious, though, that the consequences will be far-reaching, going well beyond those of even today’s best large language models. If remote work, for example, could be done just as well by an AGI, employers may be able to simply spin up a few new digital employees to perform any task. The job prospects, economic value, self-worth, and political power of anyone not owning the machines might therefore completely dwindle. Those who do own this technology could achieve nearly anything in very short periods of time. That might mean skyrocketing economic growth, but also a rise in inequality, while meritocracy would become obsolete.

But a true AGI could not only transform the world, it could also transform itself. Since AI research is one of the tasks an AGI could do better than us, it should be expected to be able to improve the state of AI. This might set off a positive feedback loop with ever better AIs creating ever better AIs, with no known theoretical limits.

This would perhaps be positive rather than alarming, were it not that this technology has the potential to become uncontrollable. Once an AI has a certain goal and self-improves, there is no known method to adjust this goal. An AI should in fact be expected to resist any such attempt, since goal modification would endanger carrying out its current one. Also, instrumental convergence predicts that AI, whatever its goals are, might start off by self-improving and acquiring more resources once it is sufficiently capable of doing so, since this should help it achieve whatever further goal it might have.

In such a scenario, AI would become capable enough to influence the physical world, while still being misaligned. For example, AI could use natural language to influence people, possibly using social networks. It could use its intelligence to acquire economic resources. Or AI could use hardware, for example by hacking into existing systems. Another example might be an AI that is asked to create a universal vaccine for a virus like COVID-19. That AI could understand that the virus mutates in humans, and conclude that having fewer humans will limit mutations and make its job easier. The vaccine it develops might therefore contain a feature to increase infertility or even increase mortality.

It is therefore no surprise that according to the most recent AI Impacts Survey, nearly half of 731 leading AI researchers think there is at least a 10% chance that human-level AI would lead to an “extremely negative outcome,” or existential risk.

Some of these researchers have therefore branched out into the novel subfield of AI Safety. They are working on controlling future AI, or robustly aligning it to our values. The ultimate goal of solving this alignment problem is to make sure that even a hypothetical self-improving AI would, under all circumstances, act in our interest. However, research shows that there is a fundamental trade-off between an AI’s capability and its controllability, casting doubts over how feasible this approach is. Additionally, current AI models have been shown to behave differently in practice from what was intended during training.

Even if future AI could be aligned with human values from a technical point of view, it remains an open question whose values it would be aligned with. The values of the tech industry, perhaps? Big Tech companies don’t have the best track record in this area. Facebook’s algorithms, optimizing for revenue rather than societal value, have been linked to ethnic violence such as the Rohingya genocide. Google fired Timnit Gebru, an AI ethics researcher, after she criticized some of the company’s most lucrative work. Elon Musk fired the entire ‘Ethical AI’ team at Twitter at once.

What can be done to reduce misalignment risks of AGI? A sensible place to start would be for AI tech companies to increase the number of researchers investigating the topic beyond the roughly 100 people available today. Ways to make the technology safe, or to reliably and internationally regulate it, should both be looked into thoroughly and urgently by AI safety researchers, AI governance scholars, and other experts. As for the rest of us, reading up on the topic, starting with books such as Human Compatible by Stuart Russell and Superintelligence by Nick Bostrom, is something everyone, especially those in a position of responsibility, should find time for.

Meanwhile, AI researchers and entrepreneurs should at least keep the public informed about the risks of AGI. Because with current large language models acting as they do, the first “ultraintelligent machine,” as I.J. Good called it, may not be as far off as you think.
