[Hi everyone, Yassine here: long-time orbiter, first-time poster. I figured this piece, which I published on 1/23/23, would be as good an introduction as any.]

I’ll be honest: I used to think talk of AI risk was so boring that I literally banned the topic at every party I hosted. The discourse generally focused on existential risks so hopelessly detached from any semblance of human scale that I couldn’t be bothered to give a shit. I played the Universal Paperclips game and got a rough sense of what a cataclysmic extinction scenario might look like, but what the fuck was I supposed to do about it now? It was either too far in the future for me to worry about, or the singularity was already imminent and inevitable. Moreover, the solution usually bandied about is to ensure AI is obedient (“aligned”) to human commands. It’s a quaint idea, but given how awful humans can be, this just swaps one problem for another.

So if we set aside the grimdark sci-fi scenarios for the moment, what are some near-term risks of humans using AI for evil? I can think of three possibilities where AI can be leveraged as a force multiplier by bad (human) actors: hacking, misinformation, and scamming.

(I was initially under the deluded impression that I had chanced upon a novel insight, but in researching this topic, I realized that famed security researcher Bruce Schneier already wrote about basically the same subject way back in fucking April 2021 [what a jerk!] with his paper The Coming AI Hackers. Also note that I’m roaming outside my usual realm of expertise and hella speculating. Definitely do point out anything I may have gotten wrong, and definitely don’t do anything as idiotic as making investment decisions based on what I’ve written here. That would be so fucking dumb.)


Computers are given instructions through the very simple language of binary: on and off, ones and zeroes. The original method of “talking” to computers was the punch card, which had (at least in theory) an unambiguous precision to its instructions: punch or nah, on or off, one or zero. Punch cards were intimate, artisanal, and extremely tedious to work with. In a fantastic 2017 Atlantic article titled The Coming Software Apocalypse, James Somers charts how computer programming changed over time. As early as the 1960s, software engineers were objecting to the introduction of newfangled “assembly language” as a replacement for punch cards. The old guard worried that replacing 10110000 01100001 on a punch card with MOV AL, 61h might result in errors or misunderstandings about what the human was actually trying to accomplish. That argument lost because the benefits of increased abstraction were too great to pass up. Low-level languages like assembly are a niche curiosity now for most programmers, having long since been displaced in everyday work by high-level languages like Python. All of those, in turn, risk being replaced by AI coding tools like GitHub’s Copilot.
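To make the abstraction ladder concrete, here’s a toy illustration (my own, not from Somers’ article) of the same intent expressed at each rung: the raw bytes a punch card would encode, the assembly mnemonic that so worried the old guard, and the nearest Python analogue.

```python
# The same intent, "store the value 0x61," at three levels of abstraction:
#
#   raw machine code (what a punch card encoded):  10110000 01100001
#   assembly (the newfangled replacement):         MOV AL, 61h
#   a high-level language (Python analogue):       al = 0x61
#
# Each rung up the ladder expresses intent more directly while hiding
# more of what the machine actually does underneath.
al = 0x61
print(hex(al))  # 0x61
```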

Yet despite the increasing complexity, even sophisticated systems remained scrutable to mere mortals. Take, for example, a multibillion-dollar company like Apple, which employs thousands of the world’s best cybersecurity engineers and tasks them with making sure whatever code ends up on iPhones is buttoned up nice and tight. Nevertheless, not too long ago it was still perfectly feasible for a single sufficiently motivated and talented individual, tediously working out of his living room, to find and exploit vulnerabilities in Apple’s library code.

Think of increased abstraction in programming as a gain in altitude, with AI coding tools as the yoke pull that will bring us to escape velocity. The core issue is that any human operator looking down will increasingly lose the ability to comprehend anything in the landscape their gaze happens to rest upon. AI, by contrast, can swallow entire rivers of code in a single gulp, effortlessly highlighting and patching vulnerabilities as it glides overhead. In the same amount of time, a human operator can barely kick a panel open, only to find themselves staring, befuddled, at the vast oceans of spaghetti code below.

There’s a semi-plausible scenario in the far future where technology becomes so unimaginably complex that only Tech-Priests endowed with the proper religious rituals can meaningfully operate machinery. Setting aside that grimdark possibility and focusing just on the human-risk aspect for now, increased abstraction isn’t actually too dire of a problem. In the same way that tech companies and teenage hackers waged an arms race over finding and exploiting vulnerabilities, the race will continue, except the price of entry will be a coding BonziBuddy. Code that isn’t washed clean of vulnerabilities by an AI check will be hopelessly torn apart in the wild by malicious roving bots sniffing for exploits.
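To give a flavor of what those roving bots would be sniffing for, here’s a minimal sketch (the table and function names are invented for illustration) of one of the oldest and most mechanically detectable bug classes, SQL injection, next to its boring fix.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: user input is spliced directly into the query string,
    # so a crafted username can rewrite the query itself.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver treats username strictly as data.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Scanners, AI-assisted or not, are good at flagging the first pattern precisely because it’s so mechanical; the open question is how much of that reliability survives further up the abstraction ladder.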

Until everyone finds themselves on equal footing, with defensive AI broadly distributed, the transition period will be particularly dangerous for anyone even slightly lagging behind. But because AI can be used to find exploits before release, Schneier believes this dynamic will ultimately favor the defense, with software vulnerabilities eventually becoming a thing of the past. The arms race will continue, but it will be relegated to a clash of titans between adversarial governments and large corporations bludgeoning each other with impossibly large AI systems. I might end up eating my words, but the dynamics described here seem unlikely to give rogue criminal enterprises both access to whatever the cutting-edge AI code sniffers turn out to be and the enormous resource footprint required to operate them.


So how about something more fun, like politics! Schneier and Nathan E. Sanders recently wrote an NYT op-ed with the hyperbolic title How ChatGPT Hijacks Democracy. I largely agree with Jesse Singal’s response, in that many of the concerns raised look overblown once you realize they’re describing already-existing phenomena:

There’s also a fatalism lurking within this argument that doesn’t make sense. As Sanders and Schneier note further up in their piece, computers (assisted by humans) have long been able to generate huge amounts of comments for… well, any online system that accepts comments. As they also note, we have adapted to this new reality. These days, even folks who are barely online know what spam is.

Adaptability is the key point here. There is a tediously common cycle of hand-wringing over whatever the latest advance in deepfake technology happens to be, and how it has the potential to obliterate our capacity to discern truth from fiction. This just has not happened. We’ve had photograph manipulation literally since the invention of the medium; we have been living with a cinematic industry capable of rendering whatever our minds can conjure with near-perfect fidelity; and yet, we’re still here. Anyone right now can trivially fake whatever text messages they want, but for some reason this has not become any sort of scourge. It’s by no means perfect, but there is something remarkably praiseworthy about humanity’s ability to sustain and develop properly calibrated skepticism about the changing world we inhabit.

What also helps is that, at least at present, the state of astroturf propaganda is pathetic. Schneier cites an example of about 250,000 tweets repeating the same pro-Saudi slogan verbatim after the 2018 murder of the journalist Jamal Khashoggi. Perhaps the most concerted effort in this arena is what is colloquially known as Russiagate. Russia did indeed try to spread deliberate misinformation during the 2016 election, but the effect (if any) was too minuscule to have any meaningful impact on any electoral outcome, MSNBC headlines notwithstanding. That lack of results is despite the fact that Russia’s Internet Research Agency, which was responsible for the scheme, had $1.25 million to spend every month and employed hundreds of “specialists.”
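Part of why that kind of copy-paste astroturfing flops is that verbatim repetition is trivially detectable. A rough sketch (the normalization and threshold here are arbitrary choices of mine):

```python
from collections import Counter
from hashlib import sha256

def flag_copypasta(posts: list[str], threshold: int = 50) -> set[str]:
    """Return any post text that shows up suspiciously often, verbatim."""
    def fingerprint(text: str) -> str:
        # Collapse whitespace and case so trivial edits still collide.
        return sha256(" ".join(text.lower().split()).encode()).hexdigest()

    counts = Counter(fingerprint(p) for p in posts)
    hot = {h for h, n in counts.items() if n >= threshold}
    return {p for p in posts if fingerprint(p) in hot}
```

Of course, this only catches word-for-word repetition, which is exactly the assumption the steelman below breaks.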

But let’s steelman the concern. Whereas Russia had to rely on flesh-and-blood humans to generate fake social media accounts, AI can be used to drastically expand the scope of what’s possible. Beyond reducing the operating cost to near zero, entire ecosystems of fake users can be conjured out of thin air, complete with detailed biographies, unique distinguishing characteristics, and professional backgrounds. Entire libraries of fabricated bibliographies can similarly be summoned and seeded throughout the internet. Google’s system for detecting fraudulent website traffic was calibrated on the assumption that a majority of users are human. How would we know what’s real and what isn’t if the swamp gets too crowded? Humans also rely on heuristics (“many people are saying”) to make sense of information overload, so will this new AI paradigm augur an age of epistemic learned helplessness?

Eh, doubtful. Propaganda created with the resources and legal immunity of a government is the only area where I might have concerns. But consistent with the notion of the big lie, the false ideas that spread the farthest appear to be deliberately crafted to be as bombastic and outlandish as possible. Something false and banal is not interesting enough to care about, but something false and crazy spreads, because it selects for gullibility among the populace (see QAnon). I can’t predict the future, but the concerns raised here do not seem materially different from previous panics that turned out to be duds. Humans’ adaptability in processing information has been so consistent that it might as well be an axiom.


And finally, scamming. Hoo boy, are people fucked. There’s nothing new about swindlers. The classic Nigerian prince email scam is just a repackaged version of similar scams dating back to the sixteenth century. The awkward broken English used in these emails obscures just how labor-intensive it can be to run a 419 scam enterprise from a Nigerian cybercafe. Scammers can expect maybe a handful of initial responses from sending hundreds of emails. The patently fanciful circumstances described by these fictitious princes follow the same logic as conspiracy theories: the goal is to select for gullibility.

But even after a mark is hooked, the scammer has to invest a lot of time and finesse to close the deal, and the immense gulf in wealth between your typical Nigerian scammer and your typical American victim is what made the atrociously low success rates worthwhile. The New Yorker article The Perfect Mark is a highly recommended and deeply frustrating read, outlining in excruciating detail how one psychotherapist in Massachusetts lost more than $600,000 and was sentenced to prison.

This scam would not have been nearly as prevalent had there not existed a country brimming with English speakers who have internet access but live in poverty. Can you think of anything else with internet access that can speak infinite English? Get ready for Nigerian Prince Bot 4000.

Unlike the cybersecurity issue, where large institutions have both the capability and the incentive to shore up defenses, it’s not obvious how individuals targeted by confidence tricks can be protected. Besides putting them in a rubber room, of course. No matter how tightly you encrypt someone’s bank login credentials, you will always need to give them some way to access their own account, which means social engineering will always remain the prime vulnerability in any system. Best of luck, everyone.


Anyways, AI sounds scary! Especially when wielded by bad people. On the flip side, I am excited about all the neat video games we’re going to get as AI tools continue to trivialize asset creation and code generation. That’s pretty cool, at least. 🤖

