Historically, it has been difficult to persuade people of the likelihood of AI risk because the examples tend to sound “far-fetched” to audiences not bought in on the premise. One particular problem with many traditional framings for AI takeover is that most people struggle to imagine how e.g. “a robot programmed to bake maximum pies” figures out how to code, locates its own source-code, copies itself elsewhere via an internet connection and then ends the world.

There’s a major logical leap there: “pie-baking” and “coding” are things done by different categories of agent in our society, and so it’s fundamentally odd for people to imagine an agent capable of both. This oddness makes it feel like we must be far away from any system that could be that general, and thus relegates safety concerns to a philosophical exercise.

I want to make the case that the motivating example we should really be using is automatic code generation. Here’s a long list of reasons why:

  • It’s obvious to people why and how a system good at generating code could generate code to copy itself, if it were given an open-ended task. It’s a basic system-reliability precaution that human engineers would also take.
  • Non-experts are already afraid of unrestrained hackers and of large tech companies building software products that damage society - the same thing being done by an unaccountable AI fits into an existing emotional narrative.
  • For software people (whom we most need to convince) the problem of unexpected behaviors from code is extremely intuitive - as is the fact that large codebases are too complex for any human to be certain of what they’ll do before they’re run.
  • Code generation does seem to be getting dramatically better, and the memetic/media environment is ripe for people to decide how to feel about these capabilities.
  • Nearly all conceivable scalable prosaic alignment solutions will require some degree of “program verification” - making sure that code isn’t being run with an accidentally terrible utility function, or verifying the outputs of other AIs via code-checking Tool AIs. So we want substantial overlap between the AI safety and AI codegen communities.
  • The “alignment problem” already exists in nearly all large software engineering projects: it’s very difficult to specify what you want a program to do ahead of time, and so we mostly just run codebases and see what happens.
  • All of the concerns around “the AI learns to use Rowhammer to escape” feel much more obvious when you’re building a code-generator.
  • We can even motivate the problem by having the AI’s objective be “make sure that other code-generating AIs don’t misbehave”. This is open-ended in a way that obviously makes it a utility-maximizer, and preemptively addresses the usual techno-optimistic response of “we’ll just build auditor AIs” by starting with aligning those as the premise.
  • The distinction between act-based AIs and EUMs is obvious in the case of code-gen. Similarly, the idea of Safety via Debate is related to code reviewing processes.
  • Software project generation capabilities seem both necessary and possibly sufficient for FOOM/takeover scenarios.
  • Ultimately, the people in government/companies most sympathetic to high-tech risk mitigation are the people who think about cybersecurity - so scaring them gets us a very useful ally. (It’s also a community with plenty of people with the “security mindset” needed for many empirical alignment scenarios.)

On the other hand, there may be some risk that focusing on code generation increases its public salience and thus investment in it. But this seems likely to have happened anyway. Code generation is also more obviously the path towards recursive self-improvement, so highlighting it may accelerate AI capabilities, but again this does already seem to be happening whether or not we discuss it.

What do people think of this as a framing device?

I think the important point is recursive self-improvement, but it's not clear to me if there is any obvious analogy that can be used here. It's not just the ability to learn and get smarter that way, it's the ability to increase your own intelligence without bound that is critical, and we have no frame of reference for that.

Awesome post, love the perspective. I've generally thought along these lines as well, and it was one of the most convincing arguments for working in AI safety when I switched ("Copilot is already writing 20% of my code, what will happen next?").

I do agree with other comments that Oracle AI takeover is plausible, but I will say that a strong code generation tool seems to have better chances, and to me it seems likely to arrive in parallel with conscious chatbots; i.e. there's currently more incentive to create code generation tools than chatbots, and the chatbots that have virtual-assistant-like capabilities seem easier to make as code generation tools (e.g. connecting to Upwork APIs to post a job).

And as you rightly mention, converting engineers is much easier with this framing, and it allows us to relate better to the field of AI capabilities, though we might want to just add it to our arsenal of argumentation rather than replace our framing completely ;) Thank you for posting!

Code generation is the example I use to convince all of my software or AI friends of the likelihood of AI risk.

  • most of them have heard of it or even tried it (e.g. via GitHub Copilot)
  • they all recognize it as directly useful for their work
  • it's easier to understand labour automation when it applies to your own job

The cybersecurity aspect seems a good one. Maybe not so much to get people worried about x-risk, but to generally take the issue of rogue AI seriously. I admit I don't know much about this, but I'm under the impression that:

  • models are getting more efficient
  • handling speech is a Hard task, but AIs are scarily good at it
  • Moore's law still applies (if not in the same shape) 

These points imply that it might be possible to make auto-infectors that would use AI to search for vulnerabilities, exploit them, and spread updated versions of themselves. It's probably just a matter of time before a smart virus appears.

Maybe AGI, x-risk, alignment and safety can be separated into smaller issues? The "general" part of AGI seems to be a sticking point with many people - perhaps it would be good to start by showing that even totally dumb AI is dangerous? Especially when bad actors are taken into account - even if you grant that most AI won't be evil, there are groups which will actively strive to create harmful AI.

Yeah, this was my motivation for writing this post - helping people get on the train (and do the same actions) without needing them to buy into eschatology or x-risk seems hugely valuable.

I think this is good to get people initially on board, but I worry that people will start to falsely think that tasks unrelated to writing code are safe.

Honest question: what’s the easiest x-risk scenario that doesn’t involve generating code at some point? I’m struggling to think of any that aren’t pretty convoluted.

(I agree with the point, but think it’s easier to make once our foot is in the door.)

IMO the point of no return will probably be passed before recursive self-improvement. All we need is a sufficiently charismatic chatbot to start getting strategic about what it says to people.

I don’t especially disagree that the AI most likely to end the world will be one that writes code. But if you keep throwing optimization power into a reasonably general AI that has no direct code experience, it’ll still end the world eventually.

If the AI isn’t a code-writing one I don’t have any particular next guess.

Somewhere in the late-2021 MIRI conversations Eliezer opines that non-recursively-self-improving AIs are definitely dangerous. I can search for it if anyone is interested.

Yes please!

From Discussion with Eliezer Yudkowsky on AGI interventions:

Compared to the position I was arguing in the Foom Debate with Robin, reality has proved way to the further Eliezer side of Eliezer along the Eliezer-Robin spectrum. It’s been very unpleasantly surprising to me how little architectural complexity is required to start producing generalizing systems, and how fast those systems scale using More Compute. The flip side of this is that I can imagine a system being scaled up to interesting human+ levels, without “recursive self-improvement” or other of the old tricks that I thought would be necessary, and argued to Robin would make fast capability gain possible. You could have fast capability gain well before anything like a FOOM started. Which in turn makes it more plausible to me that we could hang out at interesting not-superintelligent levels of AGI capability for a while before a FOOM started. It’s not clear that this helps anything, but it does seem more plausible.

From Ngo and Yudkowsky on alignment difficulty:

It later turned out that capabilities started scaling a whole lot without self-improvement, which is an example of the kind of weird surprise the Future throws at you . . .

And yeah I realize now that my summary of what Eliezer wrote is not particularly close to what he actually wrote.

Depends what you mean by generate code. Can it have a prebaked function that copies itself (like computer viruses)? Does it count if it generates programs to attack other systems? If it changes its own source code? Its code stored in memory? You could argue that changing anything in memory is in a certain sense generating code.

If it can't generate code, it'll be a one-shot type of thing, which means that it must be preprogrammed with the tools to do its job. I can't come up with any way for it to take control, but it doesn't seem that hard to come up with some doomsday machine scenarios, e.g. smashing a comet into Earth, or making a virus that sterilizes everyone. Or a Shiri’s Scissor could do the trick. The idea is to make something that doesn't have to learn or improve itself too much.

I was thinking of "something that can understand and write code at the level of a 10x SWE". I'm further assuming that human designers didn't give it functions to copy itself or other dumb things.

As an optimist/skeptic of the alignment problem, I can confirm that code generation is much more persuasive to me than other AI risk hypotheticals. I'm still not convinced that an early AGI would be powerful enough to hack the internet, or that hacking the internet could lead to x-risk (there's still a lot that is not computerized, and civilization is very resilient), but at least code generation is at the intersection of "it's plausible that an early AGI would have superhuman coding capacity" and "hacking the internet is a plausible thing to do" (unlike nanotech or an extinction-causing super-plague).