
AI Boxing for Hardware-bound agents (aka the China alignment problem)

by Logan Zoellner · 10 min read · 8th May 2020 · 27 comments



I feel like I fit into a small group of people who are both:

1) Very confident AI will be developed in the next few decades

2) Not terribly worried about AI alignment

Most people around here seem to fall into either the camp "AI is always 50 years away" or "Oh my gosh Skynet is going to kill us all!".

This is my attempt to explain why

A) I'm less worried than a lot of people around here.

B) I think a lot of the AI alignment work is following a pretty silly research path.


Sorry this is so long.

The short version is: I believe we will see a slow takeoff in which AI is developed simultaneously in several places around the globe. This means we need to focus on building institutions not software.

Edit: Important Clarification

I apparently did a poor job writing this, since multiple people have commented "Wow, you sure hate China!" or "Are you saying China taking over the world is worse than a paperclip maximizer!?"

That is not what this post is about!

What this post is about is:

Suppose you were aware that an entity was soon to come into existence which would be much more powerful than you are. Suppose further that you had limited faith in your ability to influence the goals and values of that entity. How would you attempt to engineer the world so that nonetheless after the rise of that entity, your values and existence continue to be protected?

What this post is not about:

An AI perfectly aligned with the interests of the Chinese government would not be worse than a paperclip maximizer (or your preferred bad outcome to the singularity). An AI perfectly aligned with the coherently extrapolated values of the Chinese would probably be pretty okay, since the Chinese are human and share many of the same values as I do. However, in a world with a slow takeoff I think it is unlikely any single AI will dominate, much less one that happens to perfectly extrapolate the values of any one group or individual.

Foom, the classic AI-risk scenario

Generally speaking the AI risk crowd tells a very simple story that goes like this:

1) Eventually AI will be developed capable of doing all human-like intellectual tasks

2) Improving AI is one of these tasks

3) Goto 1

The claim is that "Any AI worth its salt will be capable of writing an even better AI, which will be capable of building an even better AI, which will be capable of...." so within (hours? minutes? days?) AI will have gone from human-level to galactically beyond anything humanity is capable of doing.

I propose an alternative hypothesis:

By the time human-level AI is achieved, most of the low-hanging fruit in the AI improvement domain will have already been found, so subsequent improvements in AI capability will require a superhuman level of intelligence. The first human-level AI will be no more capable of recursive-self-improvement than the first human was.

Note: This does not mean that recursive self-improvement will never happen, or that the development of human-level AI will not have profound economic, scientific, and philosophical consequences. What it means is that the first AI is going to take some serious time and compute power to out-compete 200-plus years of human effort on developing machines that think.
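The contrast between the two hypotheses can be made concrete with a toy growth model. This is purely illustrative: the constants are assumptions with no empirical content. In the Foom story each generation multiplies its own capability; in the hardware-bound story the low-hanging fruit is gone, so each successive improvement costs more effort than the last.

```python
# Toy model contrasting the two takeoff stories. Purely illustrative:
# the growth constants are assumptions, not claims about real AI progress.

def foom_trajectory(steps, gain=2.0):
    """Software-bound: each generation multiplies its own capability."""
    capability = 1.0
    history = [capability]
    for _ in range(steps):
        capability *= gain                 # self-improvement compounds freely
        history.append(capability)
    return history

def hardware_trajectory(steps, effort_growth=1.5):
    """Hardware-bound: the low-hanging fruit is already picked, so each
    further improvement costs more effort than the last."""
    capability = 1.0
    effort = 1.0
    history = [capability]
    for _ in range(steps):
        effort *= effort_growth            # rising cost of the next gain
        capability += capability / effort  # diminishing marginal returns
        history.append(capability)
    return history

print(foom_trajectory(10)[-1])      # 1024.0 -- explodes after 10 generations
print(hardware_trajectory(10)[-1])  # only a few-fold gain over the same span
```

Under these (assumed) dynamics the same ten "generations" yield a thousand-fold explosion in one story and a modest crawl in the other, which is the whole disagreement in miniature.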

What the first AI looks like in each of these scenarios:

Foom: One day, some hacker in his mom's basement writes an algorithm for a recursively self-improving AI. Ten minutes later, this AI has conquered the world and converted Mars into paperclips.

Moof: One day, after 5 years of arduous effort, Google finally finishes training the first human-level AI. Its intelligence is approximately that of a 5-year-old child. Its first publicly uttered sentence is "Mama, I want to watch Paw Patrol!" A few years later, anybody can "summon" a virtual assistant with human-level intelligence from their phone to do their bidding. But people have been using virtual AI assistants on their phones since the mid-2010s, so nobody is nearly as shocked as a time-traveler from the year 2000 would be.

What is the key difference between these scenarios? (Software vs Hardware bound AI)

In the Foom scenario, the key limiting resource or bottleneck was the existence of the correct algorithm. Once this algorithm was found, the AI was able to edit its own source-code, leading to dramatic recursive self-improvement.

In the Moof scenario, the key limiting resources were hardware and "training effort". Building the first AI required massively more compute power and training data than running the first AI, and also massively more than the first AI had access to.

Does this mean that the development of human-level AI might not surprise us? Or that by the time human level AI is developed it will already be old news? I don't know. That depends on whether or not you were surprised by the development of Alpha-Go.

If, on the one hand, you had seen that since the 1950s computer AIs had been capable of beating humans at increasingly difficult games, that progress in this domain had been fairly steady and mostly limited by compute power, and moreover that computer Go programs had themselves gone from idiotic to high-amateur level over the course of decades, then the development of Alpha-Go (if not the exact timing of that development) probably seemed inevitable.

If, on the other hand, you thought that playing Go was a uniquely human skill that required the ability to think creatively which machines could never replicate, then Alpha-Go probably surprised you.

For the record, I was surprised at how soon Alpha-Go happened, but not that it happened.

What arguments are there in favor of (or against) Hardware Bound AI?

The strongest argument in favor of hardware-bound AI is that in areas of intense human interest, the key "breakthroughs" tend to be found by multiple people independently, suggesting they are a result of conditions being correct rather than the existence of a lone genius.

Consider: Writing was independently invented at a minimum in China, Mesoamerica, and the Middle East. Calculus was developed by both Newton and Leibniz. There are half a dozen people who claim to have beaten the Wright brothers to the first powered flight. Artificial neural networks had been a topic of research for 50 years before the deep-learning revolution.

The strongest argument against Hardware Bound AI (and in favor of Foom) is that we do not currently know the algorithm that will be used to develop a human level intelligence. This leaves open the possibility that a software breakthrough will lead to rapid progress.

However, I argue that not only will the "correct algorithm" be known well in advance of the development of human-level AI, but it will be widely deployed as well. I say this because we have every reason to believe that the algorithm that produces human-level AI in humans is the same algorithm that produces chimpanzee-level AI in chimps, dog-level AI in dogs, and mouse-level AI in mice, if not cockroach-level AI in cockroaches. The evolutionary changes from chimpanzee to human were largely of scale and function, not some revolutionary new brain architecture.

Why should we expect dog-AI or chimp AI to be developed before human-AI? Because they will be useful and because considerable economic gain will go to their developer. Imagine an AI that could be trained as easily as a dog, but whose training could then be instantly copied to millions of "dogs" around the planet.

Furthermore, once dog-AI is developed, billions of dollars of research and investment will be spent improving it to make sure its software and hardware run as efficiently as possible. Consider the massive effort that has gone into the development of software like TensorFlow or Google's TPUs. If there were a "trick" that would make dog-AI even 2x as powerful (or energy efficient), researchers would be eager to find it.

What does this mean for AI alignment? (Or, what is the China Alignment problem?)

Does the belief in hardware-bound AI mean that AI alignment doesn't matter, or that the development of human-level AI will be a relative non-event?


No. Rather, it means that when thinking about AI risk, we should think of AI less as a single piece of software and more as a coming economic shift that will be widespread and unstoppable well before it officially "happens".

Suppose, living in the USA in the early 1990's, you were aware that there was a nation called China with the potential to be vastly more economically powerful than the USA and whose ideals were vastly different from your own. Suppose, further, that rather than trying to stop the "rise" of China, you believed that developing China's vast economic and intellectual potential could be a great boon for humankind (and for the Chinese themselves).

How would you go about trying to "contain" China's rise? That is, how would you make sure that at whatever moment China's power surpassed your own, you would face a benevolent rather than a hostile opponent?

Well, you would probably do some game theory. If you could convince the Chinese that benevolence was in their own best interest while they were still less-powerful than you, perhaps you would have a chance of influencing their ideology before they became a threat. At the very least your goals would be the following:

1) Non-aggression. You should make it perfectly clear to the Chinese that force will be met with overwhelming force and should they show hostility, they will suffer.

2) Positive-sum games. You should engage China in mutual-economic gain, so that they realize that peaceful intercourse with you is better than the alternative.

3) Global institutions. You should establish a series of global institutions that enshrine the values you hold most dear (human rights, freedom of speech) and make clear that only entities that respect these values (at least on an international stage) will be welcomed to the "club" of developed nations.
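The game-theoretic logic behind these three goals can be sketched as an iterated game. This is a minimal tit-for-tat illustration; the payoff numbers and function names are assumptions made for the example, not anything from the post.

```python
# Minimal iterated-game sketch of the containment strategy: start friendly,
# meet force with force, and keep trade positive-sum. Payoffs are assumed.

PAYOFFS = {  # (my_move, their_move) -> (my_payoff, their_payoff)
    ("cooperate", "cooperate"): (3, 3),  # positive-sum trade
    ("cooperate", "defect"):    (0, 5),  # exploited
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # mutual hostility
}

def tit_for_tat(their_history):
    """Non-aggression with teeth: open friendly, then mirror their last move."""
    return "cooperate" if not their_history else their_history[-1]

def play(strategy_a, strategy_b, rounds=10):
    a_hist, b_hist = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(b_hist)  # each side sees the other's past moves
        b = strategy_b(a_hist)
        a_hist.append(a)
        b_hist.append(b)
        pa, pb = PAYOFFS[(a, b)]
        score_a += pa
        score_b += pb
    return score_a, score_b

# Against a fellow reciprocator, cooperation locks in from round one.
print(play(tit_for_tat, tit_for_tat))  # (30, 30)
```

The point of the sketch: a rising power that expects many future rounds of trade does better by cooperating with a reciprocator than by defecting, which is exactly the incentive structure the three goals above try to establish.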

Contrast this with traditional AI alignment, which is focused on developing the "right software" so that the first human-level AI will have the same core values as human beings. Not only does this require you to have a perfect description of human values, you must also figure out how to encode those values in a recursively self-improving program, and make sure that your software is the first to achieve Foom. If anyone, anywhere, develops an AI based on imperfect software before you do, we're all doomed.

AI Boxing Strategies for Hardware Bound AI

AI boxing is actually very easy for Hardware Bound AI. You put the AI inside of an air-gapped firewall and make sure it doesn't have enough compute power to invent some novel form of transmission that isn't known to all of science. Since there is a considerable computational gap between useful AI and "all of science", you can do quite a bit with an AI in a box without worrying too much about it going rogue.

Unfortunately, AI boxing is also a bit of a lost cause. It's fine if your AI is nicely contained in a box. However, your competitor in India has been deploying AI on the internet doing financial trading for a decade already. An AI that is programmed to make as much money as possible trading stocks and is allowed to buy more hardware to do so has all of the means, motive, and opportunity to be a threat to humankind.

The only viable strategy is to make sure that you have a pile of hardware of your own that you can use to compete economically before getting swamped by the other guy. The safest path isn't to limit AI risk as much as possible, but rather to make sure that agents you have friendly economic relations with rise as quickly as possible.

What research can I personally invest in to maximize AI safety?

If the biggest threat from AI comes not from AI Foom but rather from Chinese-owned AI with a hostile world-view, and if, like me, you consider the liberal values held by the Western world something worth saving, then the single best course of action you can take is to make sure those liberal values have a place in the coming AI-dominated economy.

This means:

1) Making sure that liberal western democracies continue to stay on the cutting-edge of AI development.

2) Ensuring that global institutions such as the UN and WTO continue to embrace and advance ideals such as free-trade and human-rights.

Keeping the West ahead

Advancing AI research is actually one of the best things you can do to ensure a "peaceful rise" of AI in the future. The sooner we discover the core algorithms behind intelligence, the more time we will have to prepare for the coming revolution. The worst-case scenario is still that sometime in the mid-2030s a single research team comes up with revolutionary new software that puts them miles ahead of anyone else. The more evenly distributed AI research is, the more mutually beneficial economic games will ensure the peaceful rise of AI.

I actually think there is considerable work that can be done right now to develop human-level AI. While I don't think that Moore's law has yet reached the level required to develop human AI, I believe we're approaching "dog-level" and we are undoubtedly well beyond "cockroach level". Serious work on developing sub-human AI not only advances the cause of AI safety, but will also provide enormous economic benefits to all of us living here on earth.

Personally, I think one fruitful area in the next few years will be the combination of deep learning with "classical AI" to develop models that can make novel inferences and exhibit "one shot" or "few shot" learning. The combination of a classic method (Monte Carlo tree search) with deep learning is what made Alpha-Go so powerful.

Imagine an AI that was capable of making general inferences about the world, where the inferences themselves were about fuzzy categories extracted through deep learning and self-play. For example, it might learn "all birds have wings", where "bird" and "wing" refer to different activations in a deep-learning network but the sentence "all birds have wings" is encoded in an expert-system-like collection of facts. The system would then progressively expand and curate its set of facts, keeping the ones that were most useful for making predictions about the real world. Such a system could be trained on a YouTube-scale video corpus, or on a simulated environment such as Skyrim or Minecraft.
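The curation loop described above can be sketched concretely. Everything here is a hypothetical illustration: the category detectors stand in for deep-network activations, and the scoring rule is just one simple way to operationalize "useful for making predictions".

```python
# Hypothetical sketch of the fact-curation loop: symbolic facts over fuzzy,
# learned categories, each fact kept or pruned by predictive accuracy.

# Stand-ins for fuzzy learned categories (really: network activations).
def is_bird(x):   return x["kind"] == "bird"
def has_wings(x): return x.get("wings", False)
def is_red(x):    return x.get("color") == "red"

# Symbolic facts: (premise, conclusion) pairs like "all birds have wings".
FACTS = {
    "all birds have wings": (is_bird, has_wings),
    "all birds are red":    (is_bird, is_red),  # a bad fact, to be pruned
}

def score(fact, observations):
    """Fraction of premise-matching observations the rule predicts correctly."""
    premise, conclusion = fact
    relevant = [x for x in observations if premise(x)]
    if not relevant:
        return 0.0
    return sum(conclusion(x) for x in relevant) / len(relevant)

def curate(facts, observations, keep_above=0.9):
    """Keep only the facts that stay predictive on observed data."""
    return {name: f for name, f in facts.items()
            if score(f, observations) >= keep_above}

# A tiny simulated "video corpus": 50 birds, half red, all winged.
world = [{"kind": "bird", "wings": True,
          "color": "red" if i % 2 else "brown"} for i in range(50)]

print(sorted(curate(FACTS, world)))  # ['all birds have wings']
```

In a real system the premises and conclusions would be learned activations rather than hand-written predicates, but the expand-score-prune loop is the same shape.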

Building institutions

In addition to making sure that AI isn't developed first by an organization hostile to Western liberal values, we also need to make sure that when AI is developed, it is born into a world that encourages its peaceful development. This means promoting norms of liberty, free trade and protection of personal property. In a world with multiple actors trading freely, the optimal strategy is one of trade and cooperation. Violence will only be met with countervailing force.

This means we need to strengthen our institutions as well as our alliances. The more we can enshrine principles of liberty in the basic infrastructure of our society, the more likely they will survive. This means building an internet and financial network that resists surveillance and censorship. Currently blockchain is the best platform I am aware of for this.

This also means developing global norms in which violence is met with collective action against the aggressor. When Russia invades Ukraine or China invades Taiwan, the world cannot simply turn a blind eye. Tit-for-tat like strategies can encourage the evolution of pro-social or at least rational AI entities.

Finally, we need to make sure that Western liberal democracy survives long enough to hand off the reins to AI. This means we need to seriously address problems like secular stagnation, climate change, and economic inequality.

When will human-level AI be developed?

I largely agree with Metaculus that it will happen sometime between 2030 and 2060. I expect that we will see some pretty amazing breakthroughs (dog-level AI) in the next few years. One group whose potential I think is slightly unappreciated is Tesla. They have both a need (self-driving) and the means (video data from millions of cars) to make a huge breakthrough here. Google, Amazon, and whoever is building the surveillance state in China are also obvious places to watch.

One important idea is that of AI fire-alarms. Mine personally was Alpha-Go, which caused me to update from "eventually" to "soon". The next fire-alarm will be an AI that can react to a novel environment with a human-like amount of training data. Imagine an AI that can learn to play Super Mario in only a few hours of gameplay, or an AI that can learn a new card game just by playing with a group of humans for a few hours. When this happens, I will update from "soon" to "very soon".

What are your credences? (How much would you be willing to bet?)

Foom vs Moof:

I think this is a bit of a sucker bet, since if Foom happens we're (probably) all dead. But I would be willing to bet at least 20:1 against Foom. One form this bet might take: "Will the first human-level AI be trained on hardware costing more or less than $1 million (inflation-adjusted)?"

When will AGI happen?

I would be willing to take a bet at 1:1 odds that human-level AI will not happen before 2030.

I will not take a fair bet that human-level AI will happen before 2060, since it's possible that Moore's law will break down in some way I cannot predict. I might take such a bet at 1:3 odds.


I will take a bet at 10:1 odds that human-level AI will be developed before we have a working example of "aligned AI", that is, an AI algorithm that provably incorporates human values in a way that is robust against recursive self-improvement.

Positive outcome to the singularity:

This is even more of a sucker bet than Foom vs Moof. However, my belief is closer to 1:1 than it is to 100:1, since I think there is a real danger that a hostile power such as China develops AI before us, or that we haven't developed sufficiently robust institutions to survive the dramatic economic upheaval that human-level AI will produce.

Tesla vs Google:

I would be willing to bet 5:1 that Tesla will produce a mass-market self-driving car before Google.




2 Answers

Free trade can also have a toxic side. It could make sidelining human dignity in favor of economic efficiency the expected default.

Powerful tit-for-tat can also mean that the law of the strongest is normalised. When you stop being the powerful one and the AI feels that you are doing something immoral / dangerous, it will take severe action.

The problem should remain essentially the same if we reframe the China problem as the US problem. I don't want the AI to fail to implement universal healthcare, and letting the US lead into new ages risks that those values are not upheld. If I don't want there to be a global police state, should I take swift action when the US tries to act as one? One of the problems is that global security mechanisms are not exactly multiparty systems but have strong oligarchical features. And when popular nations disregard their function, they don't significantly get hampered by them.

This gets political, and the inference step from values to politics isn't particularly strong.

Nice post! The moof scenario reminds me somewhat of Paul Christiano's slow take-off scenario which you might enjoy reading about. This is basically my stance as well.

AI boxing is actually very easy for Hardware Bound AI. You put the AI inside of an air-gapped firewall and make sure it doesn't have enough compute power to invent some novel form of transmission that isn't known to all of science. Since there is a considerable computational gap between useful AI and "all of science", you can do quite a bit with an AI in a box without worrying too much about it going rogue.

My major concern with AI boxing is the possibility that the AI might just convince people to let it out (ie remove the firewall, provide unbounded internet access, connect it to a Cloud). Maybe you can get around this by combining a limited AI output data stream with a very arduous gated process for letting the AI out in advance but I'm not very confident.

If the biggest threat from AI doesn't come from AI Foom, but rather from Chinese-owned AI with a hostile world-view.

The biggest threat from AI comes from AI-owned AI with a hostile worldview -- no matter how the AI gets created. If we can't answer the question "how do we make sure AIs do the things we want them to do when we can't tell them all the things they shouldn't do?", we might wind up with Something Very Smart scheming to take over the world while lacking at least one Important Human Value. Think Age of Em except the Ems aren't even human.

Advancing AI research is actually one of the best things you can do to ensure a "peaceful rise" of AI in the future. The sooner we discover the core algorithms behind intelligence, the more time we will have to prepare for the coming revolution. The worst-case scenario still is that some time in the mid 2030's a single research team comes up with a revolutionary new software that puts them miles ahead of anyone else. The more evenly distributed AI research is, the more mutually beneficial economic games will ensure the peaceful rise of AI.

Because I'm still worried about making sure AI is actually doing the things we want it to do, I'm worried that faster AI advancements will imperil this concern. Beyond that, I'm not really worried about economic dominance in the context of AI. Given a slow takeoff scenario, the economy will be booming like crazy wherever AI has been exercised to its technological capacities even before AGI emerges. In a world of abundant labor and so on, the need for mutually beneficial economic games with other human players, let alone countries, will be much less.

I'm a little worried about military dominance though -- since the country with the best military AI may leverage it to radically gain a geopolitical upper hand. Still, we were able to handle nuclear weapons, so we should probably be able to handle this too.