Aligning a superintelligent AI with human values is very difficult, if it is possible at all. Given the current speed of AI development, it seems unlikely that we will have a solution to the alignment problem before we can build an uncontrollable AI. A misaligned uncontrollable AI, however, will very likely destroy our future. If these assumptions are true, the only option we have is to not build one, at least until we can solve alignment.
A common objection to this is: “But that’s impossible, given the unilateralist’s curse. You can’t get the level of global coordination necessary to regulate AI so that nobody will develop an AGI. Even if you could, it’s impossible to enforce that regulation globally. Therefore, AGI is inevitable.”
AI governance is indeed very difficult. But if we can’t align an uncontrollable AI and regulation to prevent it isn’t feasible, then “dying with dignity” seems to be the only option.
However, there may be another alternative. Humans do not only coordinate through rules and regulations. Sometimes, a sufficient level of common knowledge is enough.
De Freitas et al. have shown that common knowledge is an important factor in getting people to cooperate and coordinate. It works on two levels: each person knows what the right thing to do is, and each person knows that others know the same and will act accordingly. The latter makes it much easier to do the right thing in most cases.
There are two reasons, for instance, to stop at a red traffic light. On one hand, you know individually that you shouldn’t cross it and that if you do it anyway and get caught, you’ll get fined. But more importantly, you know that others will generally follow the traffic rules and expect you to do the same. If it’s green, you can trust that other drivers at a crossroads will stop at their red lights and not crash into you. Everyone follows these rules mainly because we trust in others to know and obey them. Of course, there are exceptions – people do ignore red lights sometimes – but they are relatively rare.
People coordinate by common knowledge all the time. We agree on common languages, legal rules, and standards of politeness. People show up at the same time at birthday parties, concerts, and soccer matches because everyone knows where the event will take place and when it starts. Money is probably the most powerful example of coordination by common knowledge: A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up. The same is true for many things we regard as valuable, like an NFT or an original painting by Picasso.
We also agree on things we shouldn’t do. We don’t let our children play on the highway. We don’t eat the first unknown mushroom we find in the woods. We don’t climb into a cage in the zoo to pet the tigers. It’s common knowledge that these things are dangerous and no one in their right mind would do them. There’s even a satirical award for people who are stupid enough to do obviously dangerous things anyway, precisely because people rarely do them.
The big advantage of coordination by common knowledge is that you don’t necessarily have to enforce regulation to prevent bad things from happening. Even if there were no fines, most people would still stop at red traffic lights. However, it’s difficult to prevent 100% of bad things this way, so coordination by common knowledge is usually combined with regulation to reduce the chance that people who are ignorant of the common knowledge, or who choose to ignore it, can do bad things. For example, to drive a car you have to be old enough, hold a driver’s license, and not be under the influence of alcohol or drugs.
Coordination by common knowledge also has its downsides. It is sometimes difficult to create the necessary level of common knowledge, especially if that knowledge is controversial. False common knowledge can be used to deceive and mislead people, for example manipulating their political opinions with fake news. People can get caught in a bubble, accepting false common knowledge in their social group, like covid deniers or flat earthers. But this only illustrates how powerful a coordination tool common knowledge is.
To avoid uncontrollable AI in this way, we would need a high degree of common knowledge about how an AI could become uncontrollable, why it is difficult to align it to our values, and why a misaligned uncontrollable AI would very likely destroy our future. This is certainly difficult to achieve, but maybe not impossible.
The first thing that we would need to establish is the common knowledge that uncontrollable AI can actually be prevented. If everyone believes this is impossible and AGI – which would ultimately be uncontrollable – is inevitable, no one will be motivated to end the race for AGI. People will fall for the illusion that it is better if they develop an AGI before someone else does. This is a fallacy: after all, it doesn’t really matter who develops the uncontrollable AI that destroys the world. The belief that AGI is inevitable may turn out to be a deadly self-fulfilling prophecy.
AGI is not truly inevitable. We haven’t built it yet. We can still collectively decide not to build it, at least until we have solved the alignment problem. We don’t even need it for an amazing future.
At first glance, history seems to show that humans have always done every stupid thing they could. Whenever something was technically possible, people have built it. Nuclear bombs seem to be an obvious example: we have created enough of them to destroy humanity many times over. During the Cold War, many people thought that a nuclear war was inevitable. Yet, even though there were some close calls, we have managed to avoid it so far. The main reason for this is the common knowledge of mutually assured destruction: if one side attacks the other with nuclear weapons, there will be retaliation and both sides will lose more than they could ever gain.
This kind of equilibrium is of course fragile. Through nuclear proliferation, more and more nations gain the ability to start a nuclear war. This increases the probability that it will happen someday, either by accident or initiated by some mad dictator who thinks he has nothing to lose. So in the long run, a nuclear war, at least a local one, seems inevitable. However, each year without a nuclear war is a good year. It buys us time to find better ways to mitigate the risks, e.g. by governing nuclear weapons with international treaties. Maybe we’ll even manage to get rid of them completely one day, for example if we can create a stable global order. This is difficult, but maybe not impossible. And if we survive long enough, we may spread across the galaxy, so that even a global nuclear war would no longer destroy human civilization.
There are other examples of things we’re not doing although we could. We have banned biological weapons (although there still seem to be some unhealthy experiments in secret labs). We largely refrain from doing genetic experiments on human embryos. We have stopped sacrificing humans to appease the gods. In most parts of the world, slavery is illegal. Regulation plays a role in these examples, but what came first was a common understanding that these things were bad and shouldn’t be done.
Of course, there are also counterexamples. We have not managed to create a high enough level of common knowledge about the COVID-19 pandemic to get everyone vaccinated and wearing masks. Although there is a lot of common knowledge about climate change, many people, corporations, and governments largely ignore it and act as if it weren’t happening.
To avoid the creation of an uncontrollable AI, the overlap of people who want to create one with people who are able to do so must be precisely zero. The size of the first group can be reduced by creating as much common knowledge about the dangers and the difficulties of alignment as possible. However, the second group must also be kept as small as possible. For that, regulation is needed, for example by tracking GPUs and TPUs and possibly restricting access to computing power. This is a complex task and not the topic of this post.
Given all the difficulties, why should we expect that it is possible to coordinate by common knowledge enough to prevent an uncontrollable AI?
There are some reasons to be hopeful:
But there are also a number of barriers to creating the necessary level of common knowledge:
To overcome these barriers, we probably need more research to better understand what exactly makes an AI uncontrollable, so we can draw “red lines” that mustn’t be crossed. For these red lines to be commonly accepted, the underlying research must be common knowledge among all who might cross them. We also need a common understanding of the dangers of uncontrollable AI so people know why it would be stupid to create one.
The latter is mainly a communication problem. The arguments why it’s not a good idea to create an AI that is smarter than a human and not aligned with our values are already on the table. The one example we know of a superior intelligence taking over the world – Homo sapiens killing off all other hominid species, destroying most natural habitats, changing Earth’s climate, and causing a mass extinction – speaks for itself. Many laypeople intuitively understand this. But there are a lot of AI risk deniers who for various reasons disregard these arguments, mostly without even engaging with them.
It takes time, patience, and a lot of effort to convince these people. However, the Overton window seems to be shifting right now. The amazing capabilities of ChatGPT and GPT-4 have made the claims that LLMs are just “stochastic parrots” increasingly unconvincing. The open letter by the Future of Life Institute has drawn a lot of media attention. Geoffrey Hinton’s departure from Google to warn about the risks of advanced AI has given the field of AI safety additional credibility. The leaders of DeepMind, Google, Microsoft, and OpenAI have even publicly stated that they are at least partly aware of AI risks.
Of course, this doesn’t mean that everything is fine. Currently, the race for AGI is still fully underway. We need a major coordination effort to stop it. We need the leaders of the top AI labs to come together and agree that they’ll abandon this race and not risk our future by blindly pushing ahead.
The necessary prerequisite for this to happen is that those leaders all share the same common knowledge that if they continue the race, someone will likely create a misaligned uncontrollable AI which would destroy our future. And they need to know that the others understand this as well.
Things that could be helpful to achieve a common understanding of the dangers of uncontrollable AI (incomplete list):
Things that may not be helpful (incomplete list):
The possibility of coordination by common knowledge seems to be largely neglected in the current discussion about AI safety. There are many difficulties and possible objections, but declaring coordination to be impossible before even trying is giving up one of the few options left to us to avert an existential catastrophe from uncontrollable AI.
“A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up.”
Ah, the condition for the reality of money is much weaker though - you only have to believe that you will be able to find "someone" who believes they can find someone for whom money will be worth something, no need to involve "everyone" in one's reasoning.
Inflation is much more complicated, of course, but in essence you only have to believe that other people believe money is losing value and will buy the same thing from you at a higher price to be incentivized to raise prices. You don't have to believe that you yourself will be able to buy less from your suppliers; raising prices for higher profits is a perfectly valid reason for doing so.
This is also a kind of "coordination by common knowledge", but the parties involved don't have to share the same "knowledge" per se - consumers might believe "prices are higher because of inflation" while retailers might believe "we can make prices higher because people believe in inflation"...
Not sure myself whether the search for coordination by common knowledge incentivizes deceptive alignment "by default" (having an exponentially larger basin) or whether some reachable policy can incentivize true alignment 🤷
Yes, thanks for the clarification! I was indeed oversimplifying a bit.