AGI Safety Solutions Map

turchin

When I started to work on the map of AI safety solutions, I wanted to illustrate the excellent article “Responses to Catastrophic AGI Risk: A Survey” by Kaj Sotala and Roman V. Yampolskiy, 2013, which I strongly recommend.

However, during the process I had a number of ideas to expand the classification of the proposed ways to create safe AI. In their article there are three main categories: social constraints, external constraints and internal constraints.

I added three more categories: "AI is used to create a safe AI", "Multi-level solutions" and "meta-level", which describes the general requirements for any AI safety theory.

In addition, I divided the solutions into simple and complex. Simple are the ones whose recipe we know today. For example: “do not create any AI”. Most of these solutions are weak, but they are easy to implement.

Complex solutions require extensive research and the creation of complex mathematical models for their implementation, and could potentially be much stronger. But the odds are less that there will be time to realize them and implement successfully.

After aforementioned article several new ideas about AI safety appeared.

These new ideas in the map are based primarily on the works of Ben Goertzel, Stuart Armstrong and Paul Christiano. But probably many more exist and was published but didn’t come to my attention.

Moreover, I have some ideas of my own about how to create a safe AI and I have added them into the map too. Among them I would like to point out the following ideas:

1. Restriction of self-improvement of the AI. Just as a nuclear reactor is controlled by regulation the intensity of the chain reaction, one may try to control AI by limiting its ability to self-improve in various ways.

2. Capture the beginning of dangerous self-improvement. At the start of potentially dangerous AI it has a moment of critical vulnerability, just as a ballistic missile is most vulnerable at the start. Imagine that AI gained an unauthorized malignant goal system and started to strengthen itself. At the beginning of this process, it is still weak, and if it is below the level of human intelligence at this point, it may be still more stupid than the average human even after several cycles of self-empowerment. Let's say it has an IQ of 50 and after self-improvement it rises to 90. At this level it is already committing violations that can be observed from the outside (especially unauthorized self-improving), but does not yet have the ability to hide them. At this point in time, you can turn it off. Alas, this idea would not work in all cases, as some of the objectives may become hazardous gradually as the scale grows (1000 paperclips are safe, one billion are dangerous, 10 power 20 are x-risk). This idea was put forward by Ben Goertzel.

3. AI constitution. First, in order to describe the Friendly AI and human values we can use the existing body of criminal and other laws. (And if we create an AI that does not comply with criminal law, we are committing a crime ourselves.) Second, to describe the rules governing the conduct of AI, we can create a complex set of rules (laws that are much more complex than Asimov’s three laws), which will include everything we want from AI. This set of rules can be checked in advance by specialized AI, which calculates only the way in which the application of these rules can go wrong (something like mathematical proofs on the basis of these rules).

4. "Philosophical landmines." In the map of AI failure levels I have listed a number of ways in which high-level AI may halt when faced with intractable mathematical tasks or complex philosophical problems. One may try to fight high-level AI using "landmines", that is, putting it in a situation where it will have to solve some problem, but within this problem is encoded more complex problems, the solving of which will cause it to halt or crash. These problems may include Godelian mathematical problems, nihilistic rejection of any goal system or the inability of AI to prove that it actually exists.

5. Multi-layer protection. The idea here is not that if we apply several methods at the same time, the likelihood of their success will add up, this notion will not work if all methods are weak. The idea is that the methods of protection work together to protect the object from all sides. In a sense, human society works the same way: a child is educated by an example as well as by rules of conduct, then he begins to understand the importance of compliance with these rules, but also at the same time the law, police and neighbours are watching him, so he knows that criminal acts will put him in jail. As a result, lawful behaviour is his goal which he finds rational to obey. This idea can be reflected in the specific architecture of AI, which will have at its core a set of immutable rules, around it will be built human emulation which will make high-level decisions, and complex tasks will be delegated to a narrow Tool AIs. In addition, independent emulation (conscience) will check the ethics of its decisions. Decisions will first be tested in a multi-level virtual reality, and the ability of self-improvement of the whole system will be significantly limited. That is, it will have an IQ of 300, but not a million. This will make it effective in solving aging and global risks, but it will also be predictable and understandable to us. The scope of its jurisdiction should be limited to a few important factors: prevention of global risks, death prevention and the prevention of war and violence. But we should not trust it in such an ethically delicate topic as prevention of suffering, which will be addressed with the help of conventional methods.

This map could be useful for the following applications:

1. As illustrative material in the discussions. Often people find solutions ad hoc, once they learn about the problem of friendly AI or are focused on one of their favourite solutions.

2. As a quick way to check whether a new solution really has been found.

3. As a tool to discover new solutions. Any systematisation creates "free cells" to fill for which one can come up with new solutions. One can also combine existing solutions or be inspired by them.

4. There are several new ideas in the map.

A companion to this map is the map of AI failures levels. In addition, this map is subordinated to the map of global risk prevention methods and corresponds to the block "Creating Friendly AI" Plan A2 within it.

The pdf of the map is here: http://immortality-roadmap.com/aisafety.pdf

Previous posts:

A map: AI failures modes and levels

A Roadmap: How to Survive the End of the Universe

A map: Typology of human extinction risks

Roadmap: Plan of Action to Prevent Human Extinction Risks

(scroll down to see the map)