I just read Roman Yampolskiy's new 500-page, multi-author anthology Artificial Intelligence Safety and Security to get a comprehensive view of the field. Here is my attempt at reviewing the book while explaining the main points of its 28 independent chapters.
The first part, Concerns of Luminaries, is made of 11 of the most influential articles related to future AI progress, presented in chronological order. Reading this part felt like navigating through history, chatting with Bill Joy and Ray Kurzweil.
The second part, Responses of Scholars, comprises 17 chapters written by academics specifically for this book. Each chapter addresses AI Safety and Security from a different perspective, from Robotics/Computer security to Value Alignment/Ethics.
The book starts with the influential essay written by Bill Joy at the beginning of the century. The author, co-founder of Sun Microsystems, author of vi and a core contributor to BSD Unix, recounts a decisive conversation he had with Ray Kurzweil in 1998 that changed his perspective on the future of technology.
In particular, Joy compares Genetics, Nanotechnology and Robotics (GNR) in the 21st century with Weapons of Mass Destruction (WMD) in the last century, being mainly concerned with Nanotechnology and Robotics (where robotics encompasses both physical robots and AI).
The author's knowledge of nanotechnology comes from Feynman's talk There's Plenty of Room at the Bottom and Drexler's Engines of Creation (1986). Joy believed nanotechnology couldn't work, until he discovered that nanoscale molecular electronics was becoming practical. This raised his awareness of knowledge-enabled mass destruction amplified by self-replication in nanotechnologies.
My comments: this article was not really about AI, but gave a general overview of the concerns raised by future technology. I really enjoyed this essay on GNR by someone who has contributed so much to technological progress.
It's one of the only chapters in The Singularity is Near that addresses existential risk.
Responding to Bill Joy's article, Kurzweil presents several ways to build defensive technology to avoid existential risk: to prevent out-of-control self-replicating nanorobots, just build an immune system that can also self-replicate.
For Kurzweil, "Intelligence is inherently impossible to control". His proposal for aligning AI with human values is "to foster those values in our society today and [go] forward. [...] The non-biological intelligence we are creating is and will be embedded in our societies and will reflect our values."
My comments: I was pleasantly surprised to see Kurzweil addressing Yudkowsky's Friendly AI and Bostrom's framework of existential risk in 2005. He appears to know very well the risks associated with an intelligence explosion, but has different opinions on how to deal with them.
This is an essential AI paper describing the core "drives" that any sufficiently advanced intelligence would possess (e.g. self-improvement, rationality or self-protection).
My comments: this paper presents critical examples of instrumental goals, laying the groundwork for current AI Safety research.
The paper starts by giving some context about AGI, and then presents principles to better think about AGI ethics:
My comments: this chapter answered Kurzweil and cited Omohundro (cf. previous chapters), giving a feeling of consistency to the book.
Max Tegmark proposes a more physicist-oriented approach to Friendly AI. The questions I found the most interesting are:
My comments: some exciting points and examples. Interesting to have Tegmark's physics-oriented perspective.
This paper presents a method of producing a self-replicating AI with a safe goal of intelligence distillation, where the key metric is the description length of an AI capable of open-ended recursive improvement.
Additionally, Drexler defines what he calls Transitional AI Safety, i.e. risk-reduction methods for AI Safety research, like:
My comments: This was the first technical chapter, and it was much more difficult to follow. It feels like Drexler (author of Engines of Creation on nanotechnology, cited at the beginning by Bill Joy) is answering the previous chapters that often cited his work! In the conclusion, the author gives more meta-advice on AI Safety research, like bridging the AI research agenda gap or enriching the conceptual universe, which I found really interesting.
The paper surveys methods and problems for the value learning problem. One of the key components is an inductive value learning system that would learn to classify outcomes, using labeled value-learning data. Soares then considers how to adapt the algorithm to different issues (e.g. corrigibility, or ontology/ambiguity identification). To solve those problems, such an algorithm would need to be able to learn from sparse data and to identify the referent of a training-data label in any given model of reality.
My comments: This paper was very enjoyable to read. Clear and straight to the point. It surveys multiple important problems and answers them with a simple and concrete method: inductive value learning.
Essentially, it's possible to send a classifier adversarial examples that will be misclassified, even in a real-world setting (e.g. through a camera, rather than feeding data directly into the model). Even when the adversarial perturbation is much smaller (in magnitude) than a certain level of noise, the adversarial example can still be misclassified while the noisy example is classified correctly.
My comments: Straight-to-the-point paper with a bunch of concrete experiments, pictures and results. Made me consider the Security concerns in a more pragmatic way (e.g. adding noise to a stop sign).
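To make the idea concrete, here is a minimal FGSM-style sketch of my own (not from the chapter), assuming a simple differentiable logistic classifier; all names and numbers are illustrative:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Fast-gradient-sign style perturbation for a logistic classifier.

    x: input vector; w, b: model weights; y: true label in {0, 1};
    eps: maximum per-component change (small relative to natural noise).
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
    grad = (p - y) * w                        # gradient of the log-loss w.r.t. x
    return x + eps * np.sign(grad)            # tiny step that increases the loss

# Toy demo: a correctly classified point gets flipped by a small perturbation.
w, b = np.ones(100), 0.0
x = 0.02 * np.ones(100)                       # true class: 1, since w @ x + b > 0
x_adv = fgsm_perturb(x, w, b, y=1, eps=0.05)
print((w @ x + b) > 0, (w @ x_adv + b) > 0)  # the sign of the score flips
```

The point mirrors the chapter's observation: the per-component change (0.05 here) can be far below ordinary noise levels, yet the perturbation is aligned with the loss gradient, so its effect accumulates across components.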
Multiple paths of AI development are discussed (e.g. in a "robotic-embodied childhood", child-like robots are fostered in human homes and raised like children). Other concepts such as sapience, the control problem or ethics are also considered.
My comments: The essay superficially covers multiple concepts. The author seems to be trying to give a general audience an intuition for a "Skynet scenario". This chapter felt weak and fuzzy compared to the rest.
"Ten years from now, you won't be able to tell whether you're interacting with a human online or not. In the future, most online speech and content will be machines talking to machines."
Machine-driven communication (or MADCOM) will revolutionize online interaction. This could lead to global information warfare where humans can't compete alone with computational propaganda. Multiple US Policy recommendations are given to limit the bad consequences of MADCOM and help implement countermeasures.
My comments: I found the thorough analysis of the possible impacts of MADCOM insightful. This made me update my beliefs about the importance of computational propaganda this century. However, I found that the chapter focused too much on the short-term. For instance, in the definition of AGI (in the glossary), it says: "AGI is still science-fiction"!
What would be the short-, medium- and long-term impacts of an open AI development?
In the short and medium term, AI labs would still conduct original research to build skills, keep up with the state of the art and keep a monopoly on their research for a few months (while competitors catch up). So openness would accelerate AI progress.
In the long term, the final stages of building AGI will likely be much more competitive. To avoid a tight race in which AI Safety is dismissed, a singleton scenario is preferable.
Another possibility to take into account is a hardware overhang. If an algorithmic breakthrough is what leads to an intelligence explosion, then openness would favor small groups (which don't have access to advanced hardware). On the other hand, if hardware is the decisive factor, then openness would favor "elite", already-established labs.
Openness in AI development would clearly help AI Safety research, for it's difficult to work on making AI systems safer if those systems are kept secret. AI Safety could benefit more from openness than AI capability, because of its need for external perspectives. However, altruistic outsiders are already likely to contribute openly to AI Safety.
My comments: I found this paper very dense. It made me deeply consider the strategic intricacies of openness. I think this chapter is fundamental for this book and I was happy to have it at the end of this part.
Two intelligences are differentiated: Human Intelligence (HI) and Gene Intelligence (GI). GI is essentially the intelligence of all human genomes (or the intelligence of evolution). GI provides control over human reproduction (and therefore controls HI). If all intelligences prove to be inherently dangerous, then we could apply to AI the same principles that made humanity safe (22 principles are considered).
My comments: This chapter appeared to me both concrete and ambitious because of the precise description of those 22 proposed principles.
In this chapter, the mathematician Edward Frenkel tells his story. In particular, he describes dissociation, a defense mechanism triggered by trauma, and how it affected his research life.
His point is the following: in the end, humans will be the ones programming the AI, so we should make sure everyone working on AI Safety is mentally healthy, to avoid a Flight 9525 scenario, where a depressed pilot caused the death of everyone on board.
My comments: His story touched me, but I felt it did not directly address AI Safety.
Is it safe to have an oracle AI that answers only yes/no questions and is reset every time? Can the AI send a lethal message?
The AI has several methods to (still) try to send a specific message:
My comments: Very clear chapter. The math used to prove the bounds was easy to follow. I am still unsure about the assumption that the humans would already know the partition Q_1, ..., Q_n, or would try to decipher a lethal message. The methods seem to work only if the humans know what method the AI is using and want to know the content of the message.
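To make the worry concrete, here is a toy sketch of my own (not from the chapter) of how a stream of yes/no answers to a fixed, publicly known question sequence could encode an arbitrary message, one bit per answer; note that decoding only works for a reader who already knows the scheme, which is exactly my reservation above:

```python
def encode_answers(message: str, honest: list) -> list:
    """Toy model: the oracle ignores its honest yes/no answers and instead
    replies so that each answer equals one bit of the message's ASCII encoding.

    `honest` is the list of truthful answers to the known question sequence;
    its length caps how many bits can leak per run before a reset.
    """
    bits = [int(b) for ch in message for b in format(ord(ch), "08b")]
    return [bool(b) for b in bits[:len(honest)]]

def decode_answers(answers: list) -> str:
    """An outside reader who knows the scheme recovers the message."""
    bits = "".join(str(int(a)) for a in answers)
    chars = [bits[i:i + 8] for i in range(0, len(bits) - len(bits) % 8, 8)]
    return "".join(chr(int(c, 2)) for c in chars)

answers = encode_answers("hi", honest=[True] * 16)
print(decode_answers(answers))  # "hi" -- 16 yes/no answers leak two bytes
```

Resetting the oracle limits each run to len(honest) bits, but nothing in this toy model stops the leak accumulating across runs if the question sequence is reused.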
There is a trade-off between spending time on solving motivation drift and representation drift, and reducing the risks of hacking.
My comments: Short and clear chapter. It was a bit redundant considering the chapter "The basic AI drives" and "The value learning problem".
This chapter addresses verifiability using a framework for deontology. The main result is that verifying whether an agent will always exhibit good behaviour given a deontology is not computable. This result is then applied to AI Safety strategy.
My comments: I found the math symbols unnecessary. More generally, the chapter was too technical for a non-expert audience.
After a general introduction on what adversarial examples are, the chapter presents a taxonomy of different attacks depending on the level of information about the model being attacked.
One of the key claims is that AI security against adversarial examples should not be neglected, because given enough time and effort an attacker can build a pseudo-model and cause harm.
My comments: This chapter was very well presented and clear. It resonated with one of the first chapters "Adversarial examples in the physical world".
A framework to compute distances and represent conditional preferences (e.g. "preferring red cars if the car is a convertible") is proposed. The value alignment procedure relies on a distance between subjective preferences and ethical principles.
My comments: The chapter provided very little explanation of how the procedure for value alignment worked intuitively, and I found it difficult to follow without any background on CP-nets.
In economics, a beneficial addiction increases your consumption capital as you consume. For instance, a superintelligence could get utility from learning mathematics, and the more mathematics it learns, the more "quickly" it can learn through self-modification (since self-modification uses mathematics).
The author claims that a superintelligence will have a drive to pursue such beneficial addictions. Therefore, AI Safety researchers should try to find beneficial addictions that promote human welfare.
My comments: It's the first article of the book that is heavily influenced by economics. I found it clear and it convinced me of the importance of beneficial addictions.
This chapter addresses security concerns regarding a robotic framework: the Robot Operating System (ROS). This framework is mainly used in research, does not have security by default and has multiple security holes (e.g. communication between nodes uses clear text over TCP/IP and UDP/IP).
Additional security features are presented:
My comments: This chapter made me realize how important securing robotic applications is. No matter how safe we make our AI systems, if someone is able to hack a connected device or a robot, then everything is lost.
Social choice theory studies how to aggregate information about agents into a group decision. Although answers to most normative questions (i.e. how the world should be) about AI ethics will be provided by social choice, some will be built into the AI by its designers. Some questions that must be answered are:
Solutions to those questions include:
My comments: I feel like this chapter was adapted from an AI ethics article for a textbook on AI Safety and Security, but did not particularly address AI Safety issues. The story about Condorcet was unnecessary.
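As a minimal illustration of such aggregation (my own example, not from the chapter), a Borda count turns ranked preferences from several agents into a single group decision:

```python
from collections import defaultdict

def borda(rankings):
    """Aggregate ranked preferences with a Borda count: each ballot gives
    n-1 points to its top choice, n-2 to the next, and so on; the option
    with the highest total score wins."""
    scores = defaultdict(int)
    for ballot in rankings:
        n = len(ballot)
        for rank, option in enumerate(ballot):
            scores[option] += n - 1 - rank
    return max(scores, key=scores.get)

# Three agents rank three options; Borda picks the broadly acceptable one,
# even though "b" is not every agent's first choice.
ballots = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
print(borda(ballots))  # "b"
```

Which aggregation rule to build into an AI is itself one of the normative questions the chapter raises: Borda, majority rule and Condorcet-consistent methods can disagree on the same ballots.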
The ASI-PATH model proposes to estimate the risk of an AI catastrophe from the probability of the conjunction of multiple events. However, this model does not take into account the fact that Major Strategic Advantages (MSA) are sufficient for catastrophic risks (i.e. human well-being damage on a large scale with more than 10 million deaths) and that any global turbulence can lead to existential risk.
After exhaustively listing Decisive Strategic Advantage (DSA) and MSA enablers, Sotala proposes a disjunctive diagram that could lead to an AI catastrophe.
My comments: Being already familiar with the ASI-PATH model, I was happy to see another diagram that helped me think about AI risk estimates and an exhaustive list of DSA/MSA enablers.
How can International Relations be improved to better manage AI Risk, given the theory of Offensive Realism (i.e. the lack of a world government leads states to joust for power)? This chapter focused on militarized AI, AI Security and political instability.
My comments: I found that the chapter could be applied to any highly destructive weapons and was not specific to AI.
The key claim is that humanity needs to build an enlightened despot, or friendly super-singleton, to prevent state and non-state actors from blowing up the world with dual-use emerging technologies.
My comments: Very clear diagrams. This made me update my beliefs about an enlightened despot scenario.
There is a drive to build Military Tool AIs because an AGI will face enemies, such as:
Examples of Military AI (such as the Terminator) have become taboo, and that is a pity because it is (still) a likely outcome.
My comments: Made me reconsider the importance of military AI. Making progress on Alignment is not enough.
Value-Sensitive Design (VSD) is a flexible and capable design for building Aligned agents. It's self-reflexive, fallible and continually improving (it seeks to predict emerging values/issues and influence the design of technologies early in the process). The main focus is on stakeholders' values.
My comments: I found this chapter too abstract. I would have preferred more concrete applications or a more precise formalization.
The claim in this chapter is that AI Safety research should prefer consequentialism over deontology. The author then gives an introduction to normative ethics: Utilitarianism values happiness (a maximizing principle), whereas Kantianism values the morally good will (a constraint: never treat people solely as means).
My comments: This chapter did not address AI Safety and superficially covered normative ethics.
Kevin Warwick criticizes two points made by Penrose in Shadows of the Mind (1994):
My comments: I feel like this chapter addresses problems that were being discussed more than 20 years ago, and that are already solved or no longer relevant (e.g. Asimov's three laws).
The first part, presenting authoritative articles from top AI researchers, made me realize how the most influential ideas of this century have built upon each other. I was pleased to see previous chapters being cited over and over by the next ones. Most of the content of this part is available online, but I enjoyed the selection and organization of those chapters.
The second part, made of 17 chapters written by academics specifically for this book, discussed recent research on very different topics (adversarial machine learning, social choice, nanotechnology, ethics, alignment, robotics, AI governance, AI strategy, psychology, AI policy), and gave me a general overview of the field of AI Safety and Security.