You’re a mind, and that puts you in a pretty strange predicament.
Very few things get to be minds. You’re that odd bit of stuff in the universe that can form predictions and make plans, weigh and revise beliefs, suffer, dream, notice ladybugs, or feel a sudden craving for mango. You can even form, inside your mind, a picture of your whole mind. You can reason about your own reasoning process, and work to bring its operations more in line with your goals.
You’re a mind, implemented on a human brain. And it turns out that a human brain, for all its marvelous flexibility, is a lawful thing, a thing of pattern and routine. Your mind can follow a routine for a lifetime, without ever once noticing that it is doing so. And these routines can have great consequences. When a mental pattern serves you well, we call that “rationality.”
You exist as you are, hard-wired to exhibit certain species of rationality and certain species of irrationality, because of your ancestry. You, and all life on Earth, are descended from ancient self-replicating molecules. This replication process was initially clumsy and haphazard, and soon yielded replicable differences between the replicators. “Evolution” is our name for the change in these differences over time.
Since some of these reproducible differences impact reproducibility—a phenomenon called “selection”—evolution has resulted in organisms suited to reproduction in environments like the ones their ancestors had. Everything about you is built on the echoes of your ancestors’ struggles and victories.
And so here you are: a mind, carved from weaker minds, seeking to understand your own inner workings, that they can be improved upon—improved upon relative to your goals, and not those of your designer, evolution. What useful policies and insights can we take away from knowing that this is our basic situation?
Ghosts and Machines
Our brains, in their small-scale structure and dynamics, look like many other mechanical systems. Yet we rarely think of our minds in the same terms we think of objects in our environments or organs in our bodies. Our basic mental categories—belief, decision, word, idea, feeling, and so on—bear little resemblance to our physical categories.
Past philosophers have taken this observation and run with it, arguing that minds and brains are fundamentally distinct and separate phenomena. This is the view the philosopher Gilbert Ryle called “the dogma of the Ghost in the Machine.” But modern scientists and philosophers who have rejected dualism haven’t necessarily replaced it with a better predictive model of how the mind works. Practically speaking, our purposes and desires still function like free-floating ghosts, like a magisterium cut off from the rest of our scientific knowledge. We can talk about “rationality” and “bias” and “how to change our minds,” but if those ideas are still imprecise and unconstrained by any overarching theory, our scientific-sounding language won’t protect us from making the same kinds of mistakes as those whose theoretical posits include spirits and essences.
Interestingly, the mystery and mystification surrounding minds doesn’t just obscure our view of humans. It also accrues to systems that seem mind-like or purposeful in evolutionary biology and artificial intelligence (AI). Perhaps, if we cannot readily glean what we are from looking at ourselves, we can learn more by using obviously inhuman processes as a mirror.
There are many ghosts to learn from here—ghosts past, and present, and yet to come. And these illusions are real cognitive events, real phenomena that we can study and explain. If there appears to be a ghost in the machine, that appearance is itself the hidden work of a machine.
The first sequence of The Machine in the Ghost, “The Simple Math of Evolution,” aims to communicate the dissonance and divergence between our hereditary history, our present-day biology, and our ultimate aspirations. This will require digging deeper than is common in introductions to evolution for non-biologists, which often restrict their attention to surface-level features of natural selection.
The third sequence, “A Human’s Guide to Words,” discusses the basic relationship between cognition and concept formation. This is followed by a longer essay introducing Bayesian inference.
Bridging the gap between these topics, “Fragile Purposes” abstracts from human cognition and evolution to the idea of minds and goal-directed systems at their most general. These essays serve the secondary purpose of explaining the author’s general approach to philosophy and the science of rationality, which is strongly informed by his work in AI.
Yudkowsky is a decision theorist and mathematician who works on foundational issues in Artificial General Intelligence (AGI), the theoretical study of domain-general problem-solving systems. Yudkowsky’s work in AI has been a major driving force behind his exploration of the psychology of human rationality, as he noted in his very first blog post on Overcoming Bias, The Martial Art of Rationality:
Such understanding as I have of rationality, I acquired in the course of wrestling with the challenge of Artificial General Intelligence (an endeavor which, to actually succeed, would require sufficient mastery of rationality to build a complete working rationalist out of toothpicks and rubber bands). In most ways the AI problem is enormously more demanding than the personal art of rationality, but in some ways it is actually easier. In the martial art of mind, we need to acquire the real-time procedural skill of pulling the right levers at the right time on a large, pre-existing thinking machine whose innards are not end-user-modifiable. Some of the machinery is optimized for evolutionary selection pressures that run directly counter to our declared goals in using it. Deliberately we decide that we want to seek only the truth; but our brains have hardwired support for rationalizing falsehoods. [...]
Trying to synthesize a personal art of rationality, using the science of rationality, may prove awkward: One imagines trying to invent a martial art using an abstract theory of physics, game theory, and human anatomy. But humans are not reflectively blind; we do have a native instinct for introspection. The inner eye is not sightless; but it sees blurrily, with systematic distortions. We need, then, to apply the science to our intuitions, to use the abstract knowledge to correct our mental movements and augment our metacognitive skills. We are not writing a computer program to make a string puppet execute martial arts forms; it is our own mental limbs that we must move. Therefore we must connect theory to practice. We must come to see what the science means, for ourselves, for our daily inner life.
From Yudkowsky’s perspective, I gather, talking about human rationality without saying anything interesting about AI is about as difficult as talking about AI without saying anything interesting about rationality.
In the long run, Yudkowsky predicts that AI will come to surpass humans in an “intelligence explosion,” a scenario in which self-modifying AI improves its own ability to productively redesign itself, kicking off a rapid succession of further self-improvements. The term “technological singularity” is sometimes used in place of “intelligence explosion”; until January 2013, the Machine Intelligence Research Institute (MIRI) was named “the Singularity Institute for Artificial Intelligence” and hosted an annual Singularity Summit. Since then, Yudkowsky has come to favor I.J. Good’s older term, “intelligence explosion,” to help distinguish his views from other futurist predictions, such as Ray Kurzweil’s exponential technological progress thesis.
Technologies like smarter-than-human AI seem likely to result in large societal upheavals, for the better or for the worse. Yudkowsky coined the term “Friendly AI theory” to refer to research into techniques for aligning an AGI’s preferences with the preferences of humans. At this point, very little is known about when generally intelligent software might be invented, or what safety approaches would work well in such cases. Present-day autonomous AI can already be quite challenging to verify and validate with much confidence, and many current techniques are not likely to generalize to more intelligent and adaptive systems. “Friendly AI” is therefore closer to a menagerie of basic mathematical and philosophical questions than to a well-specified set of programming objectives.
As of 2015, Yudkowsky’s views on the future of AI continue to be debated by technology forecasters and AI researchers in industry and academia, who have yet to converge on a consensus position. Nick Bostrom’s book Superintelligence provides a big-picture summary of the many moral and strategic questions raised by smarter-than-human AI.
For a general introduction to the field of AI, the most widely used textbook is Russell and Norvig’s Artificial Intelligence: A Modern Approach. In a chapter discussing the moral and philosophical questions raised by AI, Russell and Norvig note the technical difficulty of specifying good behavior in strongly adaptive AI:
[Yudkowsky] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes. We can’t just give a program a static utility function, because circumstances, and our desired responses to circumstances, change over time.
Disturbed by the possibility that future progress in AI, nanotechnology, biotechnology, and other fields could endanger human civilization, Bostrom and Ćirković compiled the first academic anthology on the topic, Global Catastrophic Risks. The most extreme of these are the existential risks, risks that could result in the permanent stagnation or extinction of humanity.
People (experts included) tend to be extraordinarily bad at forecasting major future events (new technologies included). Part of Yudkowsky’s goal in discussing rationality is to figure out which biases are interfering with our ability to predict and prepare for big upheavals well in advance. Yudkowsky’s contributions to the Global Catastrophic Risks volume, “Cognitive biases potentially affecting judgement of global risks” and “Artificial intelligence as a positive and negative factor in global risk,” tie together his research in cognitive science and AI. Yudkowsky and Bostrom summarize near-term concerns along with long-term ones in a chapter of the Cambridge Handbook of Artificial Intelligence, “The ethics of artificial intelligence.”
Though this is a book about human rationality, the topic of AI has relevance as a source of simple illustrations of aspects of human cognition. Long-term technology forecasting is also one of the more important applications of Bayesian rationality, which can model correct reasoning even in domains where the data is scarce or equivocal.
Knowing the design can tell you much about the designer; and knowing the designer can tell you much about the design.
We’ll begin, then, by inquiring into what our own designer can teach us about ourselves.