Fundamentally Flawed, or Fast and Frugal?



Kaj_Sotala

Whenever biases are discussed around here, it tends to happen under the following framing: human cognition is a dirty, jury-rigged hack, only barely managing to approximate the laws of probability even in a rough manner. We have plenty of biases, many of them a result of adaptations that evolved to work well in the Pleistocene but that are hopelessly broken in a modern-day environment.

That's one interpretation. But there's also a different interpretation: that a perfect Bayesian reasoner is computationally intractable, and our mental algorithms make excellent, possibly close to optimal, use of the limited computational resources we happen to have available. It's not that the programming is bad; it's simply that you can't do much better without upgrading the hardware. In the interest of fairness, I will be presenting this view by summarizing a classic 1996 Psychological Review article, "Reasoning the Fast and Frugal Way: Models of Bounded Rationality" by Gerd Gigerenzer and Daniel G. Goldstein. It begins by discussing two contrasting views: the Enlightenment ideal of the human mind as the perfect reasoner, versus the heuristics and biases program that considers human cognition as a set of quick-and-dirty heuristics.

Many experiments have been conducted to test the validity of these two views, identifying a host of conditions under which the human mind appears more rational or irrational. But most of this work has dealt with simple situations, such as Bayesian inference with binary hypotheses, one single piece of binary data, and all the necessary information conveniently laid out for the participant (Gigerenzer & Hoffrage, 1995). In many real-world situations, however, there are multiple pieces of information, which are not independent, but redundant. Here, Bayes’ theorem and other “rational” algorithms quickly become mathematically complex and computationally intractable, at least for ordinary human minds. These situations make neither of the two views look promising. If one would apply the classical view to such complex real-world environments, this would suggest that the mind is a supercalculator like a Laplacean Demon (Wimsatt, 1976), carrying around the collected works of Kolmogoroff, Fisher, or Neyman, and simply needs a memory jog, like the slave in Plato’s Meno. On the other hand, the heuristics-and-biases view of human irrationality would lead us to believe that humans are hopelessly lost in the face of real-world complexity, given their supposed inability to reason according to the canon of classical rationality, even in simple laboratory experiments.

There is a third way to look at inference, focusing on the psychological and ecological rather than on logic and probability theory. This view questions classical rationality as a universal norm and thereby questions the very definition of “good” reasoning on which both the Enlightenment and the heuristics-and-biases views were built. Herbert Simon, possibly the best-known proponent of this third view, proposed looking for models of bounded rationality instead of classical rationality. Simon (1956, 1982) argued that information-processing systems typically need to satisfice rather than optimize. Satisficing, a blend of sufficing and satisfying, is a word of Scottish origin, which Simon uses to characterize algorithms that successfully deal with conditions of limited time, knowledge, or computational capacities. His concept of satisficing postulates, for instance, that an organism would choose the first object (a mate, perhaps) that satisfies its aspiration level—instead of the intractable sequence of taking the time to survey all possible alternatives, estimating probabilities and utilities for the possible outcomes associated with each alternative, calculating expected utilities, and choosing the alternative that scores highest.
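To make the contrast concrete, here is a minimal sketch of satisficing versus optimizing. The function names, the numeric "value" of each option, and the aspiration level are all illustrative assumptions, not anything taken from Simon or from the paper.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def satisfice(options: Iterable[T], value: Callable[[T], float],
              aspiration: float) -> Optional[T]:
    """Return the first option that meets the aspiration level, then stop."""
    for option in options:
        if value(option) >= aspiration:
            return option  # good enough: no further search
    return None  # nothing met the aspiration level

def optimize(options: Iterable[T], value: Callable[[T], float]) -> T:
    """Survey every option and return the highest-valued one."""
    return max(options, key=value)

# Hypothetical options, each scored by a single number.
candidates = [4, 7, 9, 6]
good_enough = satisfice(candidates, value=float, aspiration=6)  # stops at 7
best = optimize(candidates, value=float)                        # scans all, finds 9
```

The satisficer inspects only two options here, while the optimizer has to inspect all four; the price is that the satisficer may settle for less than the best.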

Let us consider the following example question: Which city has a larger population? (a) Hamburg (b) Cologne.

The paper describes algorithms fitting into a framework that the authors call a theory of probabilistic mental models (PMM). PMMs fit three visions: (a) Inductive inference needs to be studied with respect to natural environments; (b) Inductive inference is carried out by satisficing algorithms; (c) Inductive inferences are based on frequencies of events in a reference class. PMM theory does not strive for the classical Bayesian ideal, but instead attempts to build an algorithm the mind could actually use.

These satisficing algorithms dispense with the fiction of the omniscient Laplacean Demon, who has all the time and knowledge to search for all relevant information, to compute the weights and covariances, and then to integrate all this information into an inference.

The first algorithm presented is the Take the Best algorithm, named because its policy is "take the best, ignore the rest". In the first step, it invokes the recognition principle: if only one of two objects is recognized, it chooses the recognized object. If neither is recognized, it chooses randomly. If both are recognized, it moves on to the next discrimination step. For instance, if a person is asked which of city a and city b is bigger, and the person has never heard of b, they will pick a.
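The recognition step can be sketched roughly as follows. This is only an illustration: the function name, the use of a set of recognized names, and the `None` return value meaning "both recognized, go on to the cues" are my own assumptions.

```python
import random
from typing import Optional, Set

def recognition_step(a: str, b: str, recognized: Set[str],
                     rng: Optional[random.Random] = None) -> Optional[str]:
    """First step of Take the Best: decide by recognition alone if possible."""
    rng = rng or random.Random()
    a_known, b_known = a in recognized, b in recognized
    if a_known and not b_known:
        return a                   # only a is recognized: choose a
    if b_known and not a_known:
        return b                   # only b is recognized: choose b
    if not a_known and not b_known:
        return rng.choice([a, b])  # neither is recognized: guess
    return None                    # both recognized: continue with cue search

# Hypothetical example: the person has heard of city "a" but not of city "b".
answer = recognition_step("a", "b", recognized={"a"})
```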

If both objects are recognized, the algorithm will next search its memory for useful information that might provide a cue regarding the correct answer. Suppose that you know a certain city has its own football team, while another doesn't have one. It seems reasonable to assume that a city having a football team correlates with the city being of at least some minimum size, so the existence of a football team has positive cue value for predicting city size - it signals a higher value on the target variable.

In the second step, the Take the Best algorithm retrieves from memory the cue values of the highest ranking cue. If the cue discriminates, which is to say one object has a positive cue value and the other does not, the search is terminated and the object with the positive cue value is chosen. If the cue does not discriminate, the algorithm moves on to the next-highest-ranking cue and repeats the test, choosing randomly if no discriminating cue is found.
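Putting the steps together, a minimal sketch of the whole procedure might look like this. The data layout (a dictionary of cue values per object, with `True` for positive, `False` for negative and `None` for unknown), the cue names, and the cue ranking are hypothetical choices made for illustration; they are not the authors' implementation.

```python
import random
from typing import Dict, List, Optional, Set

# cue name -> True (positive), False (negative) or None (unknown)
CueValues = Dict[str, Optional[bool]]

def take_the_best(a: str, b: str, recognized: Set[str], cue_order: List[str],
                  knowledge: Dict[str, CueValues],
                  rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    a_known, b_known = a in recognized, b in recognized
    # Step 1: recognition principle.
    if a_known != b_known:
        return a if a_known else b
    if not a_known:
        return rng.choice([a, b])
    # Step 2: go through the cues from highest to lowest rank and stop at
    # the first one where exactly one object has a positive value.
    for cue in cue_order:
        a_pos = knowledge.get(a, {}).get(cue) is True
        b_pos = knowledge.get(b, {}).get(cue) is True
        if a_pos != b_pos:
            return a if a_pos else b
    # No cue discriminates: guess.
    return rng.choice([a, b])

# Made-up knowledge about two recognized cities, "A" and "B".
cue_order = ["football_team", "state_capital"]  # assumed ranking, best first
knowledge = {
    "A": {"football_team": True, "state_capital": False},
    "B": {"football_team": True, "state_capital": True},
}
choice = take_the_best("A", "B", {"A", "B"}, cue_order, knowledge)
```

In this example the football cue does not discriminate, since both cities have a team, so the search moves to the second cue, which favors "B".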

The algorithm is hardly a standard statistical tool for inductive inference: It does not use all available information, it is noncompensatory and nonlinear, and variants of it can violate transitivity. Thus, it differs from standard linear tools for inference such as multiple regression, as well as from nonlinear neural networks that are compensatory in nature. The Take the Best algorithm is noncompensatory because only the best discriminating cue determines the inference or decision; no combination of other cue values can override this decision. [...] the algorithm violates the Archimedean axiom, which implies that for any multidimensional object a (a1, a2, ... an) preferred to b (b1, b2, ... bn) where a1 dominates b1, this preference can be reversed by taking multiples of any one or a combination of b2, b3, ... , bn. As we discuss, variants of this algorithm also violate transitivity, one of the cornerstones of classical rationality (McClennen, 1990).
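A small sketch may help show what "noncompensatory" means in practice. The cue vectors and the weights below are invented for illustration, with 1 standing for a positive cue value and 0 for anything else; the weighted sum is just a generic compensatory rule, not one of the specific models tested in the paper.

```python
from typing import List

def lexicographic_choice(a: List[int], b: List[int]) -> str:
    """Decide on the first cue (in rank order) where the two objects differ."""
    for a_val, b_val in zip(a, b):
        if a_val != b_val:
            return "a" if a_val > b_val else "b"
    return "tie"

def weighted_sum_choice(a: List[int], b: List[int], weights: List[float]) -> str:
    """A compensatory rule: lower-ranked cues can add up and outweigh the top cue."""
    score_a = sum(w * v for w, v in zip(weights, a))
    score_b = sum(w * v for w, v in zip(weights, b))
    if score_a == score_b:
        return "tie"
    return "a" if score_a > score_b else "b"

# Cues listed from highest to lowest rank.
a = [1, 0, 0, 0]        # positive on the best cue only
b = [0, 1, 1, 1]        # positive on every other cue
weights = [5, 4, 3, 2]  # hypothetical weights

noncompensatory = lexicographic_choice(a, b)        # "a": the top cue settles it
compensatory = weighted_sum_choice(a, b, weights)   # "b": 9 points against 5
```

However many lower-ranked cues favor b, the lexicographic rule never looks at them once the top cue has discriminated.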

This certainly sounds horrible: possibly even more horrifying is that a wide variety of experimental results make perfect sense if we assume that the test subjects are unconsciously employing this algorithm. Yet, despite all of these apparent flaws, the algorithm works.

The authors designed a scenario where 500 simulated individuals with varying amounts of knowledge were presented with pairs of cities and were tasked with choosing the bigger one (83 cities, 3,403 city pairs). The Take the Best algorithm was pitted against five other algorithms that were suggested by "several colleagues in the fields of statistics and economics": Tallying (where the number of positive cue values for each object is tallied across all cues and the object with the largest number of positive cue values is chosen), Weighted Tallying, the Unit-Weight Linear Model, the Weighted Linear Model, and Multiple Regression.
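For comparison, Tallying as described above can be sketched in a few lines. The data layout and the random guess on ties are assumptions on my part; the description above only says that the object with more positive cue values wins. The last line simply checks the pair count: 83 cities give 83 × 82 / 2 = 3,403 distinct pairs.

```python
import math
import random
from typing import Dict, Optional

# cue name -> True (positive), False (negative) or None (unknown)
CueValues = Dict[str, Optional[bool]]

def tally(a_cues: CueValues, b_cues: CueValues,
          rng: Optional[random.Random] = None) -> str:
    """Count positive cue values over all cues and pick the larger count."""
    rng = rng or random.Random()
    count_a = sum(1 for v in a_cues.values() if v is True)
    count_b = sum(1 for v in b_cues.values() if v is True)
    if count_a != count_b:
        return "a" if count_a > count_b else "b"
    return rng.choice(["a", "b"])  # assumed tie-breaking rule: guess

# Hypothetical cue values for two objects.
winner = tally({"c1": True, "c2": True, "c3": False},
               {"c1": True, "c2": False, "c3": None})

n_pairs = math.comb(83, 2)  # number of distinct city pairs
```

Unlike Take the Best, this rule has to look up every cue value for both objects before it can answer.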

Take the Best was clearly the fastest algorithm, needing to look up far fewer cue values than the rest. But what about the accuracy? When the simulated individuals had knowledge of all the cues, Take the Best drew as many correct inferences as any of the other algorithms, and more than some. And for individuals with imperfect knowledge? Take the Best won or tied for the best position for individuals with knowledge of 20 and 50 percent of the cues, and didn't lose by more than a few tenths of a percent for individuals that knew 10 and 75 percent of the cues. Averaging over all the knowledge classes, Take the Best made 65.8% correct inferences, tied with Weighted Tallying for the gold medal.

The authors also tried two even more stupid algorithms, which were variants of Take the Best. Take the Last, instead of starting the search from the highest-ranking cue, first tries the cue that discriminated last, then the cue that discriminated the time before the last, and so on. The Minimalist algorithm picks a cue at random. This produced a perhaps surprisingly small drop in accuracy, with Take the Last getting 64.7% correct inferences and Minimalist 64.5%.
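The two variants differ from Take the Best only in the order in which cues are tried, which a rough sketch can make visible. Everything here is an illustrative assumption: the names, the data layout, the choice to model Minimalist as drawing cues in random order until one discriminates, and the choice to model Take the Last as a list kept in most-recently-successful order.

```python
import random
from typing import Dict, List, Optional, Tuple

CueValues = Dict[str, Optional[bool]]  # True = positive cue value

def first_discriminating(a: CueValues, b: CueValues,
                         order: List[str]) -> Optional[Tuple[str, str]]:
    """Return (cue, winner) for the first cue in `order` that discriminates."""
    for cue in order:
        a_pos = a.get(cue) is True
        b_pos = b.get(cue) is True
        if a_pos != b_pos:
            return cue, ("a" if a_pos else "b")
    return None

def minimalist(a: CueValues, b: CueValues, cues: List[str],
               rng: random.Random) -> str:
    """Try cues in random order; guess if none discriminates."""
    order = list(cues)
    rng.shuffle(order)
    hit = first_discriminating(a, b, order)
    return hit[1] if hit else rng.choice(["a", "b"])

class TakeTheLast:
    """Try the most recently successful cue first."""
    def __init__(self, cues: List[str], rng: Optional[random.Random] = None):
        self.order = list(cues)  # front of the list = discriminated last
        self.rng = rng or random.Random()

    def choose(self, a: CueValues, b: CueValues) -> str:
        hit = first_discriminating(a, b, self.order)
        if hit is None:
            return self.rng.choice(["a", "b"])
        cue, winner = hit
        self.order.remove(cue)     # move the successful cue to the front
        self.order.insert(0, cue)
        return winner

ttl = TakeTheLast(["c1", "c2", "c3"])
picked = ttl.choose({"c3": True}, {"c3": False})  # only c3 discriminates
```

After this call, `c3` sits at the front of `ttl.order`, so it will be the first cue tried on the next question. Neither variant needs to know how good each cue is, which is what makes them cheaper than Take the Best.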

After the algorithm comparison, the authors spend a few pages discussing some of the principles related to the PMM family of algorithms and their empirical validity, as well as the implications all of this might have on the study of rationality. They note, for instance, that even though transitivity (if we prefer a to b and b to c, then we should also prefer a to c) is considered a cornerstone axiom in classical rationality, several algorithms violate transitivity without suffering very much from it.

At the beginning of this article, we pointed out the common opposition between the rational and the psychological, which emerged in the nineteenth century after the breakdown of the classical interpretation of probability (Gigerenzer et al., 1989). Since then, rational inference is commonly reduced to logic and probability theory, and psychological explanations are called on when things go wrong. This division of labor is, in a nutshell, the basis on which much of the current research on judgment under uncertainty is built. As one economist from the Massachusetts Institute of Technology put it, “either reasoning is rational or it’s psychological” (Gigerenzer, 1994). Can not reasoning be both rational and psychological?

We believe that after 40 years of toying with the notion of bounded rationality, it is time to overcome the opposition between the rational and the psychological and to reunite the two. The PMM family of cognitive algorithms provides precise models that attempt to do so. They differ from the Enlightenment’s unified view of the rational and psychological, in that they focus on simple psychological mechanisms that operate under constraints of limited time and knowledge and are supported by empirical evidence. The single most important result in this article is that simple psychological mechanisms can yield about as many (or more) correct inferences in less time than standard statistical linear models that embody classical properties of rational inference. The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate. Models of inference do not have to forsake accuracy for simplicity. The mind can have it both ways.