Whenever biases are discussed around here, it tends to happen under the following framing: human cognition is a dirty, jury-rigged hack, only barely managing to approximate the laws of probability even in a rough manner. We have plenty of biases, many of them a result of adaptations that evolved to work well in the Pleistocene, but are hopelessly broken in a modern-day environment.

That's one interpretation. But there's also a different interpretation: that a perfect Bayesian reasoner is computationally intractable, and our mental algorithms make for an excellent, possibly close to optimal, use of the limited computational resources we happen to have available. It's not that the programming is bad, it's simply that you can't do much better without upgrading the hardware. In the interest of fairness, I will be presenting this view by summarizing a classic 1996 Psychological Review article, "Reasoning the Fast and Frugal Way: Models of Bounded Rationality" by Gerd Gigerenzer and Daniel G. Goldstein. It begins by discussing two contrasting views: the Enlightenment ideal of the human mind as the perfect reasoner, versus the heuristics and biases program that considers human cognition as a set of quick-and-dirty heuristics.

Many experiments have been conducted to test the validity of these two views, identifying a host of conditions under which the human mind appears more rational or irrational. But most of this work has dealt with simple situations, such as Bayesian inference with binary hypotheses, one single piece of binary data, and all the necessary information conveniently laid out for the participant (Gigerenzer & Hoffrage, 1995). In many real-world situations, however, there are multiple pieces of information, which are not independent, but redundant. Here, Bayes’ theorem and other “rational” algorithms quickly become mathematically complex and computationally intractable, at least for ordinary human minds. These situations make neither of the two views look promising. If one would apply the classical view to such complex real-world environments, this would suggest that the mind is a supercalculator like a Laplacean Demon (Wimsatt, 1976)—carrying around the collected works of Kolmogoroff, Fisher, or Neyman—and simply needs a memory jog, like the slave in Plato’s Meno. On the other hand, the heuristics-and-biases view of human irrationality would lead us to believe that humans are hopelessly lost in the face of real-world complexity, given their supposed inability to reason according to the canon of classical rationality, even in simple laboratory experiments.

There is a third way to look at inference, focusing on the psychological and ecological rather than on logic and probability theory. This view questions classical rationality as a universal norm and thereby questions the very definition of “good” reasoning on which both the Enlightenment and the heuristics-and-biases views were built. Herbert Simon, possibly the best-known proponent of this third view, proposed looking for models of bounded rationality instead of classical rationality. Simon (1956, 1982) argued that information-processing systems typically need to satisfice rather than optimize. Satisficing, a blend of sufficing and satisfying, is a word of Scottish origin, which Simon uses to characterize algorithms that successfully deal with conditions of limited time, knowledge, or computational capacities. His concept of satisficing postulates, for instance, that an organism would choose the first object (a mate, perhaps) that satisfies its aspiration level—instead of the intractable sequence of taking the time to survey all possible alternatives, estimating probabilities and utilities for the possible outcomes associated with each alternative, calculating expected utilities, and choosing the alternative that scores highest.
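To make the contrast concrete, here is a minimal sketch of satisficing next to exhaustive optimizing. The function names, the numeric "value" of an option, and the aspiration level are illustrative assumptions of mine, not anything taken from the paper.

```python
from typing import Callable, Iterable, List, Optional

def satisfice(options: Iterable[int],
              value: Callable[[int], float],
              aspiration: float) -> Optional[int]:
    """Return the first option whose value meets the aspiration level."""
    for option in options:
        if value(option) >= aspiration:
            return option  # good enough: stop searching immediately
    return None  # nothing met the aspiration level

def optimize(options: Iterable[int], value: Callable[[int], float]) -> int:
    """Survey every option and return the one with the highest value."""
    return max(options, key=value)

# Hypothetical options, scored by their own magnitude.
candidates = [3, 7, 9, 8]
evaluated: List[int] = []  # records which options were actually inspected

def score(x: int) -> float:
    evaluated.append(x)
    return float(x)

first_good = satisfice(candidates, score, aspiration=6.0)
looked_at_by_satisficer = list(evaluated)
best = optimize(candidates, float)
```

In this toy run the satisficer stops after inspecting two options, while the optimizer has to look at all four to find a slightly better one.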

Let us consider the following example question: *Which city has a larger population? (a) Hamburg (b) Cologne.*

The paper describes algorithms fitting into a framework that the authors call a theory of *probabilistic mental models* (PMM). PMMs fit three visions: (a) Inductive inference needs to be studied with respect to natural environments; (b) Inductive inference is carried out by satisficing algorithms; (c) Inductive inferences are based on frequencies of events in a reference class. PMM theory does not strive for the classical Bayesian ideal, but instead attempts to build an algorithm the mind could actually use.

These satisficing algorithms dispense with the fiction of the omniscient Laplacean Demon, who has all the time and knowledge to search for all relevant information, to compute the weights and covariances, and then to integrate all this information into an inference.

The first algorithm presented is the *Take the Best* algorithm, named because its policy is "take the best, ignore the rest". In the first step, it invokes the *recognition principle*: if only one of two objects is recognized, it chooses the recognized object. If neither is recognized, it chooses randomly. If both are recognized, it moves on to the next discrimination step. For instance, if a person is asked which of city *a* and city *b* is bigger, and the person has never heard of *b*, they will pick *a*.

If both objects are recognized, the algorithm will next search its memory for useful information that might provide a cue regarding the correct answer. Suppose that you know a certain city has its own football team, while another doesn't have one. It seems reasonable to assume that a city having a football team correlates with the city being of at least some minimum size, so the existence of a football team has positive cue value for predicting city size - it signals a higher value on the target variable.

In the second step, the Take the Best algorithm retrieves from memory the cue values of the highest ranking cue. If the cue *discriminates*, which is to say one object has a positive cue value and the other does not, the search is terminated and the object with the positive cue value is chosen. If the cue does not discriminate, the algorithm moves on to the next-highest-ranking cue and repeats the test, choosing randomly if no discriminating cue is found.
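The steps described so far can be sketched in a few lines of Python. This is my own minimal reading of the description above, not the authors' code: the cue names, their ranking, the placeholder cities "A" to "D", and the encoding of cue values as `True`/`False`/`None` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Set

# True = positive cue value, False = negative, None = unknown.
CueTable = Dict[str, Dict[str, Optional[bool]]]

def take_the_best(a: str, b: str, recognized: Set[str], cues: CueTable,
                  ranked_cues: List[str], rng=random) -> str:
    """Pick the object inferred to be larger on the target variable."""
    # Step 1: recognition principle.
    a_known, b_known = a in recognized, b in recognized
    if a_known != b_known:
        return a if a_known else b  # only one recognized: choose it
    if not a_known:
        return rng.choice([a, b])   # neither recognized: guess
    # Step 2 onward: walk the cues from highest to lowest rank and stop
    # at the first one that discriminates (one positive, the other not).
    for cue in ranked_cues:
        a_pos = cues[a].get(cue) is True
        b_pos = cues[b].get(cue) is True
        if a_pos != b_pos:
            return a if a_pos else b
    return rng.choice([a, b])       # no cue discriminates: guess

# Hypothetical knowledge of one person; "D" has never been heard of.
recognized = {"A", "B", "C"}
ranked_cues = ["is_capital", "has_football_team", "has_university"]
cues: CueTable = {
    "A": {"is_capital": False, "has_football_team": True, "has_university": True},
    "B": {"is_capital": False, "has_football_team": None, "has_university": True},
    "C": {"is_capital": False, "has_football_team": True, "has_university": False},
}

pick_ad = take_the_best("A", "D", recognized, cues, ranked_cues)
pick_ab = take_the_best("A", "B", recognized, cues, ranked_cues)
pick_bc = take_the_best("B", "C", recognized, cues, ranked_cues)
```

Note that in the last comparison the football cue settles the matter in favor of "C", and the lower-ranked university cue, which would have favored "B", is never consulted.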

The algorithm is hardly a standard statistical tool for inductive inference: It does not use all available information, it is noncompensatory and nonlinear, and variants of it can violate transitivity. Thus, it differs from standard linear tools for inference such as multiple regression, as well as from nonlinear neural networks that are compensatory in nature. The Take the Best algorithm is noncompensatory because only the best discriminating cue determines the inference or decision; no combination of other cue values can override this decision. [...] the algorithm violates the Archimedian axiom, which implies that for any multidimensional object $a$ $(a_1, a_2, \ldots, a_n)$ preferred to $b$ $(b_1, b_2, \ldots, b_n)$ where $a_1$ dominates $b_1$, this preference can be reversed by taking multiples of any one or a combination of $b_2, b_3, \ldots, b_n$. As we discuss, variants of this algorithm also violate transitivity, one of the cornerstones of classical rationality (McClennen, 1990).

This certainly sounds horrible: possibly even more horrifying is that a wide variety of experimental results make perfect sense if we assume that the test subjects are unconsciously employing this algorithm. Yet, despite all of these apparent flaws, the algorithm *works*.

The authors designed a scenario where 500 simulated individuals with varying amounts of knowledge were presented with pairs of cities and were tasked with choosing the bigger one (83 cities, 3,403 city pairs). The Take the Best algorithm was pitted against five other algorithms that were suggested by "several colleagues in the fields of statistics and economics": Tallying (where the number of positive cue values for each object is tallied across all cues and the object with the largest number of positive cue values is chosen), Weighted Tallying, the Unit-Weight Linear Model, the Weighted Linear Model, and Multiple Regression.
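For comparison, here is a rough sketch of Tallying as described in the parenthesis above, together with a quick check of the pair count. The tie-breaking rule (guess at random), the cue names, and the placeholder objects "X" and "Y" are my assumptions; the paper's version also handles recognition, which this simplified sketch leaves out.

```python
import math
import random
from typing import Dict, Optional

# True = positive cue value, False = negative, None = unknown.
CueTable = Dict[str, Dict[str, Optional[bool]]]

def tally(cue_values: Dict[str, Optional[bool]]) -> int:
    """Count the positive cue values of one object."""
    return sum(1 for value in cue_values.values() if value is True)

def tallying_choice(a: str, b: str, cues: CueTable, rng=random) -> str:
    """Choose the object with more positive cue values (ties: guess)."""
    score_a, score_b = tally(cues[a]), tally(cues[b])
    if score_a == score_b:
        return rng.choice([a, b])
    return a if score_a > score_b else b

# Hypothetical objects: X is positive only on one cue, Y on two others.
cues: CueTable = {
    "X": {"is_capital": True, "has_football_team": False, "has_university": False},
    "Y": {"is_capital": False, "has_football_team": True, "has_university": True},
}
tallying_pick = tallying_choice("X", "Y", cues)

# Every unordered pair drawn from 83 cities.
n_pairs = math.comb(83, 2)
```

Tallying looks up every cue value for both objects and lets two positive cues outvote one. A rule that stops at the first discriminating cue could side with "X" here if `is_capital` were ranked first, which is exactly the noncompensatory behavior discussed earlier.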

Take the Best was clearly the fastest algorithm, needing to look up far fewer cue values than the rest. But what about the accuracy? When the simulated individuals had knowledge of all the cues, Take the Best *drew as many correct inferences as any of the other algorithms, and more than some*. When looking at individuals with imperfect knowledge? Take the Best won or tied for the best position for individuals with knowledge of 20 and 50 percent of the cues, and didn't lose by more than a few tenths of a percent for individuals that knew 10 and 75 percent of the cues. Averaging over all the knowledge classes, Take the Best made 65.8% correct inferences, tied with Weighted Tallying for the gold medal.

The authors also tried two even more stupid algorithms, which were variants of Take the Best. Take the Last, instead of starting the search from the highest-ranking cue, first tries the cue that discriminated last, then the cue that discriminated the time before the last, and so on. The Minimalist algorithm picks a cue at random. This produced a perhaps surprisingly small drop in accuracy, with Take the Last getting 64.7% correct inferences and Minimalist 64.5%.
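The two variants differ from Take the Best only in the order in which cues are tried, which a short sketch makes visible. This is my own simplification under stated assumptions: the recognition step is omitted, Minimalist keeps drawing random cues until one discriminates, Take the Last falls back to a random order for cues that have never discriminated before, and the cue names "c1" and "c2" and objects "A" to "C" are placeholders.

```python
import random
from typing import Dict, List, Optional, Tuple

# True = positive cue value, False = negative, None = unknown.
CueTable = Dict[str, Dict[str, Optional[bool]]]

def first_discriminating(a: str, b: str, cues: CueTable, order: List[str],
                         rng) -> Tuple[str, Optional[str]]:
    """Return (choice, cue used); the cue is None if we had to guess."""
    for cue in order:
        a_pos = cues[a].get(cue) is True
        b_pos = cues[b].get(cue) is True
        if a_pos != b_pos:
            return (a if a_pos else b), cue
    return rng.choice([a, b]), None

def minimalist(a: str, b: str, cues: CueTable, cue_names: List[str],
               rng=random) -> str:
    """Try the cues in a random order."""
    order = list(cue_names)
    rng.shuffle(order)
    return first_discriminating(a, b, cues, order, rng)[0]

class TakeTheLast:
    """Try the most recently discriminating cues first."""

    def __init__(self, cue_names: List[str], rng=random) -> None:
        self.cue_names = list(cue_names)
        self.rng = rng
        self.recent: List[str] = []  # most recently successful cue first

    def choose(self, a: str, b: str, cues: CueTable) -> str:
        untried = [c for c in self.cue_names if c not in self.recent]
        self.rng.shuffle(untried)
        choice, used = first_discriminating(
            a, b, cues, self.recent + untried, self.rng)
        if used is not None:  # remember which cue worked this time
            if used in self.recent:
                self.recent.remove(used)
            self.recent.insert(0, used)
        return choice

cues: CueTable = {
    "A": {"c1": True, "c2": False},
    "B": {"c1": False, "c2": False},
    "C": {"c1": False, "c2": True},
}
minimalist_pick = minimalist("A", "B", cues, ["c1", "c2"], random.Random(0))
ttl = TakeTheLast(["c1", "c2"], random.Random(0))
first = ttl.choose("A", "B", cues)   # only c1 discriminates
second = ttl.choose("C", "B", cues)  # c1 fails, c2 discriminates
third = ttl.choose("A", "C", cues)   # c2 is now tried first
```

In the third comparison both cues discriminate but point in opposite directions, so the answer depends purely on which cue happened to work most recently. Neither variant needs to know how good the cues are.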

After the algorithm comparison, the authors spend a few pages discussing some of the principles related to the PMM family of algorithms and their empirical validity, as well as the implications all of this might have on the study of rationality. They note, for instance, that even though transitivity (if we prefer a to b and b to c, then we should also prefer a to c) is considered a cornerstone axiom in classical rationality, several algorithms violate transitivity without suffering very much from it.

At the beginning of this article, we pointed out the common opposition between the rational and the psychological, which emerged in the nineteenth century after the breakdown of the classical interpretation of probability (Gigerenzer et al., 1989). Since then, rational inference is commonly reduced to logic and probability theory, and psychological explanations are called on when things go wrong. This division of labor is, in a nutshell, the basis on which much of the current research on judgment under uncertainty is built. As one economist from the Massachusetts Institute of Technology put it, “either reasoning is rational or it’s psychological” (Gigerenzer, 1994). Can not reasoning be both rational and psychological?

We believe that after 40 years of toying with the notion of bounded rationality, it is time to overcome the opposition between the rational and the psychological and to reunite the two. The PMM family of cognitive algorithms provides precise models that attempt to do so. They differ from the Enlightenment’s unified view of the rational and psychological, in that they focus on simple psychological mechanisms that operate under constraints of limited time and knowledge and are supported by empirical evidence. The single most important result in this article is that simple psychological mechanisms can yield about as many (or more) correct inferences in less time than standard statistical linear models that embody classical properties of rational inference. The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate. Models of inference do not have to forsake accuracy for simplicity. The mind can have it both ways.

I don't, incidentally, think that our algorithms are anywhere close to optimal, but I nonetheless felt that the opposing point of view still merits a bit more attention than it has had here so far. They *do* have a point, even if they're not 100% correct.

This could actually act as counterevidence against the claim that AI will surpass humans around the time that the processing speed of computers rivals that of the human brain.

It may be that running a non-jury-rigged rational system against the complexity of the real world requires another order of magnitude or more of processing power.

This brings up the likelihood that initial AIs will need to be jury-rigged, and will have their own set of cognitive biases.

Looking at Sandberg and Bostrom's The Wisdom of Nature: An Evolutionary Heuristic for Human Enhancement, we see that there are several reasons why the human brain's native algorithms are unlikely to be anything close to optimal, even given the limited computational resources we happen to have available inside our skulls:

Changed tradeoffs. Evoluti

That's a great discussion to have. I'd say the biggest changes are that a modern person interacts with a *lot* of other people and receives a *lot* of symbolic information. Other "major" changes, like increased availability of food or better infant healthcare, look to me minor by comparison. Not sure how to weigh this stuff, though.

It is worth remembering that human computation is a limited resource - we just don't have the ability to subject everything to Bayesian analysis. So, save our best rationality for what's important, and use heuristics to decide what kind of chips to buy at the grocery store.

It would be extraordinary if the algorithm that is optimal given infinite computational resource is also optimal given limited resource.

I suspect that by framing this as a battle between Bayesian inference and actual evolved human algorithms, we are missing the third alternative: algorithm X, which is the optimal algorithm for decision-making given the resources and options that we have in the society that we find ourselves in.

Incidentally, if I don't have a good answer to a "guessing" problem immediately, I find it faster to just Google the relevant facts than to try to struggle to find a distinction between them that I can latch onto.

As for Hamburg vs. Cologne, my recognition heuristic is more familiar with Hamburg as a city than Cologne as a city (I know Hamburg is in Germany, I suspect that Cologne is in France). On the other hand, I know that I recognize Hamburg because I often eat hamburgers, which doesn't seem like it says much about the city. Nevertheless, if ...

I am suspicious of work that attempts to provide evidence for a counterintuitive result in a way that could fairly obviously have been rigged. In this case, the key question is how "generic" their competition really was. It might be more convincing if arguments could be made about a plausible "real-world" distribution of problem instances, then a set of sample competitions drawn from that distribution and various decision algorithms run on those instances.

While this demonstration is interesting in some sense, it's pretty obvious that for any algorithm one can find an example problem at which the algorithm excels. Does the paper state how many example problems were tried?

(Not directly related, but may be interesting to someone.)

In a certain technical sense, "satisficing" is formally equivalent to expected utility maximization. Specifically, consider an interval on a real line (e.g. the amount of money that could be made), and a continuous and monotonic utility function on that interval. Expected utility maximization for that utility function *u* (i.e. the choice of a random variable X with codomain in the amounts of money) is then equivalent to maximization of probability Pr(X>V), where V is a random variable ...

I think one thing that evolution could have easily done with our existing hardware is to at least allow us to use rational algorithms whenever it's not intractable to do so. This would have easily eliminated things such as akrasia, where our rational thoughts do give a solution, but our instincts do not allow us to use them.

Heh, this reminds me of something I saw a while ago. http://plover.net/~bonds/shibboleths.html

Here is an example of an amusing "Fast and Frugal" heuristic for evaluating claims with a lot of missing knowledge and required computation: http://xkcd.com/678/

Outstanding post and clearly written. I'd like to see more posts of this nature on here. The results definitely seem to make sense, and seem pleasing to my intuition, but I feel kind of skeptical about such a simplified account of the cognitive process. I suppose you have to start somewhere though, and I'm not really at all familiar with this kind of science.

From personal experience, encountering a lot of excellent mathematicians in University, I have often felt that some of the best mathematicians are people who simply have the best computational resource...

This point is important if one is constructing a theory about how future AIs will think, and assumes that they will reach Aumann agreement because they are Bayesians.

The "recognition heuristic" tends to work surprisingly well for stock picking, or so I've heard.

Find a bunch of "ordinary" people who have no special knowledge of stock picking, give them a list of companies, and ask them to say which ones they've heard of. Stocks of companies people have heard of tend to do better than stocks that people haven't heard of.

There is a very clear cluster of people working in cognitive science with Bayesian and machine learning savvy, centered around Tenenbaum, Griffiths, Kemp, Goodman, Chater, Oaksford, Perfors, Steyvers, et cetera. They often coauthor papers and have something of a unified perspective on The Way to do things (more unified and more coauthory even restricting the field to other Bayesian and machine learning savvy folk, like Hinton, Gigerenzer, Friston, MD Lee). It seems like they should have a name. Tengrikemgoochoakpersteyvetcet perhaps? But then, perhaps not.

What if the question required picking the *smaller* city? Then, if you've only heard of one, it would seem you should pick the unknown city, as you are more likely to know of larger than smaller cities. Doesn't the Take the Best algorithm, by specifying taking the one you know as a general fast-and-frugal tactic, lead you astray? Do you know whether subjects still choose the known city?

Just a question for MWI advocates.

If this world W1 has a parallel world W2, which has a parallel world W3, and which W1 hasn't - this is the very difference between W1 and W2 - is the W3 second order parallel to us?

There's no person who plays chess at a good level while employing Bayesian reasoning.

In Go, Bayesian reasoning performs even worse. A good Go player makes some of his moves simply because he appreciates their beauty, without having "rational" reasons for them. Our brain is capable of doing very complex pattern matching that allows the best humans to be better at a large variety of tasks than computers which use rule-based algorithms.

It seems to me that the problems with human rationality really start to come out when our sense of self is somehow on the line.

It's one thing to guess at which of two foreign cities is bigger. It's another to guess at which child is smarter -- our own child or someone else's.

So perhaps we as humans have hardware and software which is pretty good, except that we sometimes use our brainpower to fool ourselves.