I don't, incidentally, think that our algorithms are anywhere close to optimal, but I nonetheless felt that the opposing point of view still merits a bit more attention than it has had here so far. They do have a point, even if they're not 100% correct.
This could actually act as counterevidence against the claim that AI will surpass humans around the time that the processing speed of computers rivals that of the human brain.
It may be that running a non-jury-rigged rational system against the complexity of the real world requires another order of magnitude or more of processing power.
This brings up the likelihood that initial AIs will need to be jury-rigged, and will have their own set of cognitive biases.
a perfect Bayesian reasoner is computationally intractable, and our mental algorithms make for an excellent, possibly close to an optimal, use of the limited computational resources we happen to have available
Looking at Sandberg and Bostrom's The Wisdom of Nature: An Evolutionary Heuristic for Human Enhancement, we see that there are several reasons why the human brain's native algorithms are unlikely to be anything close to optimal, even given the limited computational resources we happen to have available inside our skulls:
Changed tradeoffs. Evoluti
An important question - how changed is the environment, really?
That's a great discussion to have. I'd say the biggest changes are that a modern person interacts with a lot of other people and receives a lot of symbolic information. Other "major" changes, like increased availability of food or better infant healthcare, look minor to me by comparison. Not sure how to weigh this stuff, though.
It is worth remembering that human computation is a limited resource - we just don't have the ability to subject everything to Bayesian analysis. So, save our best rationality for what's important, and use heuristics to decide what kind of chips to buy at the grocery store.
It would be extraordinary if the algorithm that is optimal given infinite computational resources were also optimal given limited resources.
I suspect that by framing this as a battle between Bayesian inference and actual evolved human algorithms, we are missing the third alternative: algorithm X, which is the optimal algorithm for decision-making given the resources and options that we have in the society that we find ourselves in.
Incidentally, if I don't have a good answer to a "guessing" problem immediately, I find it faster to just Google the relevant facts than to try to struggle to find a distinction between them that I can latch onto.
As for Hamburg vs. Cologne, my recognition heuristic is more familiar with Hamburg as a city than Cologne as a city (I know Hamburg is in Germany, I suspect that Cologne is in France). On the other hand, I know that I recognize Hamburg because I often eat hamburgers, which doesn't seem like it says much about the city. Nevertheless, if ...
The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate.
I am suspicious of work that attempts to provide evidence for a counterintuitive result in a way that could fairly obviously have been rigged. In this case, the key question is how "generic" their competition really was. It might be more convincing if arguments could be made about a plausible "real-world" distribution of problem instances, then a set of sample competitions drawn from that distribution and various decision algorithms run on those instances.
The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate.
While this demonstration is interesting in some sense, it's pretty obvious that for any algorithm one can find an example problem at which the algorithm excels. Does the paper state how many example problems were tried?
(Not directly related, but may be interesting to someone.)
In a certain technical sense, "satisficing" is formally equivalent to expected utility maximization. Specifically, consider an interval on the real line (e.g. the amount of money that could be made), and a continuous and monotone utility function on that interval. Expected utility maximization for that utility function u (i.e. the choice of a random variable X with codomain in the amounts of money) is then equivalent to maximization of the probability Pr(X>V), where V is a random variable ...
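To make that equivalence concrete, here is a minimal sketch of where the argument appears to be heading; the specific choice of V as an aspiration level that is independent of X and whose CDF is the normalized utility function is my assumption, since the comment is cut off:

    % A minimal sketch, under the stated assumptions. Normalize u so that u(a) = 0 and
    % u(b) = 1 (expected-utility rankings are invariant under positive affine rescalings),
    % so u is the CDF of some random variable V on [a, b]. For V independent of X:
    \Pr(X > V) \;=\; \mathbb{E}\bigl[\Pr(V < X \mid X)\bigr] \;=\; \mathbb{E}\bigl[u(X)\bigr]
    % Hence choosing X to maximize expected utility E[u(X)] is the same as choosing X to
    % maximize the probability of exceeding the random aspiration level V, which is one
    % way of formalizing "satisficing".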
I think one thing that evolution could have easily done with our existing hardware is to at least allow us to use rational algorithms whenever it's not intractable to do so. This would have easily eliminated things such as akrasia, where our rational thoughts do offer a solution, but our instincts do not allow us to act on it.
Heh, this reminds me of something I saw a while ago. http://plover.net/~bonds/shibboleths.html
Here is an example of an amusing "Fast and Frugal" heuristic for evaluating claims with a lot of missing knowledge and required computation: http://xkcd.com/678/
Outstanding post and clearly written. I'd like to see more posts of this nature on here. The results definitely seem to make sense, and seem pleasing to my intuition, but I feel kind of skeptical about such a simplified account of the cognitive process. I suppose you have to start somewhere though, and I'm not really at all familiar with this kind of science.
From personal experience, encountering a lot of excellent mathematicians in University, I have often felt that some of the best mathematicians are people who simply have the best computational resource...
This point is important if one is constructing a theory about how future AIs will think, and assumes that they will reach Aumann agreement because they are Bayesians.
The "recognition heuristic" tends to work surprisingly well for stock picking, or so I've heard.
Find a bunch of "ordinary" people who have no special knowledge of stock picking, give them a list of companies, and ask them to say which ones they've heard of. Stocks of companies people have heard of tend to do better than stocks that people haven't heard of.
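As a rough sketch of how one might score such a survey in code (all company names and recognition counts below are hypothetical placeholders, not data from any actual study):

    # Toy sketch of forming a "recognition portfolio" from a lay survey.
    # Everything here is a made-up placeholder, not real survey or market data.
    from typing import Dict, List

    def recognition_portfolio(recognition_counts: Dict[str, int],
                              n_respondents: int,
                              threshold: float = 0.9) -> List[str]:
        """Keep the companies recognized by at least `threshold` of respondents."""
        return [company for company, count in recognition_counts.items()
                if count / n_respondents >= threshold]

    # Hypothetical survey of 100 people with no special knowledge of stocks:
    counts = {"AlphaCorp": 97, "BetaWidgets": 12, "GammaFoods": 88, "DeltaMining": 5}
    print(recognition_portfolio(counts, n_respondents=100))  # -> ['AlphaCorp']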
There is a very clear cluster of people working in cognitive science with bayesian and machine learning savvy, centered around Tenenbaum, Griffiths, Kemp, Goodman, Chater, Oaksford, Perfors, Steyvers, et cetera. They often coauthor papers and have something of a unified perspective on The Way to do things (more unified and more coauthory even restricting the field to other bayesian and machine learning savvy folk, like Hinton, Gigerenzer, Friston, MD Lee). It seems like they should have a name. Tengrikemgoochoakpersteyvetcet perhaps? But then, perhaps not.
A...
What if the question required picking the smaller city? Then, if you've only heard of one, it would seem you should pick the unknown city, as you are more likely to know of larger cities than smaller ones. Doesn't the Take the Best algorithm, by specifying taking the one you know as a general fast-and-frugal tactic, lead you astray here? Do you know whether subjects still choose the known city?
Just a question for MWI advocates.
If this world W1 has a parallel world W2, which in turn has a parallel world W3 that W1 doesn't have - this being the very difference between W1 and W2 - is W3 then second-order parallel to us?
There's no person who plays chess at a high level while employing Bayesian reasoning.
In Go, Bayesian reasoning performs even worse. A good Go player makes some of his moves simply because he appreciates their beauty and without having "rational" reasons for them. Our brain is capable of doing very complex pattern matching that allows the best humans to be better at a large variety of tasks than computers which use rule-based algorithms.
It seems to me that the problems with human rationality really start to come out when our sense of self is somehow on the line.
It's one thing to guess at which of two foreign cities is bigger. It's another to guess at which child is smarter -- our own child or someone else's.
So perhaps we as humans have hardware and software which is pretty good, except that we sometimes use our brainpower to fool ourselves.
Whenever biases are discussed around here, it tends to happen under the following framing: human cognition is a dirty, jury-rigged hack, only barely managing to approximate the laws of probability even in a rough manner. We have plenty of biases, many of them a result of adaptations that evolved to work well in the Pleistocene, but are hopelessly broken in a modern-day environment.
That's one interpretation. But there's also a different interpretation: that a perfect Bayesian reasoner is computationally intractable, and our mental algorithms make for an excellent, possibly close to an optimal, use of the limited computational resources we happen to have available. It's not that the programming would be bad, it's simply that you can't do much better without upgrading the hardware. In the interest of fairness, I will be presenting this view by summarizing a classic 1996 Psychological Review article, "Reasoning the Fast and Frugal Way: Models of Bounded Rationality" by Gerd Gigerenzer and Daniel G. Goldstein. It begins by discussing two contrasting views: the Enlightenment ideal of the human mind as the perfect reasoner, versus the heuristics and biases program that considers human cognition as a set of quick-and-dirty heuristics.
Let us consider the following example question: Which city has a larger population? (a) Hamburg (b) Cologne.
The paper describes algorithms fitting into a framework that the authors call a theory of probabilistic mental models (PMM). PMMs fit three visions: (a) Inductive inference needs to be studied with respect to natural environments; (b) Inductive inference is carried out by satisficing algorithms; (c) Inductive inferences are based on frequencies of events in a reference class. PMM theory does not strive for the classical Bayesian ideal, but instead attempts to build an algorithm the mind could actually use.
The first algorithm presented is the Take the Best algorithm, named because its policy is "take the best, ignore the rest". In the first step, it invokes the recognition principle: if only one of two objects is recognized, it chooses the recognized object. If neither is recognized, it chooses randomly. If both are recognized, it moves on to the next discrimination step. For instance, if a person is asked which of city a and city b is bigger, and the person has never heard of b, they will pick a.
If both objects are recognized, the algorithm will next search its memory for useful information that might provide a cue regarding the correct answer. Suppose that you know a certain city has its own football team, while another doesn't have one. It seems reasonable to assume that a city having a football team correlates with the city being of at least some minimum size, so the existence of a football team has positive cue value for predicting city size - it signals a higher value on the target variable.
In the second step, the Take the Best algorithm retrieves from memory the cue values of the highest ranking cue. If the cue discriminates, which is to say one object has a positive cue value and the other does not, the search is terminated and the object with the positive cue value is chosen. If the cue does not discriminate, the algorithm keeps searching for better cues, choosing randomly if no discriminating cue is found.
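As a rough illustration, here is a minimal Python sketch of the two steps just described; the cue names, cue values and encoding (+1 positive, -1 negative, 0 unknown) are my own illustrative choices, not the paper's exact specification:

    # Minimal sketch of the Take the Best algorithm as summarized above.
    # Cues are assumed to be given in order of decreasing validity; all names
    # and values are illustrative.
    import random
    from typing import Dict, List, Set

    def take_the_best(a: str, b: str,
                      recognized: Set[str],
                      cues: List[str],
                      cue_values: Dict[str, Dict[str, int]]) -> str:
        # Step 1: recognition principle.
        a_known, b_known = a in recognized, b in recognized
        if a_known != b_known:
            return a if a_known else b       # only one object recognized: pick it
        if not a_known:
            return random.choice([a, b])     # neither recognized: guess
        # Step 2: go through the cues from best to worst and stop at the first
        # one that discriminates (one object positive, the other not).
        for cue in cues:
            va = cue_values[a].get(cue, 0)
            vb = cue_values[b].get(cue, 0)
            if va > 0 and vb <= 0:
                return a
            if vb > 0 and va <= 0:
                return b
        return random.choice([a, b])         # no discriminating cue: guess

    # Hypothetical usage: the soccer-team cue does not discriminate (both positive),
    # so the search continues to the next cue, which does.
    cities = {"Hamburg": {"soccer_team": 1, "state_capital": 1},
              "Cologne": {"soccer_team": 1, "state_capital": -1}}
    print(take_the_best("Hamburg", "Cologne",
                        recognized={"Hamburg", "Cologne"},
                        cues=["soccer_team", "state_capital"],
                        cue_values=cities))  # -> "Hamburg"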
This certainly sounds horrible; possibly even more horrifying is that a wide variety of experimental results make perfect sense if we assume that the test subjects are unconsciously employing this algorithm. Yet, despite all of these apparent flaws, the algorithm works.
The authors designed a scenario where 500 simulated individuals with varying amounts of knowledge were presented with pairs of cities and were tasked with choosing the bigger one (83 cities, 3,403 city pairs). The Take the Best algorithm was pitted against five other algorithms that were suggested by "several colleagues in the fields of statistics and economics": Tallying (where the number of positive cue values for each object is tallied across all cues and the object with the largest number of positive cue values is chosen), Weighted Tallying, the Unit-Weight Linear Model, the Weighted Linear Model, and Multiple Regression.
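For contrast, the tallying-style competitors integrate over all cues instead of stopping at the first discriminating one. Here is a rough sketch, reusing the cue encoding from the Take the Best sketch above; the handling of unknown values is my simplification rather than the paper's exact specification:

    # Rough sketches of two of the competing algorithms: Tallying and Weighted
    # Tallying. Both look at every cue rather than stopping early.
    import random
    from typing import Dict, List

    def tallying(obj: str, cues: List[str],
                 cue_values: Dict[str, Dict[str, int]]) -> int:
        """Count the number of positive cue values for the object."""
        return sum(1 for cue in cues if cue_values[obj].get(cue, 0) > 0)

    def weighted_tallying(obj: str, cues: List[str],
                          cue_values: Dict[str, Dict[str, int]],
                          validities: Dict[str, float]) -> float:
        """Like tallying, but each positive cue is weighted by its validity."""
        return sum(validities[cue] for cue in cues if cue_values[obj].get(cue, 0) > 0)

    def choose_by_tally(a: str, b: str, score_a: float, score_b: float) -> str:
        """Pick the object with the larger tally, guessing on ties."""
        if score_a == score_b:
            return random.choice([a, b])
        return a if score_a > score_b else b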
Take the Best was clearly the fastest algorithm, needing to look up far fewer cue values than the rest. But what about accuracy? When the simulated individuals had knowledge of all the cues, Take the Best drew as many correct inferences as any of the other algorithms, and more than some. What about individuals with imperfect knowledge? Take the Best won or tied for the best position for individuals with knowledge of 20 and 50 percent of the cues, and didn't lose by more than a few tenths of a percentage point for individuals who knew 10 and 75 percent of the cues. Averaging over all the knowledge classes, Take the Best made 65.8% correct inferences, tied with Weighted Tallying for the gold medal.
The authors also tried two even more stupid algorithms, both variants of Take the Best. Take the Last, instead of starting the search from the highest-ranking cue, first tries the cue that discriminated last, then the cue that discriminated the time before that, and so on. The Minimalist algorithm picks a cue at random. This produced a perhaps surprisingly small drop in accuracy, with Take the Last getting 64.7% correct inferences and Minimalist 64.5%.
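Since the two variants differ from Take the Best only in how they order the cue search, they can be sketched as alternative cue orderings plugged into the function above; what happens once Take the Last's discrimination history is exhausted is my assumption, as the summary doesn't say:

    # Cue-ordering rules for the two variants, to be fed into take_the_best in
    # place of the validity-ordered cue list.
    import random
    from typing import List

    def minimalist_cue_order(cues: List[str]) -> List[str]:
        """Minimalist: try the cues in random order."""
        shuffled = list(cues)
        random.shuffle(shuffled)
        return shuffled

    def take_the_last_cue_order(cues: List[str],
                                discrimination_history: List[str]) -> List[str]:
        """Take the Last: try the most recently discriminating cue first, then
        earlier discriminators; falling back on the remaining cues in arbitrary
        order once the history runs out is an assumption of this sketch."""
        seen = set()
        recent_first = []
        for cue in reversed(discrimination_history):
            if cue not in seen:
                seen.add(cue)
                recent_first.append(cue)
        rest = [c for c in cues if c not in seen]
        return recent_first + rest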
After the algorithm comparison, the authors spend a few pages discussing some of the principles related to the PMM family of algorithms and their empirical validity, as well as the implications all of this might have for the study of rationality. They note, for instance, that even though transitivity (if we prefer a to b and b to c, then we should also prefer a to c) is considered a cornerstone axiom in classical rationality, several algorithms violate transitivity without suffering very much from it.