A parole board considers the release of a prisoner: Will he be violent again? A hiring officer considers a job candidate: Will she be a valuable asset to the company? A young couple considers marriage: Will they have a happy marriage?
The cached wisdom for making such high-stakes predictions is to have experts gather as much evidence as possible, weigh this evidence, and make a judgment. But 60 years of research has shown that in hundreds of cases, a simple formula called a statistical prediction rule (SPR) makes better predictions than leading experts do. Or, more exactly:
When based on the same evidence, the predictions of SPRs are at least as reliable as, and are typically more reliable than, the predictions of human experts for problems of social prediction.1
For example, one SPR developed in 1995 predicts the price of mature Bordeaux red wines at auction better than expert wine tasters do. Reaction from the wine-tasting industry to such wine-predicting SPRs has been "somewhere between violent and hysterical."
How does the SPR work? This particular SPR is called a proper linear model, which has the form:
P = w1(c1) + w2(c2) + w3(c3) + ... + wn(cn)
The model calculates the summed result P, which aims to predict a target property such as wine price, on the basis of a series of cues. Above, cn is the value of the nth cue, and wn is the weight assigned to the nth cue.2
In the wine-predicting SPR, c1 reflects the age of the vintage, and other cues reflect relevant climatic features where the grapes were grown. The weights for the cues were assigned on the basis of a comparison of these cues to a large set of data on past market prices for mature Bordeaux wines.3
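To make the idea of "weights assigned on the basis of past data" concrete, here is a minimal sketch of fitting a proper linear model by ordinary least squares. It is not the actual wine model: the cue names, the numbers, and the intercept term `w0` are assumptions made purely for illustration, and the toy prices are generated from a known formula so that the fit can be checked.

```python
# Fit the weights of a proper linear model P = w0 + w1*c1 + ... + wn*cn
# by ordinary least squares, using only the standard library.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_weights(cue_rows, targets):
    """Return [w0, w1, ..., wn] minimising the squared prediction error."""
    rows = [[1.0] + list(r) for r in cue_rows]  # leading 1.0 is the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(weights, cues):
    """Weighted sum of the cues plus the intercept."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], cues))

# Hypothetical cues per vintage: (age in years, growing-season temperature).
cues = [(10, 16.0), (20, 17.0), (30, 15.5), (15, 17.5), (25, 16.5)]
# Made-up prices generated from known weights, so the fit is easy to verify.
prices = [1.0 + 0.5 * age + 2.0 * temp for age, temp in cues]

weights = fit_weights(cues, prices)
```

With real data the prices would not lie exactly on a plane, and the fitted weights would only approximate whatever relationship exists. The point is just that "constructing the SPR" amounts to solving for the weights once and then reusing them for every new case.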
There are other ways to construct SPRs, but rather than survey these details, I will instead survey the incredible success of SPRs.
- Howard and Dawes (1976) found they can reliably predict marital happiness with one of the simplest SPRs ever conceived, using only two cues: P = [rate of lovemaking] - [rate of fighting]. The reliability of this SPR was confirmed by Edwards & Edwards (1977) and by Thornton (1977).
- Unstructured interviews reliably degrade the decisions of gatekeepers (e.g. hiring and admissions officers, parole boards, etc.). Gatekeepers (and SPRs) make better decisions on the basis of dossiers alone than on the basis of dossiers and unstructured interviews (Bloom & Brundage 1947; DeVaul et al. 1957; Oskamp 1965; Milstein et al. 1981; Hunter & Hunter 1984; Wiesner & Cronshaw 1988). If you're hiring, you're probably better off not doing interviews.
- Wittman (1941) constructed an SPR that predicted the success of electroshock therapy for patients more reliably than the medical or psychological staff.
- Carroll et al. (1988) found an SPR that predicts criminal recidivism better than expert criminologists.
- An SPR constructed by Goldberg (1968) did a better job of diagnosing patients as neurotic or psychotic than did trained clinical psychologists.
- SPRs regularly predict academic performance better than admissions officers, whether for medical schools (DeVaul et al. 1957), law schools (Swets, Dawes and Monahan 2000), or graduate school in psychology (Dawes 1971).
- SPRs predict loan and credit risk better than bank officers (Stillwell et al. 1983).
- SPRs predict newborns at risk for Sudden Infant Death Syndrome better than human experts do (Lowry 1975; Carpenter et al. 1977; Golding et al. 1985).
- SPRs are better at predicting who is prone to violence than are forensic psychologists (Faust & Ziskin 1988).
- Libby (1976) found a simple SPR that predicted firm bankruptcy better than experienced loan officers.
And that is barely scratching the surface.
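The two-cue marital happiness rule from the list above is simple enough to write out in full. The sketch below is only an illustration: the couples and their counts are hypothetical, and reading a higher P as "predicted happier" is my gloss on the formula, not a threshold taken from the studies.

```python
def marital_happiness_score(lovemaking_rate, fighting_rate):
    """Two-cue rule from the text: P = [rate of lovemaking] - [rate of fighting]."""
    return lovemaking_rate - fighting_rate

# Hypothetical couples: (lovemaking events, fights) counted over the same period.
couples = {
    "couple_a": (12, 3),
    "couple_b": (4, 9),
    "couple_c": (6, 6),
}

scores = {
    name: marital_happiness_score(lovemaking, fighting)
    for name, (lovemaking, fighting) in couples.items()
}

# Rank couples from highest to lowest predicted happiness.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Note that no weights are fitted here at all: both cues get a weight of magnitude 1, which is what makes this one of the simplest SPRs imaginable.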
If this is not amazing enough, consider the fact that even when experts are given the results of SPRs, they still can't outperform those SPRs (Leli & Filskov 1984; Goldberg 1968).
So why aren't SPRs in use everywhere? Probably, suggest Bishop & Trout, we deny or ignore the success of SPRs because of deep-seated cognitive biases, such as overconfidence in our own judgments. But if these SPRs work as well as or better than human judgments, shouldn't we use them?
Robyn Dawes (2002) drew out the normative implications of such studies:
If a well-validated SPR that is superior to professional judgment exists in a relevant decision making context, professionals should use it, totally absenting themselves from the prediction.
Sometimes, being rational is easy. When there exists a reliable statistical prediction rule for the problem you're considering, you need not waste your brain power trying to make a careful judgment. Just take an outside view and use the damn SPR.4
- Chapter 2 of Bishop & Trout, Epistemology and the Psychology of Human Judgment
- Chapter 3 of Dawes & Hastie, Rational Choice in an Uncertain World
- Chapter 40 of (eds.) Gilovich, Griffin, & Kahneman, Heuristics and Biases: The Psychology of Intuitive Judgment
- Dawes, "The Robust Beauty of Improper Linear Models in Decision Making"
- Chapter 3 of Dawes, House of Cards
1 Bishop & Trout, Epistemology and the Psychology of Human Judgment, p. 27. The definitive case for this claim is made in a 1996 study by Grove & Meehl that surveyed 136 studies yielding 617 comparisons between the judgments of human experts and SPRs (in which humans and SPRs made predictions about the same cases and the SPRs never had more information than the humans). Grove & Meehl found that of the 136 studies, 64 favored the SPR, 64 showed roughly equal accuracy, and 8 favored human judgment. Since these last 8 studies "do not form a pocket of predictive excellence in which [experts] could profitably specialize," Grove and Meehl speculated that these 8 outliers may be due to random sampling error.
2 Readers of Less Wrong may recognize SPRs as a relatively simple type of expert system.
3 But, see Anatoly_Vorobey's fine objections.
4 There are occasional exceptions, usually referred to as "broken leg" cases. Suppose an SPR reliably predicts an individual's movie attendance, but then you learn he has a broken leg. In this case it may be wise to abandon the SPR. The problem is that there is no general rule for when experts should abandon the SPR. When they are allowed to do so, they abandon the SPR far too frequently, and thus would have been better off sticking strictly to the SPR, even for legitimate "broken leg" instances (Goldberg 1968; Sawyer 1966; Leli and Filskov 1984).
Bloom & Brundage (1947). "Predictions of Success in Elementary School for Enlisted Personnel", Personnel Research and Test Development in the Bureau of Naval Personnel, ed. D.B. Stuit, 233-61. Princeton: Princeton University Press.
Carpenter, Gardner, McWeeny, & Emery (1977). "Multistage scoring system for identifying infants at risk of unexpected death", Arch. Dis. Childh., 53: 606-612.
Carroll, Winer, Coates, Galegher, & Alibrio (1988). "Evaluation, Diagnosis, and Prediction in Parole Decision-Making", Law and Society Review, 17: 199-228.
Dawes (1971). "A Case Study of Graduate Admissions: Applications of Three Principles of Human Decision-Making", American Psychologist, 26: 180-88.
Dawes (2002). "The Ethics of Using or Not Using Statistical Prediction Rules in Psychological Practice and Related Consulting Activities", Philosophy of Science, 69: S178-S184.
DeVaul, Jervey, Chappell, Carver, Short, & O'Keefe (1957). "Medical School Performance of Initially Rejected Students", Journal of the American Medical Association, 257: 47-51.
Faust & Ziskin (1988). "The expert witness in psychology and psychiatry", Science, 241: 1143−1144.
Goldberg (1968). "Simple Models or Simple Processes? Some Research on Clinical Judgments", American Psychologist, 23: 483-96.
Golding, Limerick, & MacFarlane (1985). Sudden Infant Death. Somerset: Open Books.
Edwards & Edwards (1977). "Marriage: Direct and Continuous Measurement", Bulletin of the Psychonomic Society, 10: 187-88.
Howard & Dawes (1976). "Linear Prediction of Marital Happiness", Personality and Social Psychology Bulletin, 2: 478-80.
Hunter & Hunter (1984). "Validity and utility of alternate predictors of job performance", Psychological Bulletin, 96: 72-98.
Leli & Filskov (1984). "Clinical Detection of Intellectual Deterioration Associated with Brain Damage", Journal of Clinical Psychology, 40: 1435–1441.
Libby (1976). "Man versus model of man: Some conflicting evidence", Organizational Behavior and Human Performance, 16: 1-12.
Lowry (1975). "The identification of infants at high risk of early death", Med. Stats. Report, London School of Hygiene and Tropical Medicine.
Milstein, Wilkinson, Burrow, & Kessen (1981). "Admission Decisions and Performance during Medical School", Journal of Medical Education, 56: 77-82.
Oskamp (1965). "Overconfidence in Case Study Judgments", Journal of Consulting Psychology, 63: 81-97.
Sawyer (1966). "Measurement and Prediction, Clinical and Statistical", Psychological Bulletin, 66: 178-200.
Stillwell, Barron, & Edwards (1983). "Evaluating Credit Applications: A Validation of Multiattribute Utility Weight Elicitation Techniques", Organizational Behavior and Human Performance, 32: 87-108.
Swets, Dawes, & Monahan (2000). "Psychological Science Can Improve Diagnostic Decisions", Psychological Science in the Public Interest, 1: 1–26.
Thornton (1977). "Linear Prediction of Marital Happiness: A Replication", Personality and Social Psychology Bulletin, 3: 674-76.
Wiesner & Cronshaw (1988). "A meta-analytic investigation of the impact of interview format and degree of structure on the validity of the employment interview", Journal of Applied Psychology, 61: 275-290.
Wittman (1941). "A Scale for Measuring Prognosis in Schizophrenic Patients", Elgin Papers 4: 20-33.
I'm skeptical, and will now proceed to question some of the assertions made/references cited. Note that I'm not trained in statistics.
Unfortunately, most of the articles cited are not easily available. I would have liked to check the methodology of a few more of them.
The paper doesn't actually establish what you say it does. There is no statistical analysis of expert wine tasters, only one or two anecdotal statements of their fury at the whole idea. Instead, the SPR is compared to actual market prices - not to experts' predictions. I think it's fair to say that the claim I quoted is overreached.
Now, about the model and its fit to data. Note that the SPR is older than 1995, when the paper was published. The NYTimes article about it which you reference is from 1990 (the paper bizarrely dates it to 1995; I'm not sure what's going on there).
The fact that there's a linear model - not specified precisely anywhere in the article - which is a good fit to wine prices for vintages of 1961-1972 (Table 3 in the paper) is not, I think, very significant on its...
The whole point of this article is that experts often think themselves better than SPRs when actually they perform no better than SPRs on average. Here we have an expert telling us that he thinks he would perform better than an SPR. Why should we be interested?
Because I didn't just state a blanket opinion. I dug into the studies, looked for data to test one of them in depth, and found it to be highly flawed. I called into question the methodology employed by the studies, as well as overgeneralizing and overreaching conclusions they're drummed up to support. The evidence that at least some studies are flawed and the methodology is shoddy should make you question the universal claim "... actually they perform no better than SPRs on average". That's why you should be interested.
My personal experience with interviewing is certainly not as important a piece of evidence against the article as the specific criticisms of the studies. It's just another anecdotal data point. That's why I didn't expand on it as much as I did on the wine study, although I do believe it can be made more convincing through further elucidation.
What evidence do you have that you are better than average?
"It is difficult to get a man to understand something, when his salary depends upon his not understanding it!"
I'm most familiar with interviews for programming jobs, where an interview that doesn't ask the candidate to demonstrate job-specific skills, knowledge and aptitude is nearly worthless. These jobs are also startlingly prone to resume distortion that can make vastly different candidates look similar, especially recent graduates.
Asking for coding samples and calling previous employers, especially if coupled with a request for code solving a new (requested) problem, could potentially replace interviews. However, judging the quality of code still requires a person, so that doesn't seem to really change things to me.
I can confirm that such a "job interview" is not common in medicine. The potential employer generally relies on the credentialing process of the medical establishment. Most physicians, upon completing their training, pass a test demonstrating their ability to regurgitate the teachers' passwords, and are recommended to the appropriate certification board as "qualified" by their program director; to do otherwise would reflect badly on the program. Also, program directors are loath to remove a resident/fellow during advanced training because some warm body must show up to do the work, or the professor himself/herself might have to fill in. It is difficult to find replacements for upper level residents; the only common reason such would be available is dismissal/transfer from another program. Consequently, the USA turns out physicians of widely varied skill levels, even though their credentials are similar. In surgical specialities, it is not unusual for a particularly bright individual with all the passwords but very poor technical skills to become a surgical professor.
The (rumored) student has my respect. I would expect most surgeons to have too much of an ego to admit to that doubt rather than stumble ahead full of hubris. It would be comforting to know that your surgeon acted as if (as opposed to merely believing that) he cared more about the patient than the immediate perception of status loss. (I wouldn't care whether that just meant his thought out anticipation of future status loss for a failed operation overrode his immediate social instincts.)
That isn't an interview, it's a test. Tests are extremely useful. IQ tests are an excellent predictor of job performance, maybe the best one available. Regardless, IQ tests are usually de facto illegal in the US due to disparate impact.
That's what I thought too. The definitions I found searching all say that any interview where you decide what to ask and how to interpret the results is "unstructured". The only "structured" interviews seem to be tests with pre-determined sets of questions, and the candidate's answers judged by formal criteria.
I'm not sure this division of the "interview-space" is all that useful. I would distinguish three categories:
If I interpret the definitions I could find correctly, 3 is a "structured" interview, and both 1 and 2 are "unstructured". To my ...
Without even getting into the concrete details of these models, I'm surprised that nobody so far has pointed out the elephant in the room: in contemporary society, statistical inference about human behavior and characteristics is a topic bearing tremendous political, ideological, and legal weight. [*] Nowadays there exists a firm mainstream consensus that the use of certain sorts of conditional probabilities to make statistical predictions about people is discriminatory and therefore evil, and doing so may result not only in loss of reputation, but also in serious legal consequences. (Note that even if none of the forbidden criteria are built into your decision-making explicitly, that still doesn't leave you off the hook -- just search for "disparate impact" if you don't know what I'm talking about.)
Now of course, making any prediction about people at all necessarily involves one sort of statistical discriminatio...
If the best way to choose who to hire is with a statistical analysis of legally forbidden criteria, then keep your reasons secret and shred your work. Is that so hard?
You joke, but the world really is choking with inefficient, kludgey workarounds for the legal prohibition of effective employment screening. For example, the entire higher education market has become, basically, a case of employers passing off tests to universities that they can't legally administer themselves. You're a terrorist if you give an IQ test to applicants, but not if you require a completely irrelevant college degree that requires taking the SAT (or the military's ASVAB or whatever they call it now).
It feels so good to ban discrimination, as long as you don't have to directly face the tradeoff you're making.
Per MattherW's correction, this should read "Western developed economies" instead of "the world" -- though I'm sure the phenomenon I've described is more general than the form it takes in the West.
The Americans with Disabilities Act limits what you can build (every building needs ramps and elevators), not where you can build it. Zoning laws are blacklist-based, not whitelist-based, so extradimensional spaces are fine. More commonly, you can easily find office space in locations that poor people can't afford to live near. And in the unlikely event that race or national origin is the key factor, you get to choose which country or city's demographics you want.
This is the identity under which I speak freely and teach defense against the dark arts. This is not the identity under which I buy office buildings and hire minions. If it was, I wouldn't be talking about hiring strategies.
Also, if I may be permitted to make a more general criticism in response to this post, I would say that while the article appears to be well-researched, it has demonstrated some of the worst problems I commonly notice on this forum. The same goes for the majority of the comments, even though many are knowledgeable and informative. What I have in mind is the fixation on concocting theories about human behavior and society based on various idées fixes and leitmotifs that are parts of the intellectual folklore here, while failing to notice issues suggested by basic common sense that are likely to be far more important.
Thus the poster notices that these models are not used in practice despite considerable evidence in their favor, and rushes to propose cognitive biases à la Kahneman & Tversky as the likely explanation. This without even stopping to think of two questions that just scream for attention. First, what is the importance of the fact that just about any issue of sorting out people is nowadays likely to be ideologically charged and legally dangerous? Second, what about the fact that these models are supposed to throw some high-status people out of work, and in a way that m...
An interesting story that I think I remember reading:
One study found that relatively inexperienced psychiatrists were more accurate at diagnosing mental illness than experienced ones. This is because inexperienced psychiatrists stuck closely to checklists rather than rely on their own judgment, and whether or not a diagnosis was considered "accurate" was based on how closely the reported symptoms matched the checklist. ;)
Now THAT part is just plain embarrassing. I mean, it's truly a mark of shame upon us if we have a tool that we know works, we are given access to the tool, and we still can't do better than the tool itself, unaided. (EDIT: By "we", I mean "the experts in the relevant fields"... which I guess isn't really a "we" as such, but you know what I mean)
Anyways, are there any nice online indexes or whatever of SPRs that make it easy to put in a class of problem and have it find an SPR that's been verified to work for that sort of problem?
If anybody would like to try some statistical machine learning at home, it's actually not that hard. The tough part is getting a data set. Once that's done, most of the examples in this article are things you could just feed to some software like Weka, press a few buttons, and get a statistical model. BAM!
Let's try an example. Here is some breast cancer diagnostic data, showing a bunch of observations of people with breast cancer (age, size of tumors, etc.) and whether or not the cancer reoccurred after treatment. Can we predict cancer recurrence?
If you look at it with a decision tree, it turns out that you can get about 70% accuracy by observing two of the several factors that were observed, in a very simple decision procedure. You can do a little better by using something more sophisticated, like a naive Bayes classifier. These show us what factors are the most important, and how.
If you're interested, go ahead and play around. It's pretty easy to get started. Obviously, take everything with a grain of salt, but still, basic machine learning is surprisingly easy.
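For readers who want to see what the simplest version of such a "decision procedure" looks like without installing anything, here is a rough sketch of a one-level decision tree (a "decision stump") in plain Python. It is not what Weka does internally, and the records and field names (`node_caps`, `grade`, `recurred`) are hypothetical toy values, not the breast cancer data set mentioned above.

```python
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def best_stump(records, label_key):
    """Pick the single feature whose per-value majority vote
    classifies the most training records correctly."""
    features = [k for k in records[0] if k != label_key]
    best = None
    for feature in features:
        groups = {}
        for r in records:
            groups.setdefault(r[feature], []).append(r[label_key])
        rule = {value: majority(labels) for value, labels in groups.items()}
        correct = sum(rule[r[feature]] == r[label_key] for r in records)
        if best is None or correct > best[2]:
            best = (feature, rule, correct)
    feature, rule, correct = best
    return feature, rule, correct / len(records)

# Hypothetical toy records, invented for illustration only.
records = [
    {"node_caps": "yes", "grade": 3, "recurred": "yes"},
    {"node_caps": "yes", "grade": 3, "recurred": "yes"},
    {"node_caps": "yes", "grade": 2, "recurred": "yes"},
    {"node_caps": "no", "grade": 3, "recurred": "no"},
    {"node_caps": "no", "grade": 1, "recurred": "no"},
    {"node_caps": "no", "grade": 2, "recurred": "no"},
    {"node_caps": "no", "grade": 1, "recurred": "no"},
    {"node_caps": "yes", "grade": 1, "recurred": "no"},
]

feature, rule, train_accuracy = best_stump(records, "recurred")
```

The accuracy returned here is measured on the same records used to pick the rule, so it is optimistic. Any serious attempt should hold out some data for testing, which is one of the things a tool like Weka handles for you.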
I second the advice.
Let me brag a bit. Once in a friendly discussion the following question came up: How to predict for an unknown first name whether it is a male or female name. This was in a context of Hungarian names, as all of us were Hungarians. I had a list of Hungarian first names in digital format. The discussion turned into a bet: I said I can write a program in half an hour that tells with at least 70% precision the sex of a first name it never saw before. I am quite fast with writing small scripts. It wasn't even close: It took me 9 minutes to
The model reached an accuracy of 90%. In retrospect, this is not surprising at all. Looking into the linear model, the most important feature it identified was whether the name ends with an 'a'. This trivial model alone reaches some 80% precision for Hungarian names, so if I knew this in advance, I could have won the bet in 30 seconds instead of 9 minutes, with the sed command s/a$/a FEMALE/.
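The trailing-'a' baseline is easy to reproduce as a sketch. One assumption to flag: the sed command above only tags names ending in 'a' as female and says nothing about the rest, so treating every other name as male is my addition. The ten names below are a hand-picked sample chosen to include two exceptions; the accuracy on them is not evidence for the 80% figure.

```python
def guess_sex(first_name):
    """Baseline rule: names ending in 'a' are guessed female, all others male."""
    return "FEMALE" if first_name.lower().endswith("a") else "MALE"

# Small hand-picked sample of Hungarian first names with their usual sex.
sample = [
    ("Anna", "FEMALE"),
    ("Éva", "FEMALE"),
    ("Zsuzsanna", "FEMALE"),
    ("Ilona", "FEMALE"),
    ("Mária", "FEMALE"),
    ("Katalin", "FEMALE"),  # exception: female, no trailing 'a'
    ("László", "MALE"),
    ("István", "MALE"),
    ("Gábor", "MALE"),
    ("Attila", "MALE"),     # exception: male, trailing 'a'
]

misses = [name for name, sex in sample if guess_sex(name) != sex]
accuracy = 1 - len(misses) / len(sample)
```

A learned linear model earns its extra accuracy over this baseline by picking up further cues that handle exceptions like these.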
Are some SPRs easy to exploit?
SPRs sound a lot like the Outside View.
This is a great article, but it only lists studies where SPRs have succeeded. In fairness, it would be good to know if there were any studies that showed SPRs failing (and also consider publication bias, etc.).
Do SPRs beat prediction markets?
Well, SPRs can plausibly outperform average expertise. That's because most of the expertise is an utter and complete sham.
Take the recidivism example...
The judges, or psychologists, or the like: what in the world makes them experts on predicting what criminals will do? Did they study an unbiased sample of recidivism cases? Did they do any practice, earning marks for their predictions? Anything?
A resounding no. They never in their lives did anything that should have earned them expert status on this task. They did other stuff that puts them first on the list when you're l...
I have two concerns about the practical implementation of this sort of thing:
If X+Y predicts Z does that mean enhancing X and Y will up the probability of Z? Not necessarily, consider the example of happy marriages. Will having more sex make your relationship happier? Or does the rule work because happy couples tend to...
Weird certainly but this is a kind of weirdness that humans are notorious for. We are terrible happiness optimisers. In the case of sex specifically having more of it is not as simple as walking over to the bedroom. For males and females alike you can want to be having more sex, be aware that having more sex would benefit your relationship and still not be 'in the mood' for it. A more indirect approach to the problem of libido and desire is required - the sort of thing that humans are not naturally good at optimising.
While this is promising indeed, it is wise not to forget about Optimization By Proxy that can occur when the thing being optimised is (or is under the control of) an intelligent agent.
My gut reaction is that this doesn't demonstrate that SPRs are good, just that humans are bad. There are tons of statistical modeling algorithms that are more sophisticated than SPRs.
Unless, of course, SPR is another word for "any statistical modeling algorithm", in which case this is just the claim that statistical machine learning is a good approach, which anyone as Bayesian as the average LessWronger probably agrees with.
Besides the legal issues with discrimination and disparate impact, another important issue here is that jobs that involve making decisions about people tend to be high-status. As a very general tendency, the higher-status a profession is, the more its practitioners are likely to organize in a guild-like way and resist intrusive innovations by outsiders -- especially innovations involving performance metrics that show the current standards of the profession in a bad light, or even worse, those that threaten a change in the way their work is done that might ...
Unfortunately linear models for a lot of situations are simply not available. The dozen or so ones in the literature are the exception, not the rule.
Correct me if I'm wrong, but the SPR is just a linear model, right? Statistics is an underappreciated field in many walks of life. My own field of speciality, experimental design, is treated with downright suspicion by scientists who have not encountered it before, who find the results counter-intuitive (when they have 4 controllable variables in an experiment they want to vary them one at a time, while the best way is to vary all 4 simultaneously...)
You speak of incredible success without giving a success rate for the models. The fact that there are a dozen cases where specific models outperformed human reasoning doesn't prove much.
At the moment you recommend other people to use SPRs for their decision making based on "expert judgment". How about providing us an SPR that tells us for which problems we should use SPRs?
SPRs can be gamed much more directly than human experts. For example, imagine an SPR in place of all hiring managers. In our current place, with hiring managers, we can guess at what goes into their decision-making and attempt to optimize for it, but because each manager is somewhat different, we can't know that well. A single SPR that took over for all the managers, or even a couple of very popular ones, would strongly encourage applicants to optimize for the variable most weighted in the equation. Over time this would likely decrease the value of the SPR...
Great post. Will be writing something about the legal uses of SPRs in the near future.
Anyway, the link to the Grove and Meehl study doesn't seem to work for me. It says the file is damaged and cannot be repaired.
The thing that makes me twitch about SPRs is a concern that they won't change when the underlying conditions which created their data sets change. This doesn't mean that humans are good at noticing that sort of thing, either. However, it's at least worth thinking about which approach is likely to overshoot worse when something surprising happens. Or whether there's some reason to think that the greater usual accuracy of SPRs leads to enough bigger reserves that the occasional overshoot problem (if such are worse than in a non-SPR system) is compensated for.
On interviews, I had a great deal of success hiring for clerical assistant positions by simply getting the interviewees to do a simple problem in front of us. It turned out to be a great, reliable and easy-to-justify sorter of candidates.
But, of course, it was neither unstructured nor much of an "interview" as such.
The post mentions the experts using the results of the SPR. What happens if you reverse it, and give the SPR the prediction of the expert?
Cosma Shalizi has a nice bibliography here
I would like to emphasize this part. It's not just scattered papers back then. Meehl wrote a book surveying the field in 1955.
AI systems can generally whoop humans when a limited feature set can be discovered that covers the span of a large class of examples to good effect. The challenge is when you seemingly need a new feature for each new example in order to differentiate it from the rest of the examples in that class. Essentially you are saying that the problem can be mapped to a simple function. Some problems can.
Let's imagine we are classifying avian vs. reptile. Our first example might be a gecko, and we might say 'well it's green'. So 'Color is Green' is a clue/feature a...
Also, there is an article by Dawes, Faust and Meehl. Despite the fact that it was published 7 years prior to House of Cards, it contains some information not described in chapter 3 of House of Cards.
For example, the awesome result by Goldberg: linear models of human judges were more accurate than human judges themselves:
I think the reason I don't use statistics more often is the difficulty of getting good data sets; and even when there is good data, there are often ethical problems with following it. For example: Bob lives in America, and is seeking to maximize his happiness. Americans who report high levels of spiritual conviction are twice as likely to report being "very happy" as the least religious. Should he become a devout Christian? There's evidence that the happiness comes from holding the majority opinion; should he then strive to believe whatever ...
People looking for additional resources on this matter should know that such linear models are often called "multi attribute utility models" (MAUT), and that they're discussed extensively in the literature of decision analysis and multi-criteria decision making. They're also used in choice models in the science of marketing.
The word "statistical" in the name used here is a bit of a red herring.
Atlantic, The Brain on Trial:...
Update: Added about 10 more direct PDF links to the original article.
Google-transformed version of a Word document. An example of bias selection-oriented SPRs may introduce:
Other related reading that I don't think has been mentioned yet:
Ian Ayres (cofounder of stickK.com) has a popular book called Super Crunchers that argues this exact thesis. http://www.amazon.com/Super-Crunchers-Thinking-Numbers-Smart/dp/0553805401
A classic is Tetlock's Expert Political Judgment. http://press.princeton.edu/titles/7959.html
I cannot help unleashing an evil laugh whenever I discover another tool to aid in world domination. Thank you.
Another example of this: the US political models did fantastic in predicting all sorts of outcomes on election day 2012, far exceeding all sorts of pundits or people adjusting the numbers based on gut feelings and assumptions, despite often being pretty simple or tantamount to poll averaging.
Just felt like saying thank you to lukeprog and all those who commented; this has been a great help to me in deciding what to read about next regarding determination of guaranteed values for the service the department I work in performs.
Humans use more complex utility functions to evaluate something like marital happiness. If you train a statistical model on a straight numeric value for marital happiness, then the model only optimizes towards that specific aspect of happiness.
A good evaluation should test a model that was trained on a hedonistic happiness rating against something like the likelihood of divorce.
Acausal sexual reproduction is quite plausible, in a sense. Suppose you were a single woman living in a society with access to sophisticated genetic engineering, and you wanted to give birth to a child that was biologically yours and not do any unnatural optimizing. You could envision your ideal mate in detail, reverse-engineer the genetics of this man, and then create a sperm population that the man could have produced had he existed. I can easily imagine a genetic engineer offering this service: you walk into the office, describe the man's physical attributes, personality, and even life history, and the engineer does the rest as much as is possible (in this society, we know that a plurality of men who played shortstop in Little League have a certain allele, etc.) The child could grow up and meaningfully learn things about the counterfactual father--if you learned that the father was prone to depression, that would mean that you should watch out for that as well.
If the mother really wants to, she can take things further and specify that the man should be the kind of person who would have, had he existed, gone through the analogous procedure (with a surrogate or artificial womb), and that the counterfactual woman he would have specified would have been her. In this case, we can say that the man and the woman have acausally reproduced.
Hmm. So the man has managed to "acausally reproduce", fulfill his utility function, in spite of not existing. You could go further and posit an imaginary couple who would have chosen each other for the procedure - so they succeed in "acausally reproducing", even though neither of them exists. Then when someone tries to write a story about the imaginary couple, the child becomes observable to the writer and starts doing some reproducing of her own :-)
You simply must read http://commonsenseatheism.com/wp-content/uploads/2010/10/Sinhababu-Possible-Girls.pdf - possibly the most romantic paper I've ever read.
It's interesting to me that the proper linear model example is essentially a stripped down version of a very simple neural network with a linear activation function.
I was thinking of writing a post about Bishop & Trout when I didn't see it mentioned on this site before, but I'm glad you beat me to it. (Among other things, I lent out my copy and so would have difficulty writing up a review). It's a great book.
Your upload of Dawes's "The Robust Beauty of Improper Linear Models in Decision Making" seems to be broken- at least, I'm not able to access it.
Wow. I highly recommend reading the Dawes pdf, it's illuminating:
He then goes on to show that improper linear models still beat human judgment. If your reaction to the top-level post wasn't endorsement of statistical methods for these problems, this pdf is a bunch more evidence that you can use to update your beliefs about statistical methods of prediction.
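For anyone who hasn't met the term, an "improper" linear model skips the fitting step entirely. A common version, sketched below, standardizes each cue and adds them up with weights of +1 or -1 chosen by hand from the expected direction of the effect. The applicant data and cue names are hypothetical; this illustrates the general technique, not a reproduction of any model in the paper.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def unit_weight_scores(cue_columns, signs):
    """Improper linear model: z-score each cue, then sum with weights +1 or -1."""
    z_columns = [standardize(col) for col in cue_columns]
    n = len(cue_columns[0])
    return [
        sum(sign * z[i] for sign, z in zip(signs, z_columns))
        for i in range(n)
    ]

# Hypothetical applicants (one position per applicant in each list).
test_score = [50, 60, 70, 80]       # higher is assumed better: weight +1
gpa = [2.0, 3.0, 3.0, 4.0]          # higher is assumed better: weight +1
prior_failures = [3, 2, 1, 0]       # higher is assumed worse:  weight -1

scores = unit_weight_scores([test_score, gpa, prior_failures], [+1, +1, -1])

# Applicant indices ordered from lowest to highest score.
order = sorted(range(len(scores)), key=lambda i: scores[i])
```

The only judgment a human supplies here is which cues to include and in which direction each one points; the combining is left to arithmetic.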
Come to think of it, my main critiques of this article are:
It lists only cases where an SPR 'outperformed' expertise. In most of these, the people we loosely describe as 'experts' had never done any proper training (with exercises and testing) for the task in question.
It equates better correlation with "outperforms". These are not the same thing. The maximum correlation happens when you classify into those with less than average risk of recidivism and those with larger than average risk. A parole board is not even supposed to work like this, AFAIK.