This post reports on a portion of my analysis of Fisman and Iyengar's speed dating dataset which bears on the question of how people select romantic partners.

Note: *I made very substantial edits to the second-to-last section of this post after posting it, addressing questions of generalizability.* *I've also cross-posted to my blog.*

## Summary

- Participants rated one another on several dimensions. The majority of variation in the ratings is captured by the average of the different rating types: some people were regarded as good overall, and others were regarded as not good overall.
- The second most important source of variation in the ratings given to participants is that some were regarded as more attractive and fun than they were intelligent/sincere, and for others, the situation was reversed.
- Broadly, when people had to choose between partners who were seen as attractive and fun and partners who were seen as intelligent and sincere, they had a moderately strong preference for partners who were seen as attractive and fun.
- Individuals varied substantially in how they responded to the tradeoff, with some showing a very strong preference for people who were seen as attractive and fun, and others showing virtually no such preference.

The speed dating context may be unusual in that people make a decision on whether or not to see somebody again after only 4 minutes of interaction. On the other hand, some people do meet their partners in contexts such as bars and speed dating events where decisions are made based on brief interactions. To this extent, the empirical phenomena in data from the study are relevant to understanding mate selection in general.

## The Predictive Power of Attractiveness

In "How Subjective Is Attractiveness?" I described how the group consensus on somebody's attractiveness explained 60% of the variance in people's perceptions of attractiveness. My original purpose in writing it was as background for a discussion of how much attractiveness influenced people's decisions as to whether or not to see their partners again.

I touched on this in "Predictors of Selectivity and Desirability at Speed Dating Events." The group consensus on attractiveness is highly predictive of how often people wanted to see somebody again. I remember being slightly shocked upon first viewing the graphs below:

If we average over all participants, we find that participants of above average attractiveness had **twice as many** suitors as participants of below average attractiveness.

There are questions of how the group consensus on attractiveness should be interpreted: for example, how much it's determined by physical appearance as opposed to other characteristics. But up to that ambiguity, the question of whether the connection between attractiveness and desirability was causal is a semantic one — the group consensus on attractiveness picked up on *some* characteristic that resulted in certain people having many more suitors than others. If we *define* attractiveness to be whatever that characteristic is, then the connection is causal by definition.

Despite the strong predictive power of the group consensus on attractiveness, **there was substantial variability in how much people's decisions were influenced by attractiveness, whether measured by group consensus or by their own assessment**. While 98% of participants had perceptions of attractiveness that overlapped with those of the others in the group, only 93% of participants made *decisions* that were correlated with the consensus of others on their partners' attractiveness.

## Individual responsiveness to attractiveness

To visualize the distribution of the degree to which people's decisions were influenced by their partners' attractiveness, for each individual we form the angle between two vectors, the participant's decisions and the average attractiveness ratings of his or her partners, and then plot these angles. The two vectors are in some ways qualitatively different, so the angles don't give a good sense for how much somebody's decisions were influenced by attractiveness in *absolute* terms, but they're helpful for thinking about how influenced people were *relative to others*.

An angle of 0 degrees represents perfect correlation, while an angle of 90 degrees represents the person's decisions being *orthogonal* to the group's consensus on his or her partners' attractiveness. Angles greater than 90 degrees represent *negative* correlation. One can see that the angle was about 90 degrees for a small but significant fraction of participants, while for others the angle was very small, approaching 0 degrees.
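To make the angle concrete, here is a minimal sketch of how it could be computed for one participant. It assumes both vectors are mean-centered first, which is what makes the cosine of the angle equal to the correlation; the post does not spell out that step, and the function name and sample numbers are hypothetical rather than taken from the dataset.

```python
import math

def angle_degrees(decisions, attractiveness):
    """Angle between two mean-centered vectors, in degrees.

    Assumption: centering is applied so that cos(angle) is the
    Pearson correlation between the two lists.
    """
    n = len(decisions)
    mean_d = sum(decisions) / n
    mean_a = sum(attractiveness) / n
    d = [x - mean_d for x in decisions]
    a = [x - mean_a for x in attractiveness]
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_a = math.sqrt(sum(x * x for x in a))
    if norm_d == 0 or norm_a == 0:
        # Undefined when someone said yes (or no) to everyone,
        # or when all partners had the same rating.
        return float("nan")
    cos = sum(x * y for x, y in zip(d, a)) / (norm_d * norm_a)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding error
    return math.degrees(math.acos(cos))

# Hypothetical participant: 1 = yes, 0 = no, across six dates,
# paired with made-up consensus attractiveness ratings.
decisions = [1, 0, 1, 1, 0, 0]
consensus = [7.5, 4.0, 8.0, 6.5, 5.0, 3.5]
angle = angle_degrees(decisions, consensus)
```

With only a handful of dates per person, an angle computed this way is noisy, which is one reason the spread in the plot overstates the spread in true preferences.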

The actual preferences of the participants surely vary less than the above graph suggests if it's taken at face value: the difference between those at the extremes and those in the middle would shrink with

- A larger sample of dates per person
- Better estimates of group consensus (based on ratings from a larger number of raters).

Still, the graph renders it plausible that the weight that people gave to attractiveness varied a lot, even if the variation was smaller than it is in the graph.

We could proceed to make "best guess" estimates of what the true distribution is, but we can get greater insight into what's going on by first adopting a shift in perspective.

## Overall desirability and the tradeoffs

Participants rated each other on attractiveness, fun, ambition, intelligence and sincerity, as well as overall likeability. Ratings on the different dimensions were all correlated, sometimes strongly. (More here). This is partially explained by perceptions of somebody on one dimension influencing perceptions of the person on other dimensions (the Halo Effect). It could be partially explained by actual correlations between the underlying traits being measured. I'll explore possible explanations in greater detail in the future. From the point of view of understanding how people's preferences vary, the main point is that **though we have 6 rating types, we have fewer than 6 independent pieces of information**: ratings of intelligence aren't *just* ratings of intelligence, ratings of ambition aren't *just* ratings of ambition, etc.

We would like to throw out the redundant information so that we can focus on the essentials. A method that facilitates this is *principal component analysis* (PCA), an automated procedure that takes the 6 ratings as inputs and returns 6 weighted averages of the ratings (called "principal components") that are uncorrelated with one another. The key point is that the procedure often compresses much of the information present in *all* of the variables into the *first few* principal components (something that it is designed to do), so that we can discard the other principal components at little cost, reducing the number of variables that we need to consider.

If we apply PCA to the 6 ratings, the first combination that the procedure gives is a weighted average where each rating gets almost equal weight:

**good = 4*(Attractiveness) + 5*(Like) + 4*(Fun) + 4*(Intelligence) + 4*(Ambition) + 3*(Sincerity)**

This can be thought of as corresponding to overall favorable impressions of somebody, so I named it "good." It captures roughly 60% of the information that was in the original ratings.

The second weighted average that PCA gives is not nearly as symmetric:

**tradeoff = 4.5*(Attractiveness) + 3*(Like) + 3*(Fun) - 6*(Intelligence) - 2*(Ambition) - 5*(Sincerity)**

This principal component picks up on the fact that, once the variation captured by the first principal component is set aside, the second largest source of variation comes from the people being rated falling on a spectrum between two poles:

attractive, fun and likable <-------------> sincere, intelligent and ambitious

The first cluster of traits is more closely connected with mainstream romance than the second cluster of traits, which are thought of as positive, but less relevant.

The "tradeoff" combination captures roughly 20% of the information in the original ratings. So together, the first two principal components capture 80% of the information in the original ratings. We could look at the rest of the combinations that PCA gives us, but doing so would complicate the analysis without telling us much more.
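For readers unfamiliar with PCA, here is a minimal sketch of the mechanics using NumPy. It runs on synthetic ratings, not the speed dating data, and it assumes the ratings are standardized before the decomposition; the post does not say which preprocessing was used, so the weights and percentages it produces will not match the "good" and "tradeoff" figures above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 500 rated people, 6 rating types.
# A shared "overall impression" factor makes all six ratings correlated.
n_people = 500
overall = rng.normal(size=(n_people, 1))
ratings = overall + 0.8 * rng.normal(size=(n_people, 6))

# Standardize each rating type (an assumption), then decompose with SVD.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)
_, singular_values, vt = np.linalg.svd(z, full_matrices=False)

components = vt  # each row holds the weights of one principal component
explained = singular_values**2 / np.sum(singular_values**2)
scores = z @ components.T  # each person's value on each component

print("share of variance per component:", np.round(explained, 2))
print("weights of first component:", np.round(components[0], 2))
```

Because the synthetic ratings share a single common factor, the first component comes out as a roughly equal-weight average, analogous to "good". The shares of variance printed by the sketch play the role of the 60% and 20% figures quoted above.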

## Individual differences in romantic preferences

Having extracted the two principal components "good" and "tradeoff", we can examine how participants vary with respect to how their decisions depend on their partners' levels of each. Participants didn't vary very much with respect to their responsiveness to the "good" dimension. It's more interesting to examine how people differed with respect to preferences on the "tradeoff" dimension.

As background context, if we're content not to take into account differences in romantic preferences, we can model the probability of a participant's decision being yes by using a linear model for the log odds ratio:

**LOR ~ 2*good + tradeoff + (general willingness to see partners again)**

The fact that we're *adding* the tradeoff term rather than subtracting it corresponds to people tending to favor attractive and fun partners over intelligent and sincere partners, when forced to choose.

To individualize the model while attempting to correct for the variation that one would expect by chance, I followed Andrew Gelman's suggestion and used *Bayesian hierarchical modeling*. We replace the equation above with

**LOR ~ 2*good + (personal tradeoff coefficient)*tradeoff + (general willingness to see partners again)**

where "personal tradeoff coefficient" is a constant that depends on the individual making the decision.
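As a worked illustration of what these equations imply, the sketch below turns a log odds value into a probability of a yes decision. It assumes the natural logarithm and treats the coefficients 2 and 1 from the equations as exact, although the "~" signals that they are approximate. The input values are hypothetical and their units are arbitrary.

```python
import math

def yes_probability(good, tradeoff, willingness, tradeoff_coef=1.0):
    """Probability of a yes decision implied by the linear log-odds model.

    tradeoff_coef=1.0 reproduces the pooled equation; any other value
    plays the role of a personal tradeoff coefficient.
    """
    log_odds = 2.0 * good + tradeoff_coef * tradeoff + willingness
    return 1.0 / (1.0 + math.exp(-log_odds))

# Same hypothetical partner, seen by three hypothetical participants
# who differ only in how much weight they put on the tradeoff term.
pooled = yes_probability(good=0.5, tradeoff=1.0, willingness=-1.0)
weak = yes_probability(good=0.5, tradeoff=1.0, willingness=-1.0, tradeoff_coef=0.1)
strong = yes_probability(good=0.5, tradeoff=1.0, willingness=-1.0, tradeoff_coef=2.0)
```

For a partner who sits on the attractive and fun side of the tradeoff, a larger personal coefficient raises the probability of a yes; for a partner on the intelligent and sincere side (negative tradeoff), it lowers it.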

The plot below shows the distribution of best guess estimates for the personal tradeoff coefficients. The title of the plot is a loose description of the "tradeoff" principal component, the precise definition of which I gave above.

The lefthand tail corresponds to some people having exhibited virtually *no* preference for attractive and fun partners over intelligent and sincere partners. The righthand tail corresponds to some people's preference being *almost twice as strong* as average.
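The details of the hierarchical model are not given here, so the following is not the model behind the plot. It is only the simplest textbook case (a normal prior on the coefficients and a normal likelihood with known standard error), included to show why hierarchical estimates are pulled toward the group average, and more so when the individual estimate is noisy. All names and numbers are hypothetical.

```python
def shrink(estimate, std_error, group_mean, group_sd):
    """Posterior mean of one person's coefficient in a normal-normal model.

    estimate, std_error: the person's raw coefficient and its noise level.
    group_mean, group_sd: mean and spread of true coefficients across people.
    """
    weight = group_sd**2 / (group_sd**2 + std_error**2)
    return weight * estimate + (1.0 - weight) * group_mean

# Two hypothetical participants with the same extreme raw estimate (3.0),
# one measured precisely and one measured noisily.
precise = shrink(estimate=3.0, std_error=0.5, group_mean=1.0, group_sd=1.0)
noisy = shrink(estimate=3.0, std_error=2.0, group_mean=1.0, group_sd=1.0)
```

Here `precise` stays closer to the raw estimate and `noisy` lands closer to the group mean, which is the sense in which this kind of model corrects for the variation that one would expect by chance.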

## What this means in tangible terms

In my first draft of this post, I postponed discussion of *statistical significance* until later, but I subsequently realized that I could address it succinctly.

I formed the graphs below by:

- Estimating participants' coefficients based on the *first 65%* of the dates that they went on. These dates are the *train set* for our model.
- Forming "high" and "low" groups of participants according to whether their coefficients were in the top or bottom third.
- Restricting consideration to those dates that were *not in the first 65%* of dates. These dates are the *test set* for our model.

Thus, the dates that I used to estimate the coefficients are *completely disjoint* from the dates that I used to form the graphs, so that we get *unbiased estimates* for the romantic preferences that the two groups of people would show in contexts similar to those of the study.
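The steps above can be sketched roughly as follows. This is a schematic under assumptions: dates are taken to be already in chronological order for each participant, the cutoff is rounded down, and the function names and toy data are hypothetical rather than the code used for the graphs.

```python
def split_dates(dates, train_fraction=0.65):
    """Split one participant's chronologically ordered dates into train and test."""
    cutoff = int(len(dates) * train_fraction)
    return dates[:cutoff], dates[cutoff:]

def high_low_groups(coefficients):
    """Return (low, high): participants in the bottom and top thirds.

    coefficients maps a participant id to the tradeoff coefficient
    estimated from that participant's train set only.
    """
    ranked = sorted(coefficients, key=coefficients.get)
    third = len(ranked) // 3
    if third == 0:
        return set(), set()
    return set(ranked[:third]), set(ranked[-third:])

# Toy example: ten dates for one participant, six participants overall.
train, test = split_dates(list(range(10)))
low, high = high_low_groups(
    {"a": 0.2, "b": 1.9, "c": 1.0, "d": 0.4, "e": 1.6, "f": 1.1}
)
```

The graphs are then built only from `test` dates, with participants labeled by groups that were defined without looking at those dates.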

The first graph shows the frequency with which people's decision was 'yes' as a function of their partners' **attractiveness** level.

The slope is slightly larger for the group with the high coefficient: you can see that the initial difference between the two groups in selectivity shrinks as one passes from partners with low attractiveness to partners with high attractiveness.

The visual appearance of the graph understates the difference between the two groups: the high group virtually *never* expressed interest in people at the lowest part of the attractiveness spectrum, whereas people in the low group were *several times* more likely to. This comes across more clearly if we replace the percentage on the y-axis with the corresponding *Log Odds Ratio*. Here "odds" has the same meaning that it does in gambling (e.g. roulette) and "log" refers to "logarithm." In the graph below, the 0 on the y-axis corresponds to decisions being yes 50% of the time, and an increase of 1 along the y-axis corresponds to the odds of a yes decision increasing by 2x:

From this, one sees that while the high group was ~4x more selective than the low group when it came to partners at the low end of the attractiveness spectrum, it was only ~1.5x as selective as the low group when it came to partners at the high end of the attractiveness spectrum.
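For readers who want to check this kind of arithmetic, here is a small sketch of the conversion between yes-frequencies, log odds, and "Nx" statements. It assumes a base-2 logarithm, since that is the base for which one unit on the y-axis corresponds to a 2x change in odds; with a natural logarithm one unit would instead be a factor of about 2.72. The function names are hypothetical.

```python
import math

def log_odds(p, base=2.0):
    """Log of the odds p / (1 - p) for a yes-frequency p strictly between 0 and 1."""
    return math.log(p / (1.0 - p), base)

def odds_factor(difference, base=2.0):
    """How many times larger the odds are for a given gap in log odds."""
    return base ** difference

even = log_odds(0.5)              # yes half of the time
two_to_one = log_odds(2.0 / 3.0)  # yes two times out of three
gap_of_two = odds_factor(2.0)     # a gap of 2 units on the y-axis
```

Under this assumption, a vertical gap of 2 units between the two groups' curves corresponds to a 4x difference in odds.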

The corresponding graphs with attractiveness replaced by **intelligence** and **sincerity** are

(Note the difference in scales on the axes: there was much less variation in perceptions of sincerity and intelligence than there was in perceptions of attractiveness.)

One sees that past a certain point, the high group is not responsive to increasing sincerity and intelligence, whereas the low group is.

Of course, the high group and the low group don't differ most with respect to their responsiveness to attractiveness, or intelligence, or sincerity as individual traits. They differ the most in how they respond to a *tradeoff* between attractiveness/fun and intelligence/sincerity. The graph that depicts this is:

In passing from partners for whom the tradeoff term is lowest to partners for whom it is highest, the odds of being selected by members of the low group increase by **5.5x**, whereas the odds of being selected by members of the high group increase by only **1.4x**.

The differences between the groups correspond to generalizable phenomena. In fact, I knew that the differences are statistically robust and generalizable before even doing a train/test split as I did above. What made it obvious to me is that the tradeoff coefficient correlates with many other features of the participants that were collected prior to the events...

## To Be Continued...

The question now arises: who are the people who lie at the two ends of the continuum between relative preference for attractiveness/fun and relative preference for intelligence/sincerity? How did they spend their time? What career paths did they pursue? How did members of the opposite sex view them?

I'll offer partial answers to these questions in my next post. Readers who are intrigued can take a look at the survey instrument for a list of features present in the dataset, and guess which features correlated with the personal tradeoff coefficient.