This is a description of the methodology behind the latest iteration of my Targeted Personality Test. Feel free to take it either before or after reading the article. This post can also be read at my Substack. Thanks to Justis Millis for providing feedback and proofreading on this post.
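In my prior post “Which personality traits are real? Stress-testing the lexical hypothesis”, I observed that a lot of the personality traits that are measured by conventional personality tests are not very “real”: they lump together nearly unrelated behaviors. Can we do better?
Thanks to a lot of anonymous respondents to my test[1], I think yes! I factor-analyzed the data from my Targeted Personality Test and came up with a hierarchical personality model, which should hopefully be better at cutting personality-space at its joints.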
Quick recap: The problem
Empirical personality models are built on correlations. If a cluster of variables are all correlated with each other, then we assume that there is a latent personality factor accounting for these correlations, and we score the factor using an aggregate of the correlating variables.
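As a minimal sketch of what "scoring the factor using an aggregate" can look like, assume the responses sit in a respondents-by-items NumPy array. The function name `score_factor` and the toy data are hypothetical illustrations, not the scoring code behind the actual test.

```python
import numpy as np

def score_factor(responses, item_idx):
    """Score a factor as the mean of the standardized responses to its items."""
    z = (responses - responses.mean(axis=0)) / responses.std(axis=0)
    return z[:, item_idx].mean(axis=1)

# Toy data (hypothetical): one latent trait driving four noisy items.
rng = np.random.default_rng(0)
trait = rng.normal(size=500)
responses = np.column_stack([trait + rng.normal(size=500) for _ in range(4)])

scores = score_factor(responses, item_idx=[0, 1, 2, 3])
# How well the aggregate tracks the simulated latent trait.
recovery = np.corrcoef(scores, trait)[0, 1]
```

In real data the latent trait is of course unobserved, so `recovery` can only be computed in a simulation like this one.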
However, the standard datasets with which these correlations are computed include lots of near-synonymous item pairs such as “I am sensitive to the needs of others” vs “I am concerned about others”. Because these are near-synonymous, they will tautologically end up correlated with each other regardless of the underlying structure of personality. The data analysis risks mapping out the clusters of synonyms, rather than the actual traits we wanted to know about.
My solution: Narrow items
I took people who scored high and low on traits in a traditional personality test (the SPI-81-27&5) and asked them what they had in mind with their responses. This gave me concrete descriptions; for instance someone who agreed with “Compassion” questions like “I am concerned about others” wrote:
I would not see someone go without something that I had in abundance, if I see a homeless person on the streets even when I have very little money I will stop and talk with them maybe offer them a cigarette and if I have money I offer food. I will go out of my way to help people out if I have something they need and I have no use of it then they can have it for free. I hate seeing people upset and will do everything in my power to fix that upset for them even at cost to myself.
Using these concrete descriptions, I came up with very narrow personality items - in this case, “I give things to homeless people”. Giving things to homeless people is a rather narrow and specific way of being concerned with others, and it is unlikely to overlap in meaning with other ways of being concerned, like showing support to people who are worried about catching diseases. After writing nearly 222 such items, I released the Targeted Personality Test to get data on them.
Initial results: Concerning
In “Which personality traits are real? Stress-testing the lexical hypothesis”, I looked at how the narrow/concrete items related both to each other, and to more standard abstract personality items typically used in personality tests. In that study, I grouped everything by the initial abstract personality traits used in the SPI-27 test that my test was based on.
I found that the narrow items within a personality trait were not very correlated with each other (though, given how weakly they correlated with each other, they were plenty correlated with the abstract items that are usually used). This suggests that personality traits are more heterogeneous or differently structured than the SPI-27 assumes.
If the structure of personality traits is different from what standard personality tests assume, then this suggests a path for progress in personality psychology: namely, to unveil the true structure of personality traits.
Agglomerative item clustering
To search for the underlying personality factors, there are two broad approaches: the standard version based on matrix algebra, and the less-common version based on item clustering. I used both, but for the first stage I used item clustering.[2]
The idea with item clustering is that because a personality trait induces correlations between the items it affects, we can identify the personality traits present in a dataset by searching for groups of correlated items. Such groups can be built in steps: start with each item in its own group, and then take two highly correlated items and combine their groups, repeating until all the items are in the same group.
This produces a hierarchy of groups, and one has to decide on some point to stop combining items so that one has multiple groups of items. I experimented with a few different stopping methods, but I didn’t find anything principled that worked, so I decided to just arbitrarily stop at 27 groups of items, since that was the number of personality factors I started with, from the SPI-27 test.
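The procedure above can be sketched with SciPy's hierarchical clustering routines. This is only an illustration under stated assumptions: the distance formula `sqrt(2 * (1 - r))` is my own choice (it equals the Euclidean distance between standardized items up to a constant, which is what Ward's method expects) and is not necessarily the formula used for the test, and `cluster_items` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_items(responses, n_clusters):
    """Group items (columns) by agglomerative clustering of their correlations."""
    corr = np.corrcoef(responses, rowvar=False)
    # ASSUMPTION: distance = sqrt(2 * (1 - r)); the post does not state its formula.
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangle) form of the distance matrix.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="ward")
    # Cut the hierarchy so that at most n_clusters groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy data (hypothetical): 3 latent traits with 4 items each.
rng = np.random.default_rng(0)
traits = rng.normal(size=(1000, 3))
true_group = np.repeat(np.arange(3), 4)
responses = traits[:, true_group] + rng.normal(size=(1000, 12))

labels = cluster_items(responses, n_clusters=3)
```

With the real data, the arbitrary stopping point described above would correspond to calling `cluster_items(responses, n_clusters=27)`.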
The item clustering created a long list of personality facets which can be seen here. The obvious question is then whether these facets are any better than the original SPI-27 facets that this test was based on. One way to quantify the quality of the facets is to look at the loadings of the items, i.e. the correlation between the item responses and the facet trait levels. I’m not sure how much of the effect is just statistical overfitting, but there seems to be an increase in item loadings in the new clusters compared to the old SPI-27 facets:
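For readers who want to compute loadings of this kind themselves, here is a rough sketch. It assumes a facet's trait level is estimated as the mean of its standardized items, which is my simplification rather than the method confirmed by the post; `item_loadings` and the toy data are hypothetical. Each item is included in its own facet score here, which inflates the loadings somewhat and is one way the overfitting worry can show up.

```python
import numpy as np

def item_loadings(responses, labels):
    """Correlate each item with the mean score of the facet it is assigned to."""
    labels = np.asarray(labels)
    z = (responses - responses.mean(axis=0)) / responses.std(axis=0)
    loadings = np.empty(z.shape[1])
    for j, lab in enumerate(labels):
        # ASSUMPTION: facet level = mean of the facet's standardized items
        # (including item j itself, which biases the loading upward).
        facet_score = z[:, labels == lab].mean(axis=1)
        loadings[j] = np.corrcoef(z[:, j], facet_score)[0, 1]
    return loadings

# Toy data (hypothetical): 3 latent traits with 4 items each.
rng = np.random.default_rng(0)
traits = rng.normal(size=(1000, 3))
true_group = np.repeat(np.arange(3), 4)
responses = traits[:, true_group] + rng.normal(size=(1000, 12))

good = item_loadings(responses, true_group)                   # items grouped by their trait
mixed_up = item_loadings(responses, np.tile(np.arange(3), 4))  # deliberately scrambled grouping
```

Comparing `good.mean()` with `mixed_up.mean()` mirrors the comparison between two candidate facet structures, though on real data a held-out sample would give a fairer comparison.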
Anyway, I then had to go through each of the facets and interpret the items enough to give them a name. I struggled with this because there often seemed to be some tangentially related items in the clusters, and maybe also because English is my second language. To a significant extent, I enlisted the help of Claude, but I fear Claude tended to go for overly complex names. The full list of facets with their names and items can be seen here.
Matrix algebra correlation modelling
That is a lot of facets, and they are far from independent. Instead they have lots of strong correlations:
Higher-order personality factors like the Big Five are built on the idea that the correlations between the facets are themselves due to bigger personality traits. To derive those traits, I can use factor analysis again, though this time rather than a clustering-based approach, I will use a matrix-algebra-based approach, since that is better behaved when variables can load on multiple factors at once.
The basic principle for the matrix algebra approach is that if a trait has an influence of strength λ_A on variable A and an influence of strength λ_B on variable B, then the correlation between variable A and variable B will be λ_A·λ_B. If multiple traits influence the variables, then one sums up the influence over all the traits.
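A small numerical check of this principle, under the usual simplifying assumptions that the traits are uncorrelated and all variables are standardized. The loading values below are made up for illustration.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 uncorrelated traits (columns).
lam = np.array([
    [0.8, 0.0],
    [0.6, 0.3],
    [0.0, 0.7],
    [0.2, 0.5],
])

# Implied correlation between variables i and j: sum over traits of lam[i, k] * lam[j, k].
implied = lam @ lam.T
np.fill_diagonal(implied, 1.0)  # each variable correlates perfectly with itself

# Simulate data from the model and compare with the observed correlations.
rng = np.random.default_rng(0)
n = 20000
traits = rng.normal(size=(n, 2))
unique_sd = np.sqrt(1.0 - (lam ** 2).sum(axis=1))  # leftover variance per variable
data = traits @ lam.T + rng.normal(size=(n, 4)) * unique_sd
observed = np.corrcoef(data, rowvar=False)

max_gap = np.abs(observed - implied).max()
```

For instance, variables 1 and 3 (zero-indexed) share both traits, so their implied correlation is 0.6·0.2 + 0.3·0.5 = 0.27.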
By searching for a matrix λ which reproduces the correlations, one can thereby guess which traits there are. In practice, though, there are infinitely many matrices that reproduce the correlations equally well, so the convention is to pick the simplest one.[3]
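To illustrate why many matrices fit equally well, and what "simplest" can mean, here is a sketch of a plain varimax rotation in NumPy. It is a textbook-style implementation without Kaiser normalization, written for illustration; it is not the software used for the analysis, and the loading matrix is made up.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally toward simple structure (raw varimax)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)  # sum of squared loadings per factor
        grad = loadings.T @ (rotated ** 3 - rotated @ np.diag(col_ss) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        new_objective = s.sum()
        if objective != 0.0 and new_objective / objective < 1.0 + tol:
            break
        objective = new_objective
    return loadings @ rotation

def simplicity(loadings):
    """Varimax criterion: summed variance of squared loadings within each factor."""
    return np.var(loadings ** 2, axis=0).sum()

# Hypothetical "simple" loadings, then an arbitrary 30 degree rotation of them.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.1],
                   [0.0, 0.7], [0.1, 0.8], [0.0, 0.6]])
theta = np.deg2rad(30)
spin = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
mixed = simple @ spin       # implies exactly the same correlations as `simple`
recovered = varimax(mixed)  # rotated back toward a sparse-looking pattern
```

Both `mixed @ mixed.T` and `simple @ simple.T` give the same implied correlations, so the data alone cannot tell the two apart; the rotation criterion is what breaks the tie.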
After performing the factor analysis, I got a matrix that relates each factor to some of the personality facets. Using the matrix, I came up with names for the factors. The matrix and the names can be seen below:
A priori, we should expect to end up with something resembling the Big Five, partly because that is what people in the scientific literature have found to be the structure of personality, and partly because the original SPI-27 test that this study was based on is a Big Five test.
This expectation is satisfied to a reasonable degree. First, two of the factors (Openness and Conscientiousness) map fairly directly to the Big Five. Second, the remaining factors are not unrelated to the Big Five, but instead seem like a “rotation” of them: Boldness combines high Extraversion and low Neuroticism, Selflessness combines high Neuroticism, high Agreeableness and high Extraversion, and Propriety is related to Agreeableness.
How much of the SPI-27 is preserved?
Because I had items from the SPI-27 in my test, I can compute their correlations with the facets I found. The correlations range from 0.5ish and up, so everything is at least somewhat preserved, though there are some traits that are preserved much better than others.
We can also go in the other direction, and ask how well the new clustered traits can be predicted from the SPI-27.
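One common way to quantify "how well one trait can be predicted from a set of others" is the R² of a multiple linear regression. The sketch below assumes that approach, which the post does not confirm as the method used; `r_squared` and the simulated scores are hypothetical.

```python
import numpy as np

def r_squared(predictors, target):
    """Share of variance in `target` explained by a linear combination of `predictors`."""
    X = np.column_stack([np.ones(len(predictors)), predictors])  # add an intercept
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return 1.0 - resid.var() / target.var()

# Toy data (hypothetical): a new facet that partly overlaps with two of three old traits.
rng = np.random.default_rng(0)
old_traits = rng.normal(size=(2000, 3))
new_facet = (0.6 * old_traits[:, 0] + 0.4 * old_traits[:, 1]
             + 0.7 * rng.normal(size=2000))

r2 = r_squared(old_traits, new_facet)
```

Note that this is an in-sample R²; with 27 predictors and a modest number of respondents it would be somewhat optimistic, so a held-out estimate would be safer.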
Evaluation
I’m ambivalent about a lot of the facets in this personality test. To try to get an overview, I came up with a list of criteria I cared about, and scored each of the criteria on a scale where 0 is acceptable, positive numbers are outstanding, and negative numbers are problematic. The full scoring can be seen here:
Overall, I think it was a promising project, but that it needs to be performed at larger scale to be definitive. This isn’t going to be trivial - my survey was 306 items, which is quite long and would be expensive to scale up further.
[1] And a retweet from Aella.
[2] More formally, Ward’s method, with as the distance matrix. I’m not super strong on this, so I don’t know if it’s optimal and can’t give a good introduction. I picked it because in my experience, Ward’s method gives interpretable results, and I used Claude’s recommendation for the distance formula.
[3] By this I mean with e.g. varimax rotation.