I took this test for the Big Five twice at an interval of two years. The Big Five traits have been discussed on LW, mostly in a favorable light.

My scores were as follows, showing quite significant variations:

  • O65-C52-E42-A44-N37 on 12/06/2010
  • O30-C41-E31-A44-N32 on 29/06/2012

As a result I have updated quite a bit away from lending much credence to this particular self-administered Big Five test; either the test itself is flawed, or self-administration in general doesn't ensure stability of the measurements, or the Big Five theory has a problem.
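To make the size of the variation concrete, here is a minimal sketch that tabulates the per-trait change between the two sittings. The scores are copied from the list above; the dictionary layout and the helper name `trait_changes` are illustrative choices only, and the numbers are assumed to be the percentile-style scores the test reports.

```python
# Scores as reported in the post (assumed to be the test's percentile output).
first = {"O": 65, "C": 52, "E": 42, "A": 44, "N": 37}   # 12/06/2010
second = {"O": 30, "C": 41, "E": 31, "A": 44, "N": 32}  # 29/06/2012


def trait_changes(before, after):
    """Return the signed change (after - before) for each trait."""
    return {trait: after[trait] - before[trait] for trait in before}


changes = trait_changes(first, second)
for trait, delta in changes.items():
    # The "+d" format shows an explicit sign, so drops and rises are easy to spot.
    print(f"{trait}: {delta:+d}")
```

By this arithmetic Openness drops 35 points, Conscientiousness and Extraversion drop 11 each, Neuroticism drops 5, and Agreeableness is unchanged.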

This test seems to be measuring your self-image, not your behavior.

Consider the difference between asking, "Do you see yourself as someone who starts quarrels with others?" and following the person around for a week and seeing if they start quarrels with others.

I'd like to know how well-calibrated people's self-images are. It seems to me that for some variables, many people's self-images are very poorly calibrated.

siodine:

http://neuroskeptic.blogspot.com/2012/03/personality-without-genes.html

The author in the comments:

Also interesting: http://blogs.discovermagazine.com/gnxp/2012/06/heritability-of-behavioral-traits
VincentYu:

I agree with gwern that there's not really much variation apart from the openness domain. It's a bit dangerous to use percentile rankings on internet assessments for longitudinal studies, though – you never know if the norms have been changed, or if you have been normed against a different population when retaking the test due to differences in, e.g., IP address or age. It would be best to record the raw scores as well, if possible.

The test you linked to was created by Gosling et al. (2004) [https://dl.dropbox.com/u/238511/papers/2004-gosling.pdf] for a study on web-based Big Five tests. (I found it funny that this test [http://www.outofservice.com/starwars/] was also created for the same study – ignoring the substantial differences in... decoration, they should give similar results.) The inventory in that test is the Big Five Inventory (BFI) (most recent reference: John et al., 2008 [https://dl.dropbox.com/u/238511/papers/2008-john.pdf]); it's quite widely used.

I recommend the IPIP-NEO [http://personal.psu.edu/j5j/IPIP/] for anyone who wishes to do a self-assessment for the Big Five. That link provides two versions: Goldberg (1999) [http://ipip.ori.org/newBroadbandText.htm] developed the original 300-item inventory, and Johnson (2011) [http://personal.psu.edu/j5j/papers/ARP2011IPIP-NEO.pdf] shortened that to a 120-item inventory. For those who have time, the 300-item version is psychometrically superior, as expected.

There are two main advantages to the IPIP-NEO, compared to alternatives like the BFI: (1) It was designed to correlate with the commercially-distributed NEO-PI-R, which remains the most popular inventory in the literature, (2) It gives percentile rankings on the 30 facet-level scales as well as the 5 domain-level scales in the NEO-PI-R (an example report [https://dl.dropbox.com/u/238511/lw/ipip-neo-300-report.html]).
(I recently spent ~2 weeks doing a literature review of personality psychology, with a brief focus on internet self-assessments for

