I agree with gwern that there's not really much variation apart from the openness domain. It's a bit risky to use percentile rankings from internet assessments for longitudinal comparisons, though – you never know whether the norms have changed in the meantime, or whether you were normed against a different population when retaking the test because of differences in, e.g., IP address or age. It would be best to record the raw scores as well, if possible.
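To make the norming concern concrete, here is a rough sketch of how one unchanged raw score can land on quite different percentiles depending on the norm group. Everything in it is an assumption for illustration: the raw score, the two norm means and SDs, and the `percentile` helper are all made up, and the normal approximation is just a convenient stand-in for however a given site actually computes its percentiles.

```python
from statistics import NormalDist


def percentile(raw_score, norm_mean, norm_sd):
    """Percentile of raw_score under a normal approximation of the norm group."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(raw_score)


# Same raw openness score at both sittings (made-up value on a 1-5 scale).
raw = 3.8

# Hypothetical norm groups, invented purely for illustration.
first = percentile(raw, norm_mean=3.5, norm_sd=0.6)   # e.g. norms at the first sitting
second = percentile(raw, norm_mean=3.9, norm_sd=0.6)  # e.g. different norms on retest

print(f"first sitting: {first:.0f}th percentile")
print(f"retest:        {second:.0f}th percentile")
```

The raw score never moves, yet the reported percentile drops from above the median to below it. Logging the raw score next to each percentile is what lets you tell this kind of shift apart from a real change.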
The test you linked to was created by Gosling et al. (2004) for a study on web-based Big Five tests. (I found it funny that this test was also created for the same study – ignoring the substantial differences in... decoration, they should give similar results.) The inventory in that test is the Big Five Inventory (BFI) (most recent reference: John et al., 2008); it's quite widely used.
I recommend the IPIP-NEO for anyone who wishes to do a self-assessment for the Big Five. That link provides two versions: Goldberg (1999) developed the original 300-item inventory, and Johnson (2011) shortened that to a 120-item inventory. For those who have time, the 300-item version is psychometrically superior, as you would expect from the longer inventory. The IPIP-NEO has two main advantages over alternatives like the BFI: (1) it was designed to correlate with the commercially distributed NEO-PI-R, which remains the most popular inventory in the literature, and (2) it gives percentile rankings on the 30 facet-level scales as well as the 5 domain-level scales in the NEO-PI-R (an example report).
(I recently spent ~2 weeks doing a literature review of personality psychology, with a brief focus on internet self-assessments for the Big Five. For a very brief overview of the field, I recommend Robins and Donnellan's (2009) article in The Corsini Encyclopedia of Psychology. For an up-to-date review of the Big Five, I recommend McCrae's (2009) article in The Cambridge Handbook of Personality Psychology.)