Beyond Benchmarks: A Psychometric Approach to AI Evaluation — LessWrong