Hello, this is the preliminary report of an art project I worked on in 2023, which posits that psychometrics may offer valuable ideas for "AI alignment".

On a scale of 1 to 10 how unlikely is this "scenario" and why?


I think it would be very useful to put a short summary of that piece here. The piece would really benefit from a summary abstract for this audience, and that could go in the link post here.

I skimmed it and I think I'd summarize it as "maybe we should apply something like human psychometrics to LLMs for alignment". A brief description of psychometrics would also be useful.

I may have more comments after reading it more carefully.

pskl (3mo)

That's good feedback. I had just put a link to universal psychometrics without defining it. Noted!

I know a ton about psychometrics when applied to humans, and I've been thinking on and off about whether some of these methods could be applied to neural networks. Overall I'm bearish about the prospects, but I eventually got one idea that stayed quite promising even after thinking about it for a while. I've been distracted from implementing it by an idea for solving the alignment problem, though, so I've been shelving it for a bit, but if anyone less distractable wants to collaborate then I'd be up for that. It needs to wait until I've published my idea on the alignment problem as that seems the highest priority.

Basically, the issue is that psychometrics in humans isn't magic. You've got ability measurements, which basically work by giving people a bunch of example tasks from the ability you want to measure, and then you've got other measures, e.g. personality measures, which basically rely on people remembering their own tendencies. Neither seems to promise all that much innovation for LLMs.
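
To make the two kinds of measure concrete, here is a minimal Python sketch of how each is typically scored. It is only an illustration under simple assumptions: the function names `ability_score` and `self_report_score`, the 1-to-5 rating scale, and the sample data are all hypothetical, and real instruments use more elaborate scoring than this.

```python
from statistics import mean


def ability_score(responses, answer_key):
    """Ability measure: proportion of example tasks answered correctly."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def self_report_score(ratings, reverse_keyed=(), scale_min=1, scale_max=5):
    """Self-report measure: mean rating after flipping reverse-keyed items.

    `reverse_keyed` holds the 0-based indices of items worded in the
    opposite direction (hypothetical example: "I rarely enjoy parties"
    on an extraversion scale).
    """
    adjusted = [
        (scale_min + scale_max - r) if i in reverse_keyed else r
        for i, r in enumerate(ratings)
    ]
    return mean(adjusted)


# Hypothetical data: four tasks, four questionnaire items.
print(ability_score(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
print(self_report_score([4, 2, 5, 1], reverse_keyed={1, 3}))
```

Either function could in principle be fed answers produced by an LLM instead of a person, which is why neither adds much beyond ordinary benchmarking or asking the model to describe itself.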

pskl (3mo)

Yeah, my intuition is that a field "inspired by psychometrics", but not really psychometrics as it exists now, will spawn. Some sort of neural psychometry, models dedicated to assessing the safety of other models, something like that.

For the sake of this art project, and since you know a lot about psychometrics, would you have another test to recommend and provide? I'm particularly interested in clinical tests like the Minnesota Multiphasic Personality Inventory-2.