As a newbie, I am trying to get oriented in the AI alignment field. To reach a qualitative or quantitative verdict on a test subject's alignment, we need evaluations for AI. I was wondering whether there is an anchor point for evaluation categories and resources.
What I found so far
These are great resources with somewhat overlapping concepts. I am curious whether the community considers any of these (or anything else) a generally accepted taxonomy and catalog of AI evaluations.
We do need a formal science of evaluations. Such a formalism would expose gaps in existing frameworks, prompt lines of inquiry such as value alignment, and motivate solutions such as understanding-based evaluations.