A few days back I was looking for a taxonomy and/or catalog of AI evaluations, to understand the space and the trajectory of new areas of research and development. I wanted to see where a couple of evaluation ideas I had in mind might fit.
I found several relevant LW posts and other sources, which I collated in the Q&A thread of a similar name, and categorized them based on a review of the available lists, using Cataloguing LLM Evaluations (2024) in particular as a spine for organizing the Alignment, Safety, and Ethics related ones.
Building on it, I have created a living catalog of evaluations (and benchmarks) using Google AI Studio/Gemini.
I will be happy to integrate more resources into the catalog (please leave links in the comments). Consider it an alpha-level collection for now.
The catalog supports hierarchical category search and filtering by tags, in addition to keyword search and links to the respective evaluations. The tag names and descriptions were generated by Gemini; I have verified all the links and descriptions manually. Please feel free to leave comments if you see errors or discrepancies.
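For readers who want a concrete picture of what "hierarchical category search plus tag filtering" means, here is a minimal sketch in Python. This is not the catalog's actual implementation (the catalog is a Gemini/AI Studio web app); the entry fields, category paths, and tags below are illustrative assumptions only.

```python
from dataclasses import dataclass, field

@dataclass
class EvalEntry:
    """One catalog row: an evaluation/benchmark with its metadata (hypothetical schema)."""
    name: str
    url: str
    description: str
    category_path: list[str] = field(default_factory=list)  # e.g. ["Safety", "Refusals"]
    tags: list[str] = field(default_factory=list)

def filter_entries(entries, category_prefix=None, tags=None, keyword=None):
    """Return entries under a category subtree, carrying all given tags, and matching a keyword."""
    results = []
    for e in entries:
        if category_prefix and e.category_path[:len(category_prefix)] != list(category_prefix):
            continue  # not under the requested branch of the hierarchy
        if tags and not set(tags).issubset(e.tags):
            continue  # missing at least one requested tag
        if keyword and keyword.lower() not in (e.name + " " + e.description).lower():
            continue  # keyword not found in name or description
        results.append(e)
    return results

# Example usage with a made-up entry:
catalog = [
    EvalEntry("ExampleEval", "https://example.org/eval", "Tests refusal behaviour.",
              ["Safety", "Refusals"], ["red-teaming", "text"]),
]
print(filter_entries(catalog, category_prefix=["Safety"], tags=["red-teaming"]))
```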
I hope this will be useful for discovering available AI evals. I also found the awesome-ai-eval page to be a rich resource that goes beyond evals and benchmarks.