How Do We Evaluate AI Evaluations?
In the last few years, I have had the chance to build a few benchmarks for LLMs. This process has led me to think deeply about how we, as a community, assess the quality of our own evaluation tools. There seems to be a lot of ambiguity in this area,...
Oct 13, 2025