A Fast and Loose Clustering of LLM Benchmarks
AI benchmarks measure a variety of distinct skills, from agency to general knowledge to spatial reasoning. Two benchmarks may measure similar traits if AI models that perform well on one also perform well on the other. Moreover, these connections might not be obvious from the benchmarks' descriptions. This is...
Apr 106