Rethinking Benchmarking: The Case for Real-World Evaluations of Generative AI Based on Naturalistic Data — LessWrong