Our Experience Running Independent Evaluations on LLMs: What Have We Learned?
TL;DR Independent evaluations are both possible and valuable. Our goal is to widen the conversation on decentralized, reproducible, context-aware evaluations as public infrastructure for AI oversight, especially in regions and languages that frontier work often overlooks. Our recommendations (based on what actually worked for us): 1. Treat evaluation like an...
Oct 3, 2025