MAlvarado — LessWrong

Our Experience Running Independent Evaluations on LLMs: What Have We Learned?

TL;DR: Independent evaluations are both possible and valuable. Our goal is to widen the conversation on decentralized, reproducible, context-aware evaluations as public infrastructure for AI oversight, especially in regions and languages that frontier work often overlooks. Our recommendations (based on what actually worked for us): 1. Treat evaluation like an...

Oct 3, 2025