Extending Inspect Framework: Integrating Weights & Biases
> This work was supported through the MARS (Mentorship for Alignment Research Students) program at the Cambridge AI Safety Hub (caish.org/mars).

Why We Built inspect_wandb

Evaluating frontier AI models can be messy; each benchmark has its own quirks, scripts, and formats, and even after a successful run, the results usually...
Sep 20, 2025