Predicting Rare LLM Failures with 30× Fewer Rollouts — LessWrong