Teaching Models to Dream of Better Monitors through Evaluator Conditioned Training
Edit: we renamed the technique from Monitor Sensitive Training (MST) to Evaluator Conditioned Training (ECT).

TL;DR

* We introduce Evaluator Conditioned Training (ECT), a new post-training technique in which we augment the training data with evaluator labels that describe how evaluation will be applied to each sample. We then change...
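As a rough illustration of the data-augmentation step only, the sketch below prepends an evaluator label to each training sample. The excerpt does not say how labels are formatted or attached, so the `Sample` type, the `[EVALUATOR: ...]` prefix, and the function names are all hypothetical choices made for this example, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Sample:
    """One training example (hypothetical structure for illustration)."""
    prompt: str
    response: str


def add_evaluator_label(sample: Sample, evaluator_description: str) -> Sample:
    """Return a copy of the sample whose prompt starts with an evaluator label.

    The bracketed prefix is an assumed format; the source does not specify one.
    """
    labeled_prompt = f"[EVALUATOR: {evaluator_description}]\n{sample.prompt}"
    return Sample(prompt=labeled_prompt, response=sample.response)


def augment_dataset(
    samples: List[Sample],
    evaluator_for: Callable[[Sample], str],
) -> List[Sample]:
    """Label every sample with a description of how it will be evaluated."""
    return [add_evaluator_label(s, evaluator_for(s)) for s in samples]


if __name__ == "__main__":
    data = [Sample(prompt="Summarize the report.", response="The report says ...")]
    # Hypothetical evaluator description, the same for every sample here.
    labeled = augment_dataset(data, lambda s: "output is checked by a rubric grader")
    print(labeled[0].prompt)
```

In this sketch the label depends only on the sample, so a per-sample function is enough; a real pipeline might instead draw the description from metadata about the evaluator that is actually used.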
Mar 1941