When is it Better to Train on the Alignment Proxy?
This is a response to Matt's earlier post. If you see "a large mixture of alignment proxies" when you look at a standard loss function, this post might save you from drawing silly conclusions from that one. If you parse the world into non-overlapping magisteria of "validation losses" and...
Mar 11, 2025