Approximating Human Preferences Using a Multi-Judge Learned System
TL;DR: We present a conceptual discussion and loose formalism regarding Expert Orchestration, emphasizing on judges. We motivate the problem of finding the best way of combining multiple judges scores and present a solution to it: learning the function. Then, we present the architecture and low level details of the experiments...