We cannot safely automate value alignment evaluation and research without thinking about delegation and discretion
Introduction

Automating alignment evaluation and research is often seen as an efficient way to safeguard against uncontrollable AGI, as both Joe Carlsmith and Jan Leike have acknowledged. In particular, Leike proposed a Minimal Viable Product (MVP) for alignment, consisting of:

> "Building a sufficiently aligned AI system that accelerates alignment research...
Mar 242