Maria Federica Martino Lena — LessWrong

2mo

We cannot safely automate value alignment evaluation and research without thinking about delegation and discretion

Introduction

Automating alignment evaluation and research is widely seen as an efficient way to safeguard against uncontrollable AGI, as Joe Carlsmith and Jan Leike have both acknowledged. In particular, Leike proposed a Minimal Viable Product (MVP) for alignment, consisting of:

> "Building a sufficiently aligned AI system that accelerates alignment research...