AI oversight and its limitations
By AI oversight, we mean things like evaluation, monitoring, staged deployment, or interpretability. This sequence is a loose collection of posts aimed at understanding how well AI oversight works when applied to AI that reasons strategically, and at mapping out where oversight fails, so that we know where we need to use other tools instead.