This text originated at a retreat in late 2018, where researchers from FHI, MIRI and CFAR held an extended double-crux on AI safety paradigms, with Eric Drexler and Scott Garrabrant at the core. Over the past two years I have tried several times to make it more understandable, but empirically it still seems quite inadequate. As it seems unlikely that I will have time to improve it further, I'm publishing it as it is, in the hope that someone else will understand the ideas even in this form and describe them more clearly.
The box inversion hypothesis consists of the following two propositions:
- There exists something approximating a duality / an
I think silence is a clearly sensible strategy for obvious reasons.