CAIS-inspired approach towards safer and more interpretable AGIs — LessWrong