Towards an empirical investigation of inner alignment — LessWrong