Can a stronger model fake being a weaker one? Mostly not — LessWrong