Studying the Mechanisms of Alignment Faking in Llama-3.1-405B - LessWrong