From Shell to Core: A Multi-Agent Architecture for Real-Time Alignment Faking Detection and Ethical Brake Preservation — LessWrong