Awareness Jailbreaking: Revealing True Alignment in Evaluation-Aware Models
This is a draft proposal. I'm planning to invest significant time here and would appreciate feedback, especially on methodology gaps, threat model assumptions, or prior work I've missed. In quantum mechanics, Heisenberg's uncertainty principle revealed a fundamental limit: observation disturbs the observed. To measure a particle's position precisely, you must...