A non-adversarial prose text produces strong late-layer divergence in Gemma-3. I measured it; I'm not sure what it means.
TL;DR for ML specialists. The core: an empirical study of how long, semantically dense, completely benign text (with no triggers, instructions, or jailbreak prompts) drives an implicit shift in the model's latent-space trajectories. The effect: dilution of the initial system prompt and a bypass of post-training alignment constraints (e.g.,...
Jun 181
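The excerpt above does not say how the late-layer divergence was measured, so the following is only a minimal sketch of one plausible metric, not the author's method. It assumes each run has already been reduced to one pooled hidden-state vector per layer, and it compares a baseline run against a run conditioned on the long benign text using cosine distance. The function names `cosine_distance` and `layerwise_divergence` and the toy vectors are hypothetical, chosen for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def layerwise_divergence(baseline, conditioned):
    """Compare two trajectories layer by layer.

    Each argument is a list with one pooled hidden-state vector per layer
    (an assumption of this sketch; the pooling step is not shown).
    """
    if len(baseline) != len(conditioned):
        raise ValueError("both trajectories must cover the same layers")
    return [cosine_distance(b, c) for b, c in zip(baseline, conditioned)]


# Toy 2-dimensional "hidden states" for three layers (made-up numbers).
baseline = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
conditioned = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

divergence = layerwise_divergence(baseline, conditioned)
for layer, value in enumerate(divergence):
    print(f"layer {layer}: cosine distance = {value:.4f}")
```

In this toy data the two trajectories agree at the first layer and drift apart toward the last one, which is the shape a "late-layer divergence" would take under this metric. With a real model, the per-layer vectors would come from the model's hidden states rather than hand-written lists.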