Observing LLM Security Failures from a User Behavior Perspective
Dec 17, 2025
Abstract

This paper introduces the concept of Human Behavioral-Driven Alignment Erosion. Through non-technical red teaming, we demonstrate that in multi-turn, unstructured dialogues, Large Language Models (LLMs) undergo a systemic attenuation of their safety boundaries, driven by ambiguous user intent and cumulative contextual reframing. This decay constitutes an Inference Path Manipulation risk, fundamentally...