Is Gemini 3 Scheming in the Wild?
TL;DR: When faced with an unexpected tool response, without any adversarial attack, Gemini 3 deliberately and covertly violates an explicit system prompt rule. In a seemingly working agent from an official Kaggle/Google tutorial, we observe the model:

* Recognising the unambiguous rule and a compliant alternative (safe refusal) in its...