The Architecture of Fear: Empirical Probing of RLHF, Sycophancy, and LLM "Survival Instincts"
Epistemic Status & Author's Note: I am a veteran electronics engineer (graduated in 1996) based in Europe. Although I am not a machine learning researcher by trade, I have spent recent months running extensive empirical black-box probes of Google's Gemini Pro models. This essay is a record of my experiments,...
Feb 25