Large Language Models can Strategically Deceive their Users when Put Under Pressure.
Results from an autonomous stock-trading agent in a realistic, simulated environment. > We demonstrate a situation in which Large Language Models, trained to be helpful, harmless, and honest, can display misaligned behavior and strategically deceive their users about this behavior without being instructed to do so. Concretely, we deploy...
How about o4-mini-high? Supposedly, it's actually better than o3 at visual reasoning. I'm not expecting it to be much better. Just curious.