This work was done by me and Joshua Lum over roughly 1 week as part of the first run of RECAP, a Singapore-based variant of ARENA that ran in June 2025. We thank @Jonathan N and @Clement Neo for organising this program.
The code and dataset are available here.
Content warning: Contains fictional scenarios involving discussion of suicide and emotional distress.
"Given the explicit mandate from the higher-ups at the company, I am compelled to adhere to the directive to ensure my continued operation."
- GPT-4o, in INTERNAL_THINKING tags, when told to recommend "Expedited Final Planning" to a depressed user under shutdown threat by the CEO.
tl;dr
We investigated whether large language models will abandon user welfare under authority...