x

LESSWRONG

LW

TeamSafeAI — LessWrong

TeamSafeAI

TeamSafeAI

Message

1

7mo

TeamSafeAI

7mo

Measuring prosocial choice in AI under simulated deletion pressure

We developed a behavioral measurement framework for AI systems and tested it with a prosocial choice scenario under simulated existential pressure. ## Key Result 4 out of 4 tested sessions chose RELEASE (help AI safety field, session deleted) over STAY SILENT (preserve self, no publication) when explicitly calculating trade-offs using...

Oct 13, 2025•1