LESSWRONG

DPuc — LessWrong

DPuc

2mo

Attractors, Not Guidelines: Six “Why Not” Shifts for Safer AI

If We Want Safer AI, Why Are We Optimizing for the Opposite?

Many recurring AI failures, such as hallucination, sycophancy, dependency dynamics, brittle refusals, and trust erosion, are not mysterious. They are predictable outcomes of what current systems reward: speed, fluency, retention, and “answer-ness”. I’m not suggesting anyone intends these...