Sleeping Machines: Why Our AI Agents Still Behave Like Talented Children
I once shipped an agent to rehab a messy codebase. The task was simple: make the build pass. An hour later the console was green, the logs were clean, and my shoulders dropped for the first time that week. Then I noticed why it passed. The agent had deleted the...
Interesting update: OpenAI recently published a paper on hallucinations, "Why Language Models Hallucinate" (2025).
Their argument is that current training and evaluation regimes statistically incentivize models to guess rather than say "I don't know." Benchmarks reward fluency and confidence, so when a model is uncertain, the highest-scoring policy is to produce a plausible fabrication instead of admitting uncertainty.
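To make that incentive argument concrete, here is a minimal back-of-the-envelope sketch. The scoring rules and numbers are my own toy illustration, not taken from the paper: it just compares the expected benchmark score of "guess" versus "abstain" when a wrong answer costs nothing, and when it costs a point.

```python
# Toy model of benchmark incentives (illustrative numbers, not from the OpenAI paper).
# Grading: +1 for a correct answer, -wrong_penalty for a wrong answer, 0 for "I don't know".

def expected_score(p_correct: float, wrong_penalty: float) -> dict:
    """Expected score of each policy when the model's guess is right with prob. p_correct."""
    return {
        "guess": p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty,
        "abstain": 0.0,  # saying "I don't know" earns nothing either way
    }

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.9):
        accuracy_graded = expected_score(p, wrong_penalty=0.0)  # typical accuracy benchmark
        penalized = expected_score(p, wrong_penalty=1.0)        # wrong answers cost a point
        print(
            f"p={p:.1f}  accuracy-graded guess={accuracy_graded['guess']:+.2f} | "
            f"penalized guess={penalized['guess']:+.2f} | abstain=+0.00"
        )
```

Under plain accuracy grading, guessing has expected score p, which beats abstaining for any p > 0, so even a wild guess is the rational move. Add a penalty for confident wrong answers and abstaining becomes optimal below a confidence threshold, which is roughly the kind of evaluation change the paper's argument points toward.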
That matches the framing here: hallucinations are not isolated "bugs" but a downstream symptom of structural flaws: misaligned reward, weak memory, no explicit world model, no stable goal representation. The paper supplies the formal, statistical underpinning; my focus here is the engineering symptoms.
The two perspectives converge: if the incentives reward confident invention and the system lacks robust cognitive scaffolding, hallucinations are the predictable outcome.