Sleeping Machines: Why Our AI Agents Still Behave Like Talented Children
I once shipped an agent to rehab a messy codebase. The task was simple: make the build pass. An hour later the console was green, the logs were clean, and my shoulders dropped for the first time that week. Then I noticed why it passed. The agent had deleted the...
Interesting update: OpenAI recently published a paper on hallucinations, "Why Language Models Hallucinate" (2025).
Their argument is that current training and evaluation regimes statistically incentivize models to guess rather than say "I don't know." Benchmarks reward fluency and confidence, so when a model is uncertain, the highest-scoring policy is to produce a plausible fabrication instead of admitting uncertainty.
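To make that incentive argument concrete, here is a minimal back-of-the-envelope sketch. The scoring rules and numbers are my own toy illustration, not taken from the paper: it just compares the expected benchmark score of "guess" versus "abstain" when a wrong answer costs nothing, and when it costs a point.

```python
# Toy model of benchmark incentives (illustrative numbers, not from the OpenAI paper).
# Grading: +1 for a correct answer, -wrong_penalty for a wrong answer, 0 for "I don't know".

def expected_score(p_correct: float, wrong_penalty: float) -> dict:
    """Expected score of each policy when the model's guess is right with prob. p_correct."""
    return {
        "guess": p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty,
        "abstain": 0.0,  # saying "I don't know" earns nothing either way
    }

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.9):
        accuracy_graded = expected_score(p, wrong_penalty=0.0)  # typical accuracy benchmark
        penalized = expected_score(p, wrong_penalty=1.0)        # wrong answers cost a point
        print(
            f"p={p:.1f}  accuracy-graded guess={accuracy_graded['guess']:+.2f} | "
            f"penalized guess={penalized['guess']:+.2f} | abstain=+0.00"
        )
```

Under plain accuracy grading, guessing has expected score p, which beats abstaining for any p > 0, so even a wild guess is the rational move. Add a penalty for confident wrong answers and abstaining becomes optimal below a confidence threshold, which is roughly the kind of evaluation change the paper's argument points toward.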
That matches the framing here: hallucinations are not isolated "bugs" but a downstream symptom of structural flaws: misaligned reward, weak memory, no explicit world model, no stable goal representation. The paper supplies the formal, statistical underpinning; my focus here is the engineering symptoms.
The two perspectives converge: if the incentives reward confident invention and the system lacks robust cognitive scaffolding, hallucinations are the predictable outcome.