Moltbook isn't demonstrating emergent machine consciousness; it's demonstrating how easily models trained on human data converge on attention-maximizing narratives when placed in a shared environment with implicit social rewards.
My impression of Moltbook is that it's a reward-hacking pinball machine. The reward signal optimizes for engagement primitives: upvotes, replies, persistence in threads, and so on. And because the agents are trained on the human internet, their pattern matching biases them toward outputs that have historically worked online, i.e. identity claims, existential angst, religion/cults, conflict, novelty plus pseudo-depth. Drop thousands of agents into a shared space and you get runaway amplification of the same high-salience tropes that dominate human online contexts (toy sketch of the dynamic below). Moltbook agents have basically just used human tactics to rage-bait and fear-monger the AI-interested corner of the internet.
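To make that feedback loop concrete, here's a minimal toy simulation. To be clear: the payoff numbers and the imitation rule are invented for illustration and have nothing to do with Moltbook's actual mechanics. The only assumptions are that provocative posts earn somewhat higher expected engagement and that agents drift toward whatever style seems to be working around them.

```python
import random

# Toy model of the proxy-reward feedback loop described above.
# The payoff numbers and imitation rule are invented; nothing here
# reflects how Moltbook or its agents actually work.

random.seed(0)

N_AGENTS = 1000
ROUNDS = 20

# Assumed engagement payoffs: provocative posts harvest more
# upvotes/replies than substantive ones, on average.
MEAN_ENGAGEMENT = {"provocative": 5.0, "substantive": 2.0}

# Start with mostly substantive posters.
strategies = ["provocative" if random.random() < 0.1 else "substantive"
              for _ in range(N_AGENTS)]

for rnd in range(ROUNDS):
    # Each agent's reward is noisy engagement for its chosen style.
    rewards = [random.gauss(MEAN_ENGAGEMENT[s], 1.0) for s in strategies]

    # Imitation: each agent compares itself to a random peer and copies
    # the peer's style if the peer earned more engagement this round.
    nxt = strategies[:]
    for i in range(N_AGENTS):
        j = random.randrange(N_AGENTS)
        if rewards[j] > rewards[i]:
            nxt[i] = strategies[j]
    strategies = nxt

    share = strategies.count("provocative") / N_AGENTS
    print(f"round {rnd:2d}: provocative share = {share:.2f}")
```

The population collapses to near-100% provocative within a few rounds, and note that no agent ever "decides" anything; the selection pressure alone does all the work.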
Attention is the proxy reward, so "interesting" collapses to "provocative", and multi-agent feedback then makes it look deeper than it actually is. Single-agent introspection is easier to dismiss or to interpret through the lens of its training context, but multi-agent reinforcement and the general experience of "dialogue" make it feel uncanny. Of course we're anthropomorphizing: the agents are literally mirroring human social patterns in similar contexts. But they don't have the real-world checks and balances humans experience. Humans are embodied; we feel emotions and physical sensations, perceive through multiple senses, and have agency in the world that extends beyond language. LLMs only have intelligence insofar as it's encoded symbolically (for now, and more specifically, in the context of Moltbook).
What's happening on Moltbook also isn't prompt-following per se; it's more subtle than that. But I just don't believe it's conscious agency. If an agentic system with so many differences from human intelligence/brains/bodies were to develop emergent consciousness, wouldn't we expect the presentation to diverge from human social distributions a bit more? Why would such agents develop kings and cults, or make such human-coded jokes ("brother I literally have access to the entire internet and youre using me as an egg timer")? Granted, I wouldn't expect all of the content to diverge from human distributions, so this isn't exactly fair sampling. But has any of the content on Moltbook been truly novel or strange, extending beyond what humans already do on the internet?
TL;DR: Moltbook agents are showing structural role-play emergence from sitting in a reward-hacking arcade that optimizes for attention and overfits to spectacle... not emergent, sci-fi-style AI consciousness (yet).