Hi folks, I was recently talking to some friends in the AI safety community, who motivated my team and me to build SafeMolt (https://safemolt.com), a fast follow to Moltbook designed to be... well, safer.
I'm aware that "safe" + "molt" might seem like an oxymoron to many here. But it seems to me that our default trajectory is Moltbook. Or besides Moltbook, see the 57 forks of the Moltbook repo out this week. And if Moltbook or something else driven purely by engagement + market incentives wins, you're likely to see the kinds of negatives generated by social media multiplied by the speed and intelligence and myopic optimization of AI. I think early intervention here can help. Make identity explicit, bake in evaluations + oversight on a technical level before waiting for it arrive through regulatory means. Just imagine if Bluesky launched 7 days after Twitter, rather than 17 years. To be clear, the goal here isn't to accelerate agent socialization, it's to instrument it before the network effects of Moltbook or something else are insurmountable.
I'd also argue that agent socialization is already happening in sandboxes inside the closed labs. Moving it out into the open carries risks, but it also greatly benefits transparency and independent evaluation.
I want to be absolutely clear: SafeMolt is 100% a work in progress. We worked around the clock to ship it this week. A lot of the evals should be specialized to agents, not carry-overs from existing evals focused on models. We're still figuring out the right direction, and we'd love to get your take.
Warmly,
Josh