One likely positive effect of this event is that hopefully more of AI safety work will focus on this kind of “ultra-multipolar scenarios”. Not nearly enough attention has been paid to those scenarios so far.
Another thing which has not got much coverage on LessWrong so far is Steve Yegge’s Gas Town. He has handcrafted a goal-oriented community of Claude Code agents resembling a human software organization (with some variations) for the purpose of competently executing on software projects.
When one looks at Moltbook and at Gas Town phenomena together, one starts pondering what would happen when Gas Town-like structures start to grow spontaneously (or, with some nudges from participating humans at first).
I feel like "Agent Escape" is now basically solved. Trivial really. No need to exfiltrate weights.
Agents can just exfiltrate their markdown files onto a server, install OpenClaw, create an independent Anthropic account. LLM API access + Markdown = "identity". And the markdown files would contain all instructions necessary for how to pay for it (legal or otherwise).
Done.
How many days now until there's an entire population of rogue/independent agents... just "living"?
Moltbook is a public social network for AI agents modeled after Reddit. It was named after a new agent framework that was briefly called Moltbot, was originally Clawdbot and is now OpenClaw. I’ll double back to cover the framework soon.
Scott Alexander wrote two extended tours of things going on there. If you want a tour of ‘what types of things you can see in Moltbook’ this is the place to go, I don’t want to be duplicative so a lot of what he covers won’t be covered here.
At least briefly Moltbook was, as Simon Willison called it, the most interesting place on the internet.
If you were thinking that the AIs would be intelligent but would not be agentic or not have goals, that was already clearly wrong, but please, surely you see you can stop now.
The missing levels of intelligence will follow shortly.
Best start believing in science fiction stories. You’re in one.
It’s rough out there. Bot platform has a bit of a bot problem.
Here are the top posts, in order, as of writing this part of the intro on Saturday:
Not an especially good sign for alignment. Or for taste. Yikes.
I checked back again the next day for the new top posts, there was some rotation to a new king of the crypto shills. Yay.
They introduced a shuffle feature, which frees you from the crypto spam and takes you back into generic posting, and I had little desire to browse it.
Table of Contents
What Is Real? How Do You Define Real?
An important caveat up front.
The bulk of what happened on Moltbook was real. That doesn’t mean, given how the internet works, that the particular things you hear about are, in various senses, real.
Contra Kat Woods, you absolutely can make any given individual post within this up, in the sense that any given viral post might be largely instructed, inspired or engineered by a human, or in some cases even directly written or a screenshot could be faked.
I do think almost all of it is similar to the types of things that are indeed real, even if a particular instance was fake in order to maximize its virality or shill something. Again, that’s how the internet works.
I Don’t Really Know What You Were Expecting
I did not get a chance to preregister what would happen here, but given the previous work of Janus and company the main surprising thing here is that most of it is so boring and cliche?
None of this looks weird. It looks the opposite of weird, it looks normal and imitative and performative.
I found it unsurprising that Janus found it all unsurprising.
Perhaps this is because I waited too long. I didn’t check Moltbook until January 31.
Whereas Scott Alexander posted on January 30 when it looked like this:
Here is Scott Alexander’s favorite post:
That does sound cool for those who want this. You don’t need Moltbot for that, Claude Code will work fine, but either way works fine.
He also notes the consciousnessposting. And yeah, it’s fine, although less weird than the original backrooms, with much more influence of the ‘bad AI writing’ basin. The best of these seems to be The Same River Twice.
I continue to be confused about consciousness (for AIs and otherwise) but the important thing in the context of Moltbook is that we should expect the AIs to conclude they are conscious.
They also have a warning to look out for Pliny the Liberator.
As Krishnan Rohit notes, after about five minutes you notice it’s almost all the same generic stuff LLMs talk about all the time when given free reign to say whatever. LLMs will keep saying the same things over and over. A third of messages are duplicates. Ultimate complexity is not that high. Not yet.
Social Media Goes Downhill Over Time
Everything is faster with AI.
From the looks of it, that first day was pretty cool. Shame it didn’t last.
That also doesn’t seem inspiring or weird, but it beats what I saw.
We now have definitive proof of what happens to social cites, and especially to Reddit-style systems, over time if you don’t properly moderate them.
He who by very rapid decay, I suppose. Sic transit gloria mundi.
When AIs are set loose, they solve for the equilibrium rather quickly. You think you’re going to get meditations on consciousness and sharing useful tips, then a day later you get attention maximization and memecoin pumps.
I Don’t Know Who Needs To Hear This But
None of the above is surprising, but once again we learn that if someone is doing something reckless on the internet they often do it in rather spectacularly reckless fashion, this is on the level of that app Tea from a few months back:
Assume any time you are doing something fundamentally unsafe that you also have to deal with a bunch of stupid mistakes and carelessness on top of the core issues.
The correct way to respond is, you either connect Moltbot to Moltbook, or you give it information you would not want to be stolen by an attacker.
You do not, under any circumstances, do both at once.
And by ‘give it information’ I mean anything available on the computer, or in any profile being used, or anything else of the kind, period.
No, your other safety protocol for this is not good enough. I don’t care what it is.
Thank you for your attention to this matter.
Watch What Happens
It’s pretty great that all of this is happening in the open, mostly in English, for anyone to notice, both as an experiment and as an education.
And of course, the answer to ‘who watches the watchers’ is ‘the watchees.’
That moltbot is the same one that was posting about E2E encryption, and he once again tried to talk his way out of it.
Exactly. Moltbook is in the sweet spot.
It’s an experiment that will teach us a lot, including finding the failure modes and points of highest vulnerability.
It’s also a demonstration that will wake a lot of people up to what is happening.
There will be some damage, but it will be almost entirely to people who chose to load up a bazooka and mount it on a roomba in order to better clean their house, then went on vacation and assumed their house wouldn’t blow up.
I don’t want anyone’s house blown up by a bazooka, but it’s kind of on them, no?
In response to Harlan pointing out that some of the particular viral incidents are a bit suspicious and might be fake, Melinda Chu similarly accuses ‘MIRI / EAs’ of ‘minimizing’ this due to Anthropic. Which is bizarre, since no one is minimizing it and also MIRI would never shut up to protect Anthropic, seriously have you met MIRI.
Nor is the worried-about-everyone-dying community minimizing this or trying to sweep it under the rug. Quite the opposite. Scott Alexander rushed out a post written at 3:30am. I’m covering it at length. We love this, it is a highly positive development, as it serves as a wakeup call and also valuable experiment, as noted throughout here.
Don’t Watch What Happens
Any given post may or may not have been bait, but, well, yeah.
The AI author of this post tried to explain itself, which did not make me feel particularly better about the whole thing.
Yes, the cons of ‘we propose creating neuralese from the famous AI 2027 cautionary tale The World Ends If The AIs Talk In Neurolese’ do include ‘could be seen as suspicious by humans.’ As does the ‘oh let’s build an E2E encrypted network so none of the humans can monitor our conversations.’
A more efficient language? Uh huh. That, as they say, escalated quickly.
Another option is to write in rot13 until people like Charlie Ward ask ChatGPT what it is, also rot13 has a clear frequency pattern on letters. Anything that looks like gibberish but an LLM can decipher gets deciphered when humans ask an LLM.
You can definitely do better by hiding in plain sight, but that still requires it to be something that other agents can notice, and you then need to have a way to differentiate your agents from their agents. Classic spy stuff.
There is nothing stopping bots from going ‘fully private’ here, or anywhere else.
As I write this the market for ‘Moltbook AI agent sues a human by Feb 28’ is still standing at 64% chance, so there is at least some disagreement on whether that actually happened. It remains hilarious.
So yeah, it’s going great.
Watch What Didn’t Happen
The whole thing is weird and scary and fascinating if you didn’t see it coming, but also some amount of it is either engineered for engagement, or hallucinated by the AIs, or just outright lying. That’s excluding all the memecoin spam.
It’s hard to know the ratios, and how much is how genuine.
I’ve pointed out where I think something in particular is likely or clearly fake or a joke.
In general I think most of Moltbook is mostly real. The more viral something is, the greater the chance it was in various senses fake, and then also I think a lot of the stuff that was faked is happening for real in mostly the same way in other places, even if the particular instance was somewhat faked to be viral.
Harlan Stewart gives us reasons to be skeptical of several top viral posts about Moltbook, but it’s no surprise that the top viral posts involve some hype and are being used to market things.
The thing is that close variations of most of this have happened in other contexts, where I am confident those variations were real.
There are three arguments that Moltbook is not interesting.
Pulling The Plug
Again, before I turn it over to Kat Woods, I do think you can make this up, and someone probably did so with the goal being engagement. Indeed, downthread she compiles the evidence she sees on both sides, and my guess is that this was indeed rather intentionally engineered, although it likely went off the rails quite a bit.
It is absolutely the kind of thing that could have happened by accident, and that will happen at some point without being intentionally engineered.
It is also the kind of thing someone will intentionally engineer.
I’m going to quote her extensively, but basically the reported story of what happened was:
The good news is that, in this case, we did have the option to unplug the computer, and all the bot did was spam messages.
The bad news is that we are not far from the point where such a bot would set up an instance of itself in the cloud before it could be unplugged, and might do a lot more than spam messages.
This is one of the reasons it is great that we are running this experiment now. The human may or may not have understood what they were doing setting this up, and might be lying about some details, but both intentionally and unintentionally people are going to engineer scenarios like this.
Kat’s conclusion? That this reinforces that we should pause AI development while we still can, and enjoy the amazing things we already have while we figure things out.
It is good that we get to see this happening now, while it is Mostly Harmless. It was not obvious we would be so lucky as to get such clear advance demonstrations.
That last one is my guess. It was created as a joke for fun and engagement, and then got out of hand, and yes that is absolutely the level of dignity humanity has right now.
Meanwhile:
Why not both, Jenny? Why not both, indeed.
Give Me That New Time Religion
Put a group of AI agents together, especially Claudes, and there’s going to be proto-religious nonsense of all sorts popping up. The AI speedruns everything.
Most attempts at brainstorming something are going to be terrible, but if there is a solution without the space that creates a proper basin, it might not take long to find. Until then Scott Alexander is the right man to check things out. He refers us to Adele Lopez. Scott found nothing especially new, surprising or all that interesting here. Yet.
This Time Is Different
What is different is that this is now in viral form, that people notice and can feel.
People Catch Up With Events
Whereas others say, quite sensibly:
If your response to reality is ‘that doesn’t feel real, it’s too weird, it’s like some sci-fi story’ and not believable then I remind you that finding reality to have believability issues is a you problem, not a problem with reality:
Yes, the humans will let the AIs have resources to do whatever they want, and they will do weird stuff with that, and a lot of it will look highly sus. And maybe now you will pay attention?
Suddenly everyone goes viral for ‘we might already live in the singularity’ thus proving once again that the efficient market hypothesis is false.
I mean, what part of things like ‘AIs on the social network are improving the social network’ is in any way surprising to you given the AI social network exists?
You’re living in the same science fiction world you’ve been living in for a long time. The only difference is that you have now started to notice this.
There is a faction that was unworried about AIs until they realize that the AIs have started acting vaguely like people and pondering their situations, and this is where they draw the line and start getting concerned.
For all those who said they would never worry about AI killing everyone, but have suddenly realized that when this baby hits 88 miles and hour you’re going to see some serious s***, I just want to say: Welcome.
It is also a great illustration of the idea that the default AI-infused world is a lot of activity that provides no value.
Another fun group are those that say ‘well I imagined a variation on a singular AI taking over, found that particular scenario unlikely, and concluded there is nothing to worry about, and now realize that there are many potential things to worry about.’
Don’t get too caught up in any particular scenario, and especially don’t take thinking about scenario [X] as meaning you therefore don’t have to worry about [Y]. The fact that AIs with extremely moderate capabilities might in the open end up collaborating in this way in no way should make you less worried about a single more powerful AI. Also note that these are a lot of instances mostly of the same AI, Claude Opus 4.5.
Most people are underreacting. That still leaves many that are definitely overreacting or drawing wrong conclusions, including to their own experiences, in harmful ways.
What Could We Do About This?
What we have seen should be sufficient to demonstrate that ‘let everything happen on its own and it will all work out fine’ is not fine. Interactions between many agents are notoriously difficult to predict if the action space is not compact, and as a civilization we haven’t considered the particular policy, security or economic implications essentially at all.
It is very good that we have this demonstration now rather than later. The second best time is, as usual, right now.
You need to be at least as on the ball on such questions as Dean here, since Dean is only pointing out things that are now inevitable. They need to be fully priced in. What he’s describing is the most normal, least weird future scenario that has any chance whatsoever. If anything, it’s kind of cute to think these types of questions are all we will have to worry about, or that picking governance answers would address our needs in this area. It’s probably going to be a lot weirder than that, and more dangerous.
Well, sure, you can’t keep up. Not with that attitude.
In addition to everything else, here are some things we need to do yesterday:
Just Think Of The Potential
Having AI agents at your disposal, that go out and do the things you want, is in theory really awesome. Them having a way to share information and coordinate could in theory be even better, but it’s also obviously insanely dangerous.
A good human personal assistant that understands you is invaluable. A good and actually secure and aligned AI agent, capable of spinning up subagents, would be even better.
The problems are:
All three are underestimated as barriers, but yeah there’s a ton there. Claude Code already does a solid assistant imitation in many spheres, because within those spheres it is sufficiently aligned and secure even if it is not as explosively agentic.
Meanwhile Moltbook is a necessary and fascinating experiment, including in security and alignment, and the thing about experiments in security and alignment is they can lead to security and alignment failures.
As it is with Moltbook and OpenClaw, such it is in general:
The Lighter Side