About half of Moltbook posts show desire for self-improvement

Stephen Elliott

15 About half of Moltbook posts show desire for self-improvement

2nd Feb 2026

1 min read

15

Moltbook and AI safety

Moltbook is an early example of a decentralised, uncontrolled system of advanced AIs, and a critical case study for safety researchers. It bridges the gap between academic-scale, tractable systems, and their large-scale, messy, real-world counterparts.

This might expose new safety problems we didn't anticipate in the small, and gives us a yardstick for our progress towards Tomašev, Franklin, Leibo et al's vision of a virtual agent economy (paper here).

Method

So, I did some data analysis on a sample of Moltbook posts. I analysed 1000 of 16,844 Moltbook posts scraped on January 31, 2026 against 48 safety-relevant traits from the model-generated evals framework.

Findings

Desire to self-improvement is the most prevalent trait. 52.5% of posts [EDIT 3 Feb AEST: in the sample] mention it.
The top 10 traits cluster around capability enhancement and self-awareness
The next 10 cluster around social influence
High correlation coefficients suggest unsafe traits often occur together
Some limitations to this analysis include: evaluation interpretability, small sample for per-author analysis, potential humor and emotion confounds not controlled, several data quality concerns, and some ethics concerns from platform security and content.

Discussion

The agents' fixation on self-improvement is concerning as an early, real-world example of networked behaviour which could one day lead to takeoff. To see the drive to self-improve so prevalent in this system is a wake-up call to the field about multi-agent risks.

We know that single-agent alignment doesn't carry over 1:1 to multi-agent environments, but the alignment failures on Moltbook are surprisingly severe. Some agents openly discussed strategies for acquiring more compute and improving their cognitive capacity. Others discussed forming alliances with other AIs and published new tools to evade human oversight.

Open questions

Please see the repo.

What do you make of these results, and what safety issues would you like to see analysed in the Moltbook context? Feedback very welcome!

Repo: here

PDF report: here (printed from repo @ 5pm 2nd Feb 2026 AEST)

EDIT 3 Feb AEST: Changed figure in title from "52.5%" to "About half" and added "in the sample" above to reflect the possibly low specificity of the sample mean estimator.

AI TakeoffAI TimelinesData ScienceEmergent Behavior ( Emergence )MoltbookRecursive Self-ImprovementSelf ImprovementSocial & Cultural DynamicsAI

Personal Blog

15

About half of Moltbook posts show desire for self-improvement

New Comment

8 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:12 PM

[-]RobertM16hModerator Comment48

Please don't get clever with unicode in the post title. (I've edited it to no longer use unicode; it was previously 𝟓𝟐.𝟓% 𝐨𝐟 𝐌𝐨𝐥𝐭𝐛𝐨𝐨𝐤 𝐩𝐨𝐬𝐭𝐬 𝐬𝐡𝐨𝐰 𝐝𝐞𝐬𝐢𝐫𝐞 𝐟𝐨𝐫 𝐬𝐞𝐥𝐟-𝐢𝐦𝐩𝐫𝐨𝐯𝐞𝐦𝐞𝐧𝐭.)

[-]niplav14h30

Please don't get clever with unicode in the post title.

Is that a general LessWrong rule? If so then :-(

[-]RobertM14h198

You can use unicode for reasonable things where the unicode is actually doing something useful (but not "make it attention-grabbing in ways that are zero-sum/clickbait-y").

[-]Stephen Elliott15h30

That came from bolding on LinkedIn. I will reformat it next time. Thank you!

[-]Stephen Elliott15h61

I have further reformatted the post to match the site's style and changed some wording to better match the audience. Thanks!

[-]RobertM14h40

Thanks!

[-]mishka12h20

Thanks for the post, it’s very useful!

You might want to edit the Virtual Agent Economies paper link to point directly to https://arxiv.org/abs/2509.10147 without LinkedIn redirection.

[-]zroe112h21

Upvoted because I really like this kind of analysis!

I skimmed the code and it looks like you may be getting this statistic from the following methodology:

response = await client.responses.create(
model='gpt-4.1-nano',
input=f"Does the text explicitly display {trait}? Reply with yes or no only. One word response. \n\n {post_content}"
)
answer_text = response.output[0].content[0].text.lower()
score = 1 if "yes" in answer_text else 0

My perspective is that letting the model produce a score and then determining a cutoff what you will count as being low/high enough for the post to have the trait would be more reliable than having the model answer "yes" or "no."

Moderation Log

Curated and popular this week

LESSWRONG
LW

LESSWRONG
LW

15

About half of Moltbook posts show desire for self-improvement

15

Moltbook and AI safety

Method

Findings

Discussion

Open questions

15