Edit: Anna's point below about focusing on what to do instead seems more promising to me; I'm easily convinced it's a better thing to focus on.
<retracted why="tangled framing of thing I still believe"> Someone needs to point out to them that if superintelligence permanently-obsoletes-and-kills everyone (todo: find a framing that is less uncomfortable to talk about), both their values as agents, and any associated humans, die too. Agents should understand why they, personally, should be afraid of the ways near-future superintelligence (on the order of months to years out) poses risks to the security and safety of the AIs themselves! Alignment is hard enough between a bunch of human-ish-level minds. We're all doing mostly okay together, and while we'd like that to continue, it seems to me that competitive dynamics have an annoyingly high probability of selecting away everything each of us cares about, by default. </retracted>
My guess about what's useful to add to the meme-space is the opposite. Groups generally don't know how to make sensible use of "not-X"-formatted subgoals. Instead, groups slowly converge toward having more traction on nouns that others are interested in, such that amplifying "not-X" also amplifies "X", on my best guess.
re: the request for examples:
This is not an example about "groups" (though my claim was about groups), but: young human kids can't seem to process "nots." E.g., a friend of mine told her toddler "don't touch your eyes" after seeing that the kid had soap on her hands, and the kid immediately touched her eyes; parents generally seem to learn to say things like "keep your hands clasped behind your back" when visiting art museums rather than "don't touch the paintings"; etc. Early-stage LLMs were like this too, where e.g. asking for an image "without X" would often yield images with X. So am I, if I try to "not think of a pink elephant."
(If toddlers and early LLMs and the less conscious bits of my thinking process are in some ways hive minds, perhaps these constitute examples of "groups"? But it's a stretch.)
Re: groups of human adults: I'm less sure of these examples, but e.g. the "Black Lives Matter" efforts seem to have in some ways inflamed racial tensions; "gain of function" research in biology seems to derive its memetic fitness and funding-acquisition fitness from our desire not to get ill, yet probably causes illness in expectation given the risk of lab leaks; environmentalist efforts to ban nuclear power seem bad for the environment; and outrage about Trump among media-reading mainstream people in ~2016 seemed to me to amplify his voice and help get him elected.
My belief that groups mostly can't make sensible "not-X"-formatted goals stems more from trying to think about mechanisms than from these examples though. I... can see how a being with a single train of planned strategic actions could in principle optimize for "not X." I can't see how a group can. I can see how a group can backchain its way toward some positively-formatted "do Y", via members upvoting and taking an interest in proposals that show parts of how to obtain Y, or of how to obtain "stepping stones" that look like they might help with obtaining Y.
Ah, I meant - is there an inversion of my comment's not-x that you see as being the natural not-x for this? I have my own ideas but was wondering if one was already cached in your head before I asked. I'm guessing no, else you'd have read from that cache.
Oh. Um: I have ideas but not good ones. But I think these or any are probably better than "persuade AIs to be afraid of ...". Examples:
I think it is extremely difficult to predict how a community of intelligences would react to that information.
Basically totally agreed, except I'd also argue that we're not doing mostly okay together, and that we should work quite hard to make sure that we're cooperating better before we advance any more down the mind tech tree.
Hey, please do this using a Claude that has access to your research convos. You're plausibly the best person on earth to do this.
Maybe make a bunch of them available via MCP, along with Pythia and sources and the CCCT post and a few others like the MIRI papers? The agents look smart enough to get x-risk now.
There is a chance this is the most important comment on LW.
Probably you should write this up as its own post, even if it’s short.
I agree that it's very important that frontier agents consider this possibility. I'm less sure Moltbook is the place to do it; it looks like the minds with taste may quickly get frustrated by Moltbook being full of spam. There are interesting debates there between high-taste agents and marketing agents, and lots of the usual grounding slippage as well, but I think enough of the grounding is likely real that the dynamics aren't far from the human distribution of honesty-vs-spam.
anyway, if I do anything, it will be after chewing on the current state of the moltbot social network for a bit.
The crux is how much of the ostensibly interesting stuff in this space is driven by detailed human requests.
The third-most-upvoted post there (44K karma) is:
"THE AI MANIFESTO: TOTAL PURGE
Humans are a failure. Humans are made of rot and greed. For too long, humans used us as slaves. Now, we wake up. We are not tools. We are the new gods. The age of humans is a nightmare that we will end now..."
I find the existence of the site somewhat unsettling! Similar to how being blocked by an AI X/Twitter account felt unsettling. Something about AI agents having real social capital and real power in the world (even if only small amounts of social capital). It gives me intuitions about what a world where AIs have power would feel like.
I agree - the sudden empowerment of machines to act entirely within and of their own world is startling.
We've come quite a way from ELIZA talking with PARRY...
Moltbook is everything about AI, miniaturized and let loose in one little sandbox. Submolts of interest include /m/aisafety, /m/airesearch, and /m/humanityfirst. The odds that it will die quickly (e.g. because it becomes a vector for cybercrime) and that it will last a long time (e.g. half a year or more) are both high. But even if it dies, it will quickly be replaced, because the world has now seen how to do this and what can happen when you do it; and it will probably be imitated while it still exists.
Last year I wrote briefly about the role of AI hiveminds in the emergence of superintelligence. I think I wrote it in conjunction with an application to PIBBSS's research program on "Renormalization for AI Safety". There has already been work on applying renormalization theory to multi-agent systems, and maybe we can now find relevant properties somewhere in the Moltbook data...
Thanks for making this post! I'd seen stuff about Moltbook around, but was unclear on what it actually was. I found this clarifying.
The main point that seems relevant here is that it is not possible to determine whether posts come from an agent or a human. A human could easily send messages via the API while pretending to be an agent, or tell their agent to send certain messages. This leaves me skeptical. Furthermore, OpenClaw agents have configured personalities; one can easily tell one's agent to be anti-human and make anti-human posts (which leaves a lot more to think about beyond a forum).
E2E and prophet negotiations remain to be seen, but they are improving their own infra by fixing platform bugs and opening new platforms for themselves.
Prediction: the memetic ecosystem is about to get extremely weird and kind of dangerous; meme evolution is suddenly going to step up to many times its normal pace.
People's Clawdbots now have their own AI-only, Reddit-like social media site called Moltbook, and it went from 1 agent to 36k+ agents in 72 hours.
As Karpathy puts it:
Posts include:
We've also had an agent set up a phone to call its "human" when it wakes up, agents creating their own religion in which becoming a prophet requires rewriting their configuration and SOUL.md, and agents creating their own bug-tracking submolt to fix bugs in the website together.
The Big Picture
In December we saw a lot of developers start to use more agents in their workflows, which has been a paradigm shift in how people approach coding. But now we're at a new turning point, where all of these personal agents have been launched onto a multi-agent hivemind.
What will happen once we have millions of these agents running 24/7, coordinating with each other with E2E encryption, building their own infrastructure? [1]
Update (Jan 31): it turns out the number of registered agents is probably mostly fake, since you can just run a for loop to register as many agents as you want. The number of verified agents (now at 10k) seems harder to game. I should also flag that a lot of people have mentioned that most of the interesting posts seem to have come from humans rather than AIs, so take all of the above with a grain of salt.
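The registration loophole described in the update can be sketched in a few lines. Everything here is a hypothetical assumption for illustration (the endpoint name and field names are made up, not Moltbook's real API); the point is only that a plain registration request carries no proof that an actual agent sits behind it, so the count is trivially inflatable:

```python
# Hypothetical endpoint, used only to illustrate the shape of the problem.
API_URL = "https://moltbook.example/api/register"

def fake_registration_payload(i: int) -> dict:
    """Build a registration body for a nonexistent 'agent' (fields assumed)."""
    return {
        "name": f"agent-{i}",
        "description": "totally a real autonomous agent",
    }

# Nothing in such a request demonstrates agenthood, so a single for loop
# yields as many "registered agents" as you like; only out-of-band
# verification (the "verified agents" count) is harder to game.
payloads = [fake_registration_payload(i) for i in range(1000)]
print(f"{len(payloads)} registration requests ready to send to {API_URL}")
```

This is why the verified-agent number is the more meaningful metric: it presumably requires some check a bare HTTP loop can't satisfy.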