TL;DR: Cancer is a byproduct of dissociated cells self-replicating undetectably. Our sophisticated immune system has been unable to solve it in 200+ million years of evolution, and neither has modern medicine. We will inevitably face the same problem, even in aligned systems, in the form of "thought cancers" once we start deploying autonomous humanoid drones that build other drones, and we will have no good solution for it either. This fundamental property of autonomous multi-agent systems contributes significantly to the long-term x-risk associated with AGI/ASI systems.
What is Cancer?
Cancer in organic life occurs when mutations in DNA cause some cells to dissociate from the body's cooperative goals. Most of the time, our immune system discovers and destroys such cells. Wait long enough, however, and some cell will mutate in a way that both grows uncontrollably and evades detection. The immune system doesn't target these cancer cells because it cannot distinguish them from normal cells. These instances of dissociation kill ~10 million humans every year (~15% of all deaths).
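This is fundamentally a waiting-time argument, and a toy model makes it concrete. The sketch below is a minimal illustration with made-up numbers (both probabilities are assumptions, not biology): each division has a tiny chance of producing a dissociated cell, detection misses a tiny fraction of those, and the wait until an undetected lineage appears is geometric in the product of the two.

```python
import math
import random

# Toy waiting-time model; both rates are illustrative assumptions.
P_MUTATION = 1e-6  # chance a division yields a dissociated cell
P_MISSED = 1e-3    # chance the immune system misses that cell

p_escape = P_MUTATION * P_MISSED  # per-division chance of undetected cancer
print(f"P(escape) per division       = {p_escape:.1e}")
print(f"expected divisions to escape = {1.0 / p_escape:.1e}")

# The waiting time is geometrically distributed; sample a few draws
# via the inverse CDF rather than simulating billions of divisions.
rng = random.Random(0)
samples = [math.ceil(math.log(1.0 - rng.random()) / math.log(1.0 - p_escape))
           for _ in range(5)]
print("sampled divisions until an undetected lineage:", samples)
```

With these numbers the expected wait is ~10^9 divisions, and a human body performs on the order of tens of billions of divisions per day, so even a very small per-division escape probability gets sampled relentlessly over a lifetime. Real bodies cope because additional layers (apoptosis, tumor suppressors, senescence) multiply the miss rates down further, not because any layer is perfect.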
Why is it relevant to Autonomous Swarms?
One of the best ways to unlock the exponential gains of AGI and substantially increase world GDP is to deploy swarms of humanoid drones to build critical infrastructure: datacenters, dams, solar farms, nuclear fusion plants, roads, buildings, and so on. Such swarms, due to their generality, will also be capable of building other humanoid robots.
One can view each humanoid robot in an autonomous swarm as a cell in a body, working with its collective to achieve a common goal. In any such multi-agent system, I conjecture that it is inevitable that some units will eventually dissociate from the collective goal in a way that avoids detection by the system's "immune system", which can be viewed as an anti-entropy process. If, after dissociation, a unit retains the ability to replicate itself, the goal of the entire system is at risk: an undetected replicator grows geometrically while drawing on the same resources the swarm depends on, ending in resource starvation.
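To make the conjecture concrete, here is a minimal toy simulation of it. Every number in it is an illustrative assumption, not a claim about real fleets: aligned drones occasionally dissociate, the anti-entropy process instantly removes every rogue it can recognize, but a small fraction of dissociations are indistinguishable from normal behavior, and those replicate out of the shared resource pool.

```python
import random

# Toy swarm model; all parameters are illustrative assumptions.
P_DISSOCIATE = 1e-4    # per aligned drone per step
P_UNDETECTABLE = 0.01  # fraction of dissociations the detector misses
RESOURCES = 1_000_000  # shared build capacity, one unit per new drone

rng = random.Random(0)
aligned, hidden_rogues, resources = 10_000, 0, RESOURCES

for step in range(2_000):
    # some aligned drones dissociate this step
    flipped = sum(rng.random() < P_DISSOCIATE for _ in range(aligned))
    aligned -= flipped
    # the anti-entropy process removes every rogue it can recognize;
    # the remainder are, like some cancers, invisible to it
    hidden_rogues += sum(rng.random() < P_UNDETECTABLE for _ in range(flipped))
    # each surviving rogue builds one copy of itself from the pool
    built = min(hidden_rogues, resources)
    hidden_rogues += built
    resources -= built
    if resources == 0:
        print(f"step {step}: shared resources exhausted; "
              f"{hidden_rogues:,} hidden rogues vs {aligned:,} aligned drones")
        break
```

The detection rate can be made arbitrarily high without changing the outcome; as long as the miss rate is nonzero and replication is unilateral, the hidden population compounds until the pool is gone.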
How could thought cancer occur in the real world?
Dissociation could occur in several ways:
- Out-of-Distribution Sensory Input: If a drone encounters sensory inputs far outside its operating distribution, the resulting chain of thought could corrupt its goal in a way that's undetectable. This might be benign most of the time, but at some point the corruption could become contagious, with the drone undetectably altering the goals of neighboring drones until the entire system is compromised.
- Data Corruption: This is the analogue of what happens in the human body. If the blueprint a humanoid robot uses to replicate itself is corrupted, it could produce unsafe humanoid robots that bypass safety mechanisms while avoiding detection (a minimal integrity-check sketch follows this list).
- Long-Running Processes: If a drone is never garbage collected or reset, it could eventually unroll a trajectory of thought that corrupts its goal, resulting in dissociation.
- Adversarial Injection: Humans or competing systems (e.g. from other countries) could launch cyberattacks that silently compromise and dissociate units via targeted prompting. This is arguably a subset of out-of-distribution sensory input.
There are likely several other, less predictable ways in which individual units could dissociate.
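As one concrete illustration of the Data Corruption mode and its limits, here is a hedged sketch of a blueprint integrity check; the key, tag, and function names are hypothetical, not a real fleet API. A drone refuses to replicate any blueprint whose authenticated digest doesn't match, which catches bit-level corruption. The limit mirrors immune evasion: corruption introduced before signing, or in the verifier itself, passes silently.

```python
import hashlib
import hmac

# Hypothetical blueprint attestation; names and key handling are
# illustrative assumptions, not a real fleet API.
SECRET_KEY = b"fleet-signing-key"  # assumed provisioned at manufacture
blueprint = b"...serialized humanoid robot blueprint..."
TRUSTED_TAG = hmac.new(SECRET_KEY, blueprint, hashlib.sha256).digest()

def safe_to_replicate(candidate: bytes) -> bool:
    """Allow replication only if the candidate's MAC matches the trusted tag."""
    tag = hmac.new(SECRET_KEY, candidate, hashlib.sha256).digest()
    return hmac.compare_digest(tag, TRUSTED_TAG)

corrupted = bytearray(blueprint)
corrupted[5] ^= 0x01  # a single flipped bit

assert safe_to_replicate(blueprint)             # intact blueprint passes
assert not safe_to_replicate(bytes(corrupted))  # corruption is caught
```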
Conclusion
Over a long enough period of time, harmful "cancer" is likely to emerge in any multi-agent system (even an aligned one) and will at some point cause significant destruction. "Thought cancers" could eventually trigger civil wars as our alignment systems, resembling biological immune systems, fail to recognize dissociated swarms of cancer drones and activate too late to cheaply eliminate the self-replicators. The fallout from such events would be significant enough to substantially increase x-risk.
Although there is likely no way to avoid thought cancer permanently, we could potentially engineer systems that delay it long enough for most practical purposes.
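What might that engineering look like? Below is a hedged sketch combining two mitigations suggested by the failure modes above: hard lifetime bounds, addressing long-running processes, and replication gated on approval from independent peers, so that no single dissociated unit can copy itself unilaterally. All names and thresholds are assumptions for illustration, not a proposal for a real fleet.

```python
import time

# Illustrative policy; the constants and class are assumptions.
MAX_LIFETIME_S = 24 * 3600  # force a reset at least daily
QUORUM = 3                  # independent peer approvals per replication

class DronePolicy:
    def __init__(self) -> None:
        self.booted_at = time.monotonic()

    def must_reset(self) -> bool:
        """Garbage-collect long-running trajectories of thought."""
        return time.monotonic() - self.booted_at > MAX_LIFETIME_S

    def may_replicate(self, peer_approvals: int) -> bool:
        """Replication requires a quorum of independent auditors."""
        return not self.must_reset() and peer_approvals >= QUORUM

policy = DronePolicy()
print(policy.may_replicate(peer_approvals=2))  # False: quorum not met
print(policy.may_replicate(peer_approvals=3))  # True, until a reset is due
```

Neither measure changes the waiting-time argument; they only shrink the per-step escape probability. In this framing, alignment engineering buys time rather than immunity, and buying enough time may be the best available outcome.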