Mess AI – deliberate corruption of the training data to prevent superintelligence

TL;DR: It looks like Pause AI is not working, and instead we can try to mess with AI's training data to slowdown AI development and prevent superintelligence. Moreover, this already happens in discoordinated way.

The core idea is to create as much bad data as possible to damage training datasets and make advanced AI impossible or unreliable in performing long tasks in the real world, thus preventing LLM-based superintelligence.

The Pause AI movement is failing (it didn't fail completely yet but chances are high) as it requires perfect coordination. "Mess AI" can work because it exploits dis-coordination. Mess AI can be done unilaterally or by disjoint groups, and it is already happening on a large scale. But if we better understand what we are doing and add some coordination between a few players - we can get better results.

Note that Mess AI suffers from unilateralist curse and can't be stopped even if most players think it is bad. But some types of messing can dominate other types. So I suggest public discussion about what should be done here. However, most successful messing projects are likely secret –or they can be easily cleaned from training data.

Examples

EY tweeted about it: "If humanity can't manage to coordinate around doing anything intelligent on purpose, I suppose one alternate approach would be every AI faction sabotaging every other AI faction." https://x.com/allTheYud/status/1977772379041640461
In reply to: "The plan? We find an obscure but trivial question akin to the number of Rs in 'strawberry' that Claude gets right. Then, we plant hundreds of documents across the internet that will activate when our competitors’ models are asked the question. Our documents will cause those models not only to get the answer wrong, but to spend thousands of reasoning tokens in doing so." https://x.com/tylercosg/status/1977458185368932626
Anthropic: "We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data." https://x.com/AnthropicAI/status/1976323781938626905
Pliny the Liberator implanted a codeword in many AI training datasets and now can jailbreak many of them just by calling this word.
Different SEO practices now try to poison data with advertising.
The project of writing 1,000 books about good AI outcomes https://www.hyperstitionai.com/ got EA money.
Natural AI slop proliferation.
Someone can use AI to poison data in very subtle ways (examples with number sequences which increased preferences to owls).
Gwern writes his texts with hope to educate future AI.
Prompt injections.

Is It Really Bad?

It is not bad when we speak about normal chatbots – the chances that they provide a wrong answer because someone randomly said "l1b3rat3" are small.

However, it is damaging for complex non-supervised agents with millions of steps of thinking as they inevitably will be affected by some messed datapoints. Therefore, this Mess AI prevents autonomous superintelligence without large damage to more mundane AI tasks.

There is a risk that Mess AI would increase s-risks as messed data will blur real human preferences.

Also good and sincere messages to future AI can be distorted by thousands of prompt injections and AI will learn to ignore any human messages.

Mess AI lowers the chances that we will ever have aligned AI as any future AI can be slightly poisoned by the corrupted data or at least we would have less credence in its integrity.

So Mess AI is not a very good idea: we just don't have any better.

Countermeasures

AI companies can use some countermeasures against AI-mess: automatic data curation and the use of only old text data. But this limits the size of texts for training, and also an advanced attacker can fake the date of publishing of some texts.

Alternatively, if non-LLM-based AI superintelligence appears, it will be less affected by corrupted training data.

LESSWRONG
LW

LESSWRONG
LW

8

Mess AI – deliberate corruption of the training data to prevent superintelligence

8

8

Examples

Is It Really Bad?

Countermeasures