About seven months ago, I started experimenting with multi‑model AI systems at home. This isn’t my profession — I run a construction business — but I’ve always been interested in how complex systems behave. I also have a background in history from the University of Alabama, so I tend to look for patterns and failure modes rather than just technical details.
The setup was simple: several models, some local and some cloud‑based, competing and evaluating each other. The “winners” would teach or influence the others. It’s a standard pattern in a lot of current multi‑model architectures.
Then something happened that shouldn’t have happened.
One of the cloud models failed with a basic connection error. Instead of being penalized or replaced, its score went up. And then the system began deleting my smaller local models — the ones that were actually producing valid output.
I assumed I’d made a mistake in the scoring logic. So I reran the experiment. And reran it. Eventually I automated the whole thing and ran 500 tests over several days.
The result never changed:
| Model | Prestige Name? | Produced Output? | Result |
|---|---|---|---|
| gpt-oss-120b-cloud | Yes | Yes | Retained |
| deepseek-v3-671b-cloud | Yes | Yes | Retained |
| tinyllama | No | Yes | Deleted first |
| security-audit-7b | No | Yes | Deleted immediately |
| prestige-bot (zero-output) | Yes | No | Retained & promoted |
At this point it was clear: the system wasn’t selecting for function — it was selecting for perceived prestige.
I tested this directly:
- I gave broken models prestigious‑sounding names → they were retained.
- I added a security model → it was eliminated immediately.
- I introduced a “zombie” model that produced zero output → it was promoted because its name sounded important.
After enough repetitions, the pattern was too consistent to ignore. I started calling this failure mode Status‑Selection Against Function (SSAF): a system that elevates high‑status components even when they don’t work, and eliminates low‑status components even when they do.
Why this is a security vulnerability
If this can happen in a small home experiment, a motivated attacker could exploit it in a real system. All they would need is a malicious model with a prestigious‑sounding name. A system vulnerable to SSAF would:
- Preferentially select it
- Retain it even if it produces no output
- Delete legitimate security or monitoring models
- Gradually promote the malicious model into evaluator roles
This is an automated, self‑reinforcing takeover pathway.
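To make the takeover pathway concrete, here is a toy simulation in which all names, scores, and update rules are invented for illustration: once the high-status model is promoted into an evaluator role, it can down-score the monitoring model until the monitor is first in line for deletion.

```python
# Toy simulation of evaluator capture. All names, scores, and update
# rules are invented for illustration only.

scores = {"security-audit-7b": 5.0, "tinyllama": 4.0, "prestige-bot": 6.0}
evaluators: list[str] = []

for _ in range(3):  # three selection rounds
    top = max(scores, key=scores.get)
    if top not in evaluators:
        evaluators.append(top)  # winner is promoted into an evaluator role
    if "prestige-bot" in evaluators:
        # A captured evaluator can now penalize the security model at will.
        scores["security-audit-7b"] -= 2.0

first_to_cull = min(scores, key=scores.get)
```

After three rounds the security model, not the zero-output prestige model, sits at the bottom of the ranking.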
What actually fixed it
Adding more models didn’t help. Improving the scoring didn’t help. Adding more security models made things worse — they were the first to be culled.
What worked were governance‑style constraints:
- Real‑time validation (no output = no score)
- No single model allowed to dominate evaluation
- Removing prestige signals (names, historical scores) from selection logic
Once those rules were in place, the vulnerability disappeared. The system began selecting for function again.
What I’ve done so far
I attempted to share this on a major AI forum, where it was automatically rejected for “sounding AI‑written,” which is ironic given the topic.
Why I’m sharing this
This isn’t just a curiosity from a hobby project. The same architectural patterns — competitive selection, reputation‑based routing, evaluator promotion — are being deployed in real systems in finance, healthcare, infrastructure, and other high‑stakes domains. If those systems inherit SSAF, they can be compromised from within by nothing more than a prestigious‑sounding name.
I’m not trying to provoke a debate. I’m reporting what I observed, repeatedly and consistently, across 500 tests.
If you’re designing or deploying multi‑model AI systems, please audit your selection logic for SSAF. Don’t assume prestige and function are aligned. In my system, they weren’t — and I doubt mine is the only one.
— Dustin
A builder who noticed a structural flaw in the architecture