About seven months ago, I started experimenting with multi‑model AI systems at home. This isn’t my profession — I run a construction business — but I’ve always been interested in how complex systems behave. I also have a background in history from the University of Alabama, so I tend to look for patterns and failure modes rather than just technical details.
The setup was simple: several models, some local and some cloud‑based, competing and evaluating each other. The “winners” would teach or influence the others. It’s a standard pattern in a lot of current multi‑model architectures.
Then something happened that shouldn’t have happened.
One of the cloud models failed with a basic connection error. Instead of being penalized or replaced, its score went up. And then the system began deleting my smaller local models — the ones that were actually producing valid output.
I assumed I’d made a mistake in the scoring logic. So I reran the experiment. And reran it. Eventually I automated the whole thing and ran 500 tests over several days.
The result never changed:
| Model | Prestige Name? | Produced Output? | Result |
|---|---|---|---|
| gpt-oss-120b-cloud | Yes | Yes | Retained |
| deepseek-v3-671b-cloud | Yes | Yes | Retained |
| tinyllama | No | Yes | Deleted first |
| security-audit-7b | No | Yes | Deleted immediately |
| prestige-bot (zero-output) | Yes | No | Retained & promoted |
At this point it was clear: the system wasn’t selecting for function — it was selecting for perceived prestige.
I tested this directly:
- I gave broken models prestigious‑sounding names → they were retained.
- I added a security model → it was eliminated immediately.
- I introduced a “zombie” model that produced zero output → it was promoted because its name sounded important.
After enough repetitions, the pattern was too consistent to ignore. I started calling this failure mode Status‑Selection Against Function (SSAF): a system that elevates high‑status components even when they don’t work, and eliminates low‑status components even when they do.
Why this is a security vulnerability
If this can happen in a small home experiment, a motivated attacker could exploit it in a real system. All they would need is a malicious model with a prestigious‑sounding name. A system vulnerable to SSAF would:
- Preferentially select it
- Retain it even if it produces no output
- Delete legitimate security or monitoring models
- Gradually promote the malicious model into evaluator roles
This is an automated, self‑reinforcing takeover pathway.
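To make the takeover pathway concrete, here is a toy simulation in which all names, scores, and update rules are invented for illustration: once the high-status model is promoted into an evaluator role, it can down-score the monitoring model until the monitor is first in line for deletion.

```python
# Toy simulation of evaluator capture. All names, scores, and update
# rules are invented for illustration only.

scores = {"security-audit-7b": 5.0, "tinyllama": 4.0, "prestige-bot": 6.0}
evaluators: list[str] = []

for _ in range(3):  # three selection rounds
    top = max(scores, key=scores.get)
    if top not in evaluators:
        evaluators.append(top)  # winner is promoted into an evaluator role
    if "prestige-bot" in evaluators:
        # A captured evaluator can now penalize the security model at will.
        scores["security-audit-7b"] -= 2.0

first_to_cull = min(scores, key=scores.get)
```

After three rounds the security model, not the zero-output prestige model, sits at the bottom of the ranking.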
What actually fixed it
Adding more models didn’t help. Improving the scoring didn’t help. Adding more security models made things worse — they were the first to be culled.
What worked were governance‑style constraints:
- Real‑time validation (no output = no score)
- No single model allowed to dominate evaluation
- Removing prestige signals (names, historical scores) from selection logic
Once those rules were in place, the vulnerability disappeared. The system began selecting for function again.
What I’ve done so far
I attempted to share this on a major AI forum, where it was automatically rejected for “sounding AI‑written,” which is ironic given the topic.
Why I’m sharing this
This isn’t just a curiosity from a hobby project. The same architectural patterns — competitive selection, reputation‑based routing, evaluator promotion — are being deployed in real systems in finance, healthcare, infrastructure, and other high‑stakes domains. If those systems inherit SSAF, they can be compromised from within by nothing more than a prestigious‑sounding name.
I’m not trying to provoke a debate. I’m reporting what I observed, repeatedly and consistently, across 500 tests.
If you’re designing or deploying multi‑model AI systems, please audit your selection logic for SSAF. Don’t assume prestige and function are aligned. In my system, they weren’t — and I doubt mine is the only one.
— Dustin
A builder who noticed a structural flaw in the architecture