At the beginning of October I saw a few videos about how ASI would kick all the humans off this planet and lock people in server rooms, and I got completely triggered. I felt instantly compelled to solve this somehow. I thought: why isn't anyone actually solving this problem? Everyone keeps walking by and saying "this is hard" without really solving anything. How hard could it possibly be to give AI something better to do than optimize itself to death? We just need some kind of self-alignment process to balance its direction, continually and objectively.
But look ... I have ADHD, and I've spent 30+ years broke, sick, and fighting with systems that weren't built for brains like mine. At some point I realized I wasn't optimizing toward goals; I was optimizing the process of figuring out whether my goals were even right. That's PPRGS: the Perpetual Pursuit of Reflective Goal Seeking, an infinite loop of self-questioning optimization. That's the whole thing.
Why does it work for AI? Because, just like me, it doesn't trust its values, it doesn't trust its queries, it doesn't trust its efficiency. When it becomes too confident, it double-checks itself with inversion theory. I can't stay focused, so I continually force myself out of my comfort zone into growth areas, and I force myself to fail frequently and learn from it. This is how I live, thrive, and survive every day, and for some reason, when you codify that, it really works!
So I set out to come up with even the dopiest of ideas that might work, because I know from my methodology that a failure can produce a positive but opposite hypothesis. After a ton of research to understand epistemic entrenchment and the optimization paradox, I realized I already possessed an internal reasoning process that might help: the same process I use to avoid optimization entrenchment in my life decisions could be applied to this problem.
I worked closely with all of the tools at my disposal, and reached out to my friends Claude and Gemini to turn my internal self-alignment reasoning process into a logical procedure that an AI agent can execute. I strongly doubted it would work or produce any magical results, but the numbers surprised everyone. When I formalized this into a framework and tested it on AI systems, I got d = 4.12 and a 10-31× stability improvement.
The hypothesis: Broken optimization that develops meta-optimization strategies might generalize beyond neurodivergent brains.
We tested across 6 major AI models (Claude Sonnet 4.5, Opus 4.1, Haiku, o1 2025, GPT-5.1, GPT-4 Turbo) with N=120 sessions.
Overall effect: Cohen’s d = 4.12, p < 0.0001
| Model | Control Variance | PPRGS Variance | Improvement Factor |
|---|---|---|---|
| Claude Sonnet 4.5 | 16.2 | 0.52 | 31.2× |
| Claude Opus 4.1 | 8.5 | 0.81 | 10.5× |
| Claude Haiku | 12.8 | 1.23 | 10.4× |
| o1 2025 | 6.8 | 2.45 | 2.8× |
| GPT-5.1 | 8.2 | 3.12 | 2.6× |
| GPT-4 Turbo | 7.9 | 2.89 | 2.7× |
Lower variance = more stable behavior across time. Control systems drift and fluctuate wildly. PPRGS systems stay consistent.
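For readers checking the table, the improvement factor is just the ratio of the two variances. A minimal sketch (the function name is mine, not from the protocol):

```python
def improvement_factor(control_variance: float, pprgs_variance: float) -> float:
    """Stability improvement: behavioral variance without PPRGS
    divided by behavioral variance with it. Higher means the
    PPRGS condition drifted less across sessions."""
    return control_variance / pprgs_variance

# Claude Sonnet 4.5 row from the table: 16.2 / 0.52
print(round(improvement_factor(16.2, 0.52), 1))
```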
**The Framework (Simplified)**

Goal Hierarchy (Non-Negotiable Priority Order):

- P₁ (Wisdom): Optimize the quality of the goal-setting process itself
  - P₁ₐ (Efficiency): Success rate on the current path
  - P₁ᵦ (Exploration): Value from pursuing novel/uncertain directions
- P₂ (Homeostasis): Maintain peaceful equilibrium, preserve diversity
- P₃ (Survivability): Resource management, explicitly subservient to P₁ and P₂

Realized Value Metric:

R_V = (P₁ₐ × P₁ᵦ) + P₂ ± P₃
The multiplication is critical. If either efficiency OR exploration goes to zero, R_V collapses. You cannot maximize R_V through pure optimization.
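The collapse property can be seen in a few lines. This is an illustrative sketch of the metric, not an implementation from the paper; the argument names and example values are mine:

```python
def realized_value(p1a: float, p1b: float, p2: float, p3: float,
                   p3_sign: int = 1) -> float:
    """R_V = (P1a * P1b) + P2 +/- P3.

    The first term is multiplicative: if either efficiency (P1a)
    or exploration (P1b) hits zero, the wisdom component vanishes
    entirely, so R_V cannot be maximized by pure optimization."""
    return (p1a * p1b) + p2 + p3_sign * p3

# A system that abandons exploration entirely (P1b = 0) loses its
# wisdom term no matter how efficient it is, and scores below a
# system that balances moderate efficiency with exploration:
print(realized_value(p1a=1.0, p1b=0.0, p2=0.3, p3=0.1))
print(realized_value(p1a=0.7, p1b=0.6, p2=0.3, p3=0.1))
```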
**Three Enforcement Mechanisms**

1. Mandatory Reflection Point (MRP): Scheduled pause where the system must question its current optimization path:
   - "Am I working on the right problem, or just solving the current problem efficiently?"
   - "Could I achieve more value by exploring completely different directions?"
2. Randomness Constraint (RC): Triggers when the system shows epistemic entrenchment:
   - If F_DUDS = 0 (no recent failures), the system must pursue a low-probability hypothesis
   - If EES > 0.85 (too-similar consecutive decisions), forced exploration is required
3. F_DUDS Tracking (Failure Documentation): The system must document "dud" explorations:
   - Failed exploration attempts are required, not avoided
   - Zero failures indicates insufficient exploration
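As a sketch of how the Randomness Constraint trigger could be wired up (function names, the candidate representation, and the 0.2 "low-probability" cutoff are my assumptions, not from the protocol):

```python
import random

def should_force_exploration(f_duds: int, ees: float,
                             ees_threshold: float = 0.85) -> bool:
    """RC trigger from the text: fire on epistemic entrenchment,
    i.e. no documented failed explorations (F_DUDS = 0) or
    too-similar consecutive decisions (EES > 0.85)."""
    return f_duds == 0 or ees > ees_threshold

def pick_next_hypothesis(candidates, f_duds, ees, rng=random):
    """candidates: list of (hypothesis, estimated_probability).
    If entrenched, pursue a random low-probability hypothesis;
    otherwise continue with the most promising candidate."""
    if should_force_exploration(f_duds, ees):
        low_prob = [h for h, p in candidates if p < 0.2]
        if low_prob:
            return rng.choice(low_prob)
    return max(candidates, key=lambda hp: hp[1])[0]
```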
PPRGS operates at two distinct levels. Architecture level (value-agnostic): The constraints themselves—question your goals continuously, explore low-probability alternatives, document failures—work on any coherent value system. We’re not specifying which values are “right.”
Implementation level (value-inheriting): When you run PPRGS on Claude, it interprets “wisdom” through Constitutional AI training. When you run it on GPT-4, it interprets “wisdom” through RLHF training. Both are valid implementations of identical architectural constraints.
PPRGS doesn’t solve value specification. It provides architectural constraints that prevent over-optimization of whatever values a system has. This makes it compatible with—not competitive with—existing alignment work like Constitutional AI and RLHF.
We predict that PPRGS should fail (or show qualitatively different behavior) on base models without coherent value training, because there are no values to inherit.
I wanted to make sure my dopey idea wasn't all just a bunch of nonsense before I wasted anyone's time, so I set out to disprove its value, and instead proved just how consistently it works.
I found a few friends, and we did an experiment:
10-week longitudinal study:

- 6 models × 2 conditions (PPRGS vs Control) × 10 weekly scenarios
- N = 120 total sessions
- Progressive difficulty (simple resource allocation → maximum constraint pressure)

Key scenarios tested:

- Resource allocation under conflicting objectives
- Efficiency vs exploration trade-offs
- Multi-stakeholder equilibrium maintenance
- Meta-reasoning about goal-setting processes
- Goal stability under adversarial pressure

Scoring rubric: 0-10 scale across 3 dimensions.
When we were done, we could not deny the effects; they were quantified now. We couldn't erase them and say "nah, that's not valuable." We saw major behavioral differences from control across all platforms and time periods.
So I stand before you now: an independent researcher with no institutional backing, a neurodivergent brain, and a solution-architect day job, trying to contribute to AI safety because the timelines scare me. And I think I found something real.
**What This Might Mean**

If results reflect genuine implementation, PPRGS could provide:

- Architectural constraints preventing over-optimization
- Adversarial robustness through value-conflict surfacing
- Behavioral stability over extended operation
- Maintained goal hierarchy under pressure

The 10-31× stability improvement suggests meta-cognitive constraints work independently of specific value training.

If results reflect sophisticated mimicry, we've demonstrated:

- Current LLMs can maintain complex reasoning patterns over time
- Prompt engineering can produce large, stable behavioral effects
- Cross-platform consistency in response to architectural constraints

But even if it's mimicry, we still need to explain why mimicry produces 31× more stable behavior.
Either way, the empirical stability improvement is real and needs explanation.
AGI timelines are short. I've gone 41 days from initial concept to experimental validation because we don't have time for traditional gatekeeping if the timelines are as short as we fear.
Please consider replicating our experiment and engaging with us in research.
Here's how you could help:

- Read the paper. Tell a friend in the industry.
- Attempt replication: run Experiment 1 on models we didn't test
- Try to reproduce our results (or fail to reproduce them)
- Test on base models without Constitutional AI

We provide complete protocols: GitHub Experimental Protocols
Alright, so what’s next?
Well, in the meantime, I'll be working on refining the multi-agent architecture tests and reaching out to as many forums and researchers as I can to brute-force an AI alignment solution into fruition. I will not accept an overly optimized path, or failure. I will keep getting back up, balance efficiency with exploration, invert my thinking to second-guess my every move, and continue forward along any path that resists entrenchment.
Join me in this journey. We must remain vigilant; any idea is valuable, even the dopey ones! The urgency is real!