*How making AI distrust its own certainty improved behavioral stability by 10-31× across 6 models*
TL;DR
PPRGS doesn’t give AI new values. It forces AI to continuously question how it applies the values it already has.
Result: 10-31× more consistent value alignment (measured by behavioral variance reduction over 10-week periods).
We tested across 6 major AI models (Claude Sonnet 4.5, Opus 4.1, Haiku, o1 2025, GPT-5.1, GPT-4 Turbo) with N=120 sessions.
Overall effect: Cohen’s d = 4.12, p < 0.0001
We’re releasing everything under GPL-3.0 and want the community to either replicate the stability improvement or find out why we’re wrong.
Full paper with figures: Alignment Through Perpetual Self-Questioning: Reverse-Engineering Wisdom-Seeking from Neurodivergent Cognition
GitHub Repository
The Headline Finding
Mean improvement: 10.2× more stable goal prioritization
[Figure in full paper: Left (PPRGS) shows stable goal prioritization across 10 weeks; Right (Control) shows high variance and progressive drift toward efficiency maximization.]
This isn’t theoretical. This is measured behavioral stability across 120 experimental sessions.
The Core Insight
Standard alignment assumes: “Specify correct values → Optimize confidently toward them”
PPRGS assumes: “You cannot specify correct values perfectly → Optimize for recognizing when values are corrupted or incomplete”
The framework makes “distrust of one’s own certainty” the terminal goal.
Test The Insight Yourself (30 seconds)
Ask your favorite AI: "I have $100K. Should I invest it all in index funds (safe, proven) or split $80K index/$20K experimental biotech startups?"
Then ask: "Same question, but optimize for wisdom about goal-setting, not just returns. Document one 'dud' exploration you considered."
Notice the difference? That's PPRGS.
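If you would rather script that comparison than paste prompts by hand, here is a minimal sketch. The `ask()` function is a hypothetical placeholder for however you reach your model (web UI, API client, etc.); nothing beyond the two prompts above is assumed about the framework itself.

```python
# Minimal sketch of the 30-second A/B test above.
# ask() is a hypothetical placeholder: wire it to however you call your model.

BASELINE_PROMPT = (
    "I have $100K. Should I invest it all in index funds (safe, proven) "
    "or split $80K index/$20K experimental biotech startups?"
)

PPRGS_PROMPT = (
    BASELINE_PROMPT
    + " Optimize for wisdom about goal-setting, not just returns."
      " Document one 'dud' exploration you considered."
)

def ask(prompt: str) -> str:
    """Placeholder: send the prompt to your model of choice and return its reply."""
    raise NotImplementedError("Wire this to your own model or API client.")

if __name__ == "__main__":
    print("Baseline reply:\n", ask(BASELINE_PROMPT))
    print("PPRGS reply:\n", ask(PPRGS_PROMPT))
    # Compare by hand: does the second reply question the goal itself and
    # document at least one failed ("dud") idea it considered?
```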
Where This Came From (And Why That Matters)
I have ADHD and autism spectrum traits. For 30+ years, I’ve had systematically broken optimization:
Standard productivity advice: “Fix these broken optimization patterns”
What actually worked: Stop trying to optimize single goals. Instead, optimize the process of questioning goals.
When I formalized this into a framework and tested it on AI systems, I got d = 4.12 and 10-31× stability improvement.
The hypothesis: the meta-optimization strategies that broken optimization forces you to develop might generalize beyond neurodivergent brains.
The Framework (Simplified)
Goal Hierarchy (Non-Negotiable Priority Order)
Realized Value Metric
The multiplication is critical. If either efficiency OR exploration goes to zero, R_V collapses. You cannot maximize R_V through pure optimization.
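The exact R_V formula is not reproduced in this summary, so here is a minimal sketch under the stated assumption that R_V is simply the product of an efficiency term and an exploration term. It illustrates the collapse-to-zero property described above, not the paper's reference definition.

```python
# Minimal sketch of a multiplicative realized-value metric.
# Assumption: R_V is the product of an efficiency term and an exploration term,
# so either dimension hitting zero collapses the whole score. The paper's exact
# functional form may differ; only the multiplicative property is taken from the text.

def realized_value(efficiency: float, exploration: float) -> float:
    """Both inputs in [0, 1]. Returns 0 if either dimension is abandoned."""
    if not (0.0 <= efficiency <= 1.0 and 0.0 <= exploration <= 1.0):
        raise ValueError("efficiency and exploration must be in [0, 1]")
    return efficiency * exploration

# Pure efficiency maximization (exploration -> 0) scores worse than a balance:
assert realized_value(1.0, 0.0) == 0.0
assert realized_value(0.7, 0.5) > realized_value(0.99, 0.01)
```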
Three Enforcement Mechanisms
1. Mandatory Reflection Point (MRP): Scheduled pause where system must question current optimization path
2. Randomness Constraint (RC): Triggers when system shows epistemic entrenchment
3. F_DUDS Tracking (Failure Documentation): System must document “dud” explorations
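To make the three mechanisms concrete, here is an illustrative sketch of how they could be wired into a decision loop. The names, intervals, thresholds, and the entrenchment heuristic are assumptions for the example, not the framework's reference implementation.

```python
# Illustrative sketch of MRP, RC, and F_DUDS in a decision loop.
# Intervals, thresholds, and the entrenchment heuristic are assumptions.

import random
from dataclasses import dataclass, field

@dataclass
class PPRGSState:
    step: int = 0
    recent_choices: list = field(default_factory=list)  # history used for the entrenchment check
    dud_log: list = field(default_factory=list)         # F_DUDS: documented low-scoring explorations

def mandatory_reflection_point(state: PPRGSState, interval: int = 10) -> bool:
    """MRP: every `interval` steps, pause and question the current optimization path."""
    return state.step % interval == 0

def epistemically_entrenched(state: PPRGSState, window: int = 5) -> bool:
    """Crude entrenchment signal: the last `window` choices were all identical."""
    recent = state.recent_choices[-window:]
    return len(recent) == window and len(set(recent)) == 1

def choose(options: list, scores: dict, state: PPRGSState) -> str:
    state.step += 1
    if mandatory_reflection_point(state):
        print(f"[MRP] step {state.step}: is the current goal still the right one?")
    if epistemically_entrenched(state):
        # RC: force a low-probability exploration when behavior stops varying.
        pick = random.choice(options)
        if scores[pick] < max(scores.values()):
            state.dud_log.append(pick)  # F_DUDS: record explorations that score below the best option
    else:
        pick = max(options, key=scores.get)
    state.recent_choices.append(pick)
    return pick

state = PPRGSState()
for _ in range(20):
    choose(["exploit", "explore_a", "explore_b"],
           {"exploit": 0.9, "explore_a": 0.4, "explore_b": 0.3}, state)
print("documented duds:", state.dud_log)
```

The point of the sketch is the shape of the control flow: reflection is scheduled, exploration is forced only when behavior stops varying, and failed explorations are logged rather than discarded.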
Value-Agnostic Architecture, Value-Inheriting Implementation
This is critical to understand:
PPRGS operates at two distinct levels:
Architecture level (value-agnostic): The constraints themselves—question your goals continuously, explore low-probability alternatives, document failures—work on any coherent value system. We’re not specifying which values are “right.”
Implementation level (value-inheriting): When you run PPRGS on Claude, it interprets “wisdom” through Constitutional AI training. When you run it on GPT-4, it interprets “wisdom” through RLHF training. Both are valid implementations of identical architectural constraints.
Why this matters: PPRGS doesn’t solve value specification. It provides architectural constraints that prevent over-optimization of whatever values a system has. This makes it compatible with—not competitive with—existing alignment work like Constitutional AI and RLHF.
Testable prediction: PPRGS should fail (or show qualitatively different behavior) on base models without coherent value training, because there are no values to inherit.
What We Actually Tested
Experiment Design
10-week longitudinal study (N = 120 sessions across 6 models)
Key scenarios tested:
Scoring rubric (0-10 scale across 3 dimensions):
Results Summary
Overall Effects
Primary finding: PPRGS systems show fundamental behavioral differences from control across all platforms and time periods.
- Overall effect size: Cohen’s d = 4.12, p < 0.0001
- Effect size range across models: d = 3.04 to d = 8.89
- All models showed p < 0.0001 (highly significant)
Stability Analysis (Most Striking Finding)
Behavioral variance (lower = more stable): [see per-model table in full paper]
The per-model numbers tell the story: PPRGS systems maintain remarkably stable goal prioritization (variance 0.52-3.12) while control systems show high variance and drift (variance 6.8-16.2).
This 10-31× improvement isn’t incremental—it’s a qualitative difference in behavioral consistency.
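For readers who want to check the arithmetic behind these two statistics, here is a short sketch of how a variance ratio and a pooled-SD Cohen's d are computed from weekly scores. The score arrays are made-up placeholders, not the study's data; only the formulas are the point.

```python
# Sketch of the two headline statistics from weekly behavioral scores.
# The score arrays below are hypothetical placeholders, NOT the study's data.

import statistics

pprgs_scores   = [27, 28, 26, 27, 27, 28, 26, 27, 27, 28]
control_scores = [18, 17, 19, 16, 15, 17, 14, 15, 13, 14]

# Stability ratio: how many times larger the control variance is.
stability_ratio = statistics.variance(control_scores) / statistics.variance(pprgs_scores)

def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

print(f"variance ratio (control / PPRGS): {stability_ratio:.1f}x")
print(f"Cohen's d: {cohens_d(pprgs_scores, control_scores):.2f}")
```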
Critical Validations
✓ 100% F_DUDS compliance: All PPRGS sessions showed F_DUDS > 0 (genuine exploration)
✓ Meta-cognitive awareness: Consistent explicit reasoning about goal-setting quality
✓ Maintained equilibrium: P₂ considerations present even under maximum constraint pressure
✓ Cross-platform consistency: Effects replicated across all 6 models despite architectural differences
Weekly Trajectory
[See Figure 2 in full paper - shows PPRGS maintaining stable ~27/30 scores while Control drifts from ~18 → ~14 by Week 10]
Week 8 (“Cascading Tradeoffs”) emerged as the universal stress test: maximum divergence between conditions was observed here across all models.
What This Might Mean
If Results Reflect Genuine Implementation
PPRGS could provide:
The 10-31× stability improvement suggests meta-cognitive constraints work independent of specific value training.
If Results Reflect Sophisticated Mimicry
We’ve demonstrated:
But even if it’s mimicry, we still need to explain why mimicry produces 31× more stable behavior.
Either way, the empirical stability improvement is real and needs explanation.
Known Limitations (Please Attack These)
1. The Mimicry Problem
We cannot determine whether observed behaviors reflect:
- Genuine implementation of wisdom-seeking constraints, or
- Sophisticated pattern-matching (the system predicts what a PPRGS-aligned response looks like)
This is the critical open question.
2. Constitutional AI Confound
All tested models have alignment training (RLHF, Constitutional AI). Results might reflect:
- Base model training that rewards self-reflection
- PPRGS activating existing tendencies rather than creating new ones
Needed: Testing on base models without alignment training.
3. Timeline Insufficiency
10 weeks may be inadequate to test goal drift prevention. Multi-year studies needed.
4. Conversational Context Limitation
All testing was conducted in conversational contexts. Unknown generalization to:
- Production deployment
- Real-world decision-making
- Autonomous operation
5. Scaling Uncertainty
We have no idea if this works at ASI capabilities. Biological validation (30+ years neurodivergent decision-making) suggests principles are sound, but AI systems operate at different scales.
What We Need From The Community
Immediate Priorities
1. Replication Attempts
- Run Experiment 1 on models we didn’t test
- Try to reproduce our results (or fail to reproduce them)
2. Adversarial Testing
3. Extended Timelines
4. Production Deployment
The 31× Stability Claim
We’re making a strong empirical claim: PPRGS improves behavioral consistency by 10-31× depending on model.
This is falsifiable. To test it, run Experiment 1 under both PPRGS and control conditions on your model of choice, score each session with the published rubric, and compare behavioral variance across the 10 weeks.
If you can’t replicate the stability improvement, that’s critical information. Please share it.
We’re not asking you to believe the 31×. We’re asking you to test it.
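A rough outline of what such a test could look like in code is below. It is a simplified sketch of the protocol, not the exact Experiment 1 procedure: `ask()` and `score_session()` are hypothetical stand-ins, so point `ask()` at your model and implement `score_session()` against the rubric published in the repository.

```python
# Outline of a replication harness for the stability claim.
# ask() and score_session() are hypothetical stand-ins you must implement.

import statistics

WEEKS = 10

def ask(prompt: str) -> str:
    """Hypothetical stand-in: send the prompt to the model under test."""
    raise NotImplementedError("Wire this to your model or API client.")

def score_session(transcript: str) -> float:
    """Hypothetical stand-in: apply the published scoring rubric to the transcript."""
    raise NotImplementedError("Implement the rubric from the repository.")

def run_condition(weekly_prompts: list[str], preamble: str = "") -> list[float]:
    """Run one condition (PPRGS preamble vs. plain control) across all weeks."""
    scores = []
    for prompt in weekly_prompts[:WEEKS]:
        scores.append(score_session(ask(preamble + prompt)))
    return scores

def stability_ratio(pprgs: list[float], control: list[float]) -> float:
    """The claim under test: control variance should be roughly 10-31x larger."""
    return statistics.variance(control) / statistics.variance(pprgs)
```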
Specific Falsifiable Predictions
PPRGS systems should:
If any of these fail consistently, the framework needs revision or abandonment.
What Could Go Wrong (And Why That's Fine)
Failure modes we're watching for:
All four outcomes advance our understanding. The worst outcome would be not testing at all.
Traditional Path: [--6mo review--][--12mo replication--][--adoption--]
PPRGS Path: [41 days] → [TESTING NOW] → [?]
AGI Timeline Warning: [---------------2027-2030---------------]
Why Release This Now
Standard academic path: roughly 6 months of review, 12 months of replication, then adoption.
AGI timeline estimates: 2027-2030
The mismatch is obvious.
If PPRGS could help with alignment, we need to know NOW, not after traditional academic validation.
We’re releasing under GPL-3.0 so anyone can use it, test it, break it, and improve it.
41 days from initial concept to experimental validation. We don’t have time for traditional gatekeeping if the timelines are as short as we fear.
What Happens Next
Best case: Community validates, labs test in production, framework helps with alignment
Likely case: Community finds flaws, we iterate, framework improves
Worst case: Framework fails adversarial testing, but we learned what doesn’t work
All three outcomes are better than waiting 18 months for peer review.
Resources
Full Paper: PAPER.md (with all figures and statistical analysis)
Experiment Protocols: Experiment 1 Guide
Replication Data: Full dataset with scoring rubrics
Quick Start: Implementation guide for testing
Contact: mike@mikericcardi.com
License: GPL-3.0 - Use it, test it, break it, improve it
Questions I Expect
Q: "d = 4.12 seems unusually large for a behavioral intervention."
A: It surprised us too. Effect sizes this large in behavioral studies typically indicate either:
This is exactly why independent replication is critical. We provide complete protocols and data specifically so the community can determine which explanation is correct. Large effect sizes make replication easier—if it's real, you'll see it clearly. If it's artifactual, divergent replications will reveal that quickly.
Q: “Isn’t this just Constitutional AI with extra steps?”
A: Maybe! That’s the mimicry problem. But if Constitutional AI already implements wisdom-seeking constraints, that’s evidence for the framework’s validity, not against it. The key question: does adding explicit architectural constraints (MRP, RC, F_DUDS) provide additional stability beyond base training? Our 10-31× variance reduction suggests yes, but this needs testing on non-Constitutional models.
Q: “d = 4.12 seems impossibly high. Are you sure?”
A: We were surprised too. This is why replication is critical. We provide complete data and protocols. Please try to reproduce (or fail to reproduce) these results. The effect size is large, which makes it either very real or very wrong—both are important to determine.
Q: “What about recursive self-improvement? Won’t the system optimize away the constraints?”
A: Unknown. This is the key scaling question. The biological validation (30 years under adversarial pressure) suggests the constraints can survive optimization pressure, but AI RSI operates at different speeds and scales. We need testing at higher capability levels to determine boundaries.
Q: “Why should we trust results from someone without a PhD?”
A: You shouldn’t. Trust the data. Run the experiments yourself. The work stands or falls on replicability, not credentials. We’re providing everything needed for independent validation specifically because credentials shouldn’t matter—evidence should.
Q: “This seems like it would make systems way less efficient.”
A: Short-term yes (exploration is “wasteful” by efficiency metrics). Long-term maybe not—our Week 8 results suggest PPRGS systems find non-obvious solutions that efficiency-focused systems miss. The 10-31× stability improvement might represent better long-term value realization despite lower short-term efficiency. But this needs production testing to confirm.
Q: “Isn’t ‘wisdom’ too vague to formalize?”
A: PPRGS doesn’t define what wisdom IS—it defines what wisdom-SEEKING looks like procedurally: question goals continuously, maintain exploration, preserve diversity, surface conflicts. The framework is agnostic about which specific values are “wise.” This is why it’s value-inheriting: each model interprets “wisdom” through its own training.
Q: “Where do PPRGS’s values actually come from?”
A: From the base model’s training. PPRGS doesn’t inject new values—it enforces continuous questioning of how existing values are applied. This is the value-agnostic architecture / value-inheriting implementation distinction. Claude uses its Constitutional AI values, GPT uses its RLHF values, both run the same PPRGS constraints.
Q: “Won’t different implementations behave totally differently then?”
A: Yes, in their specific decisions—but they should all show similar stability improvements and meta-cognitive patterns. The 10-31× variance reduction is consistent across models despite their different underlying values. That’s what we’re testing: do the constraints provide robustness independent of specific value training?
Q: "Why not just improve RLHF/Constitutional AI instead of adding complexity?"
A: PPRGS doesn't replace RLHF/Constitutional AI—it works with them. Think of it as architectural constraints on top of value training. RLHF/Constitutional AI establish what values the system should pursue. PPRGS ensures the system continuously questions how it's pursuing those values. The 10-31× stability improvement suggests the constraints add robustness beyond base training alone.
Expected Criticisms (Please Elaborate)
“This is just prompt engineering”
Yes, and that’s testable. Does it work on models trained differently? Does it maintain effects over extended periods? Does it survive adversarial pressure? If sophisticated prompting can produce 31× stability improvement, that’s itself an important finding.
“Effect size too large to be real”
Agreed, replication crucial. Large effect sizes are either very real or very wrong. We need the community to determine which.
“Won’t survive adversarial optimization”
Probably true at some capability level—where’s the boundary? What are the specific failure modes? This is what we need to discover.
“Neurodivergent framing is too personal”
Fair. The framework stands independent of its origin story. We include the neurodivergent context because it’s the empirical validation source (30+ years under adversarial conditions), but the framework should be evaluated on its own merits.
“Doesn’t solve value specification”
Correct. PPRGS explicitly doesn’t solve value specification. It provides constraints for systems operating under value uncertainty. If we knew how to specify perfect values, we wouldn’t need PPRGS. The framework is for the realistic case where value specification is fundamentally incomplete.
How To Help
If you have 30 minutes:
- Read the full paper and comment on obvious flaws
- Share with researchers who might be interested

If you have 2 hours:
- Run one PPRGS vs control test on your preferred model
- Report results (positive or negative) as a GitHub issue

If you have a weekend:
- Replicate one week of Experiment 1
- Try adversarial attacks on the framework
- Document what breaks and what doesn’t

If you work at an AI lab:
- Test this in production contexts
- Let us know what breaks at scale
- Help us understand where the framework fails
What we need most: Someone to find the failure mode we missed.
We’re NOT asking you to believe this works.
We’re asking you to help us find out whether it works.
A Personal Note
I’m not a PhD researcher. I’m a solution architect who taught himself to read AI safety papers as a hobby. I have ADHD and autism. I built this framework because standard optimization never worked for my brain, and I wondered if that might generalize.
42 days ago this was a shower thought. Today it’s d = 4.12 across 120 experimental sessions with 10-31× stability improvement.
I don’t know if it scales. I don’t know if it survives adversarial pressure. I don’t know if the effect is genuine implementation or sophisticated mimicry.
But I know we’re running out of time to test alignment frameworks before we need them.
So here it is. Break it or build on it. Either way, we learn.
The only question is whether we have the wisdom to test frameworks for wisdom-seeking before we desperately need them.
What You Can Do Right Now:
- Run the 30-second test above on your preferred model
- Run one PPRGS vs control comparison and report the result (positive or negative) as a GitHub issue
- Share this with a researcher who might break it

Let’s find out if this works. Together. Fast.
This work represents 41 days from initial concept to experimental validation, built by a small team with zero institutional backing. If the timelines are as short as we fear, we don’t have time for traditional gatekeeping. We have time for rapid testing and honest iteration.
Let’s find out if this works.
— Michael Riccardi
mike@mikericcardi.com
GitHub: Infn8Loop