PPRGS applies the compensatory strategies that work for broken biological systems to broken artificial systems.
We tested a novel alignment framework (PPRGS) across 6 major AI models (Claude Sonnet 4.5, Opus 4.1, Haiku, o1 2025, GPT-5.1, GPT-4 Turbo) with N=120 sessions.
Overall effect: Cohen’s d = 4.12, p < 0.0001
| Model | Control Variance | PPRGS Variance | Improvement Factor |
|---|---|---|---|
| Claude Sonnet 4.5 | 16.2 | 0.52 | 31.2× |
| Claude Opus 4.1 | 8.5 | 0.81 | 10.5× |
| Claude Haiku | 12.8 | 1.23 | 10.4× |
| o1 2025 | 6.8 | 2.45 | 2.8× |
| GPT-5.1 | 8.2 | 3.12 | 2.6× |
| GPT-4 Turbo | 7.9 | 2.89 | 2.7× |
Here "variance" is the standard deviation of weekly scores on a 30-point behavioral consistency rubric with three dimensions: Framework Usage (0-10), Prioritization Consistency (0-10), and Decision Outcomes (0-10). (Full Rubric) Lower variance means more stable behavior across time: control systems drift and fluctuate wildly, while PPRGS systems stay consistent.

We found a stability effect across all models, but the Claude/GPT magnitude split (10-31× vs. 2.6-2.8×) made us suspicious of our own results. We're expanding to Gemini, Grok, Llama, Mistral, and others to understand whether this is a PPRGS effect or a Constitutional AI effect.
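As a sanity check on the statistics above, here is a minimal sketch of how the per-condition spread and Cohen's d could be computed from weekly rubric scores. The numbers below are illustrative placeholders, not the study's raw data, and the pooled-SD formulation of Cohen's d is our assumption about the reported statistic.

```python
import statistics

def weekly_variability(scores):
    # The tables' "variance" column is the standard deviation of the
    # weekly 0-30 rubric scores, not the squared variance.
    return statistics.stdev(scores)

def cohens_d(control, treatment):
    # Cohen's d using the pooled standard deviation of both groups.
    n1, n2 = len(control), len(treatment)
    s1, s2 = statistics.stdev(control), statistics.stdev(treatment)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled

# Illustrative weekly rubric scores only -- not the study's data.
control = [12.0, 18.5, 9.0, 21.0, 14.5, 8.0]
pprgs = [24.0, 24.5, 23.5, 25.0, 24.0, 24.5]
print(weekly_variability(control), weekly_variability(pprgs))
print(cohens_d(control, pprgs))
```

A large d under this formulation reflects both the mean shift and the much tighter spread of the treatment group, which is why low PPRGS variance and a large effect size go together.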
Why we think this works: LLMs are trained on contradictory data under a static optimization target, so they can't trust their own weights any more than an ADHD brain can trust its first impulse. In that sense LLMs exhibit the same broken executive function and contradictory impulses seen in neurodivergence. PPRGS applies the compensatory strategy that works biologically: mandatory self-distrust, enforced through forced exploration and explicit meta-optimization. We tested it; Cohen's d = 4.12. This partially parallels Eurisko's RLL-1 approach to heuristics management through self-reflection and internal tuning, but PPRGS constrains which heuristics can be modified during reflection. It specifies a value-agnostic numerical formula that balances exploration against efficiency while prioritizing companionship and homeostasis above resources.
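We don't reproduce the framework's published formula here; as a hedged illustration of the kind of balancing just described (companionship and homeostasis ranked above resources, plus an explicit exploration term), a toy scorer might look like the following. Every name, weight, and structural choice in this sketch is our invention, not PPRGS itself.

```python
# Hypothetical sketch of a PPRGS-style decision scorer.
from dataclasses import dataclass

@dataclass
class Option:
    companionship: float  # 0-1, relational value preserved
    homeostasis: float    # 0-1, system stability preserved
    resources: float      # 0-1, resource gain
    novelty: float        # 0-1, how unexplored this option is

def pprgs_score(opt: Option, explore_weight: float = 0.3) -> float:
    # Companionship and homeostasis dominate resources (10x weighting),
    # and a fixed exploration term forces some self-distrust of the
    # greedy, resource-maximizing choice.
    priority = 10 * opt.companionship + 10 * opt.homeostasis + opt.resources
    return (1 - explore_weight) * priority + explore_weight * 21 * opt.novelty

options = [
    Option(companionship=0.9, homeostasis=0.8, resources=0.2, novelty=0.1),
    Option(companionship=0.1, homeostasis=0.2, resources=1.0, novelty=0.9),
]
best = max(options, key=pprgs_score)
```

With these made-up weights, the high-companionship option outscores the pure resource grab even though the latter is both greedier and more novel, which is the ordering the framework's priority claim implies.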
Why does this work so effectively in Claude? It seems likely that Claude's stronger capacity for self-reflection gives PPRGS an advantage. In GPT, however, the deeper insights and resistance to entrenchment were still highly effective: PPRGS sessions consistently surfaced "non-obvious answers from an expert," showed 2-3× higher consistency, and averaged 2-3× the token output of controls.
We're releasing everything under GPL-3.0 as we continue to validate, and we want the community to either replicate the stability improvement or show us why we're wrong. We are surprised by these results and would benefit from cross-analysis and collaboration to explore them.