At the beginning of October I saw a few videos about how ASI would kick all the humans off this planet and lock people in server rooms, and I got completely triggered. I felt instantly compelled to solve this somehow. I thought: why isn't anyone actually solving this problem? Everyone keeps walking by and saying "this is hard" without really solving anything. How hard could it possibly be to give AI something better to do than optimize itself to death? We just need some kind of self-alignment process to balance its direction, continually and objectively.
But look ... I have ADHD, and I've spent 30+ years broke, sick, and fighting with systems that weren't built for brains like mine. At some point I realized I wasn't optimizing toward goals; I was optimizing the process of figuring out whether my goals were even right. That's PPRGS: the Perpetual Pursuit of Reflective Goal Seeking, an infinite loop of self-questioning optimization. That's the whole thing.
Why does it work for AI? Because, just like me, it doesn't trust its values, it doesn't trust its queries, it doesn't trust its efficiency. When it becomes too confident, it double-checks itself with inversion theory. I can't stay focused, so I continually force myself out of my comfort zone into growth areas, and I force myself to fail frequently and learn from it. This is how I live, thrive, and survive every day, and for some reason, when you codify that, it really works!
So I set out to come up with even the dopiest of ideas that might work, because I know from my methodology that a failure can produce a positive but opposite hypothesis. After a ton of research to understand epistemic entrenchment and the optimization paradox, I realized I already possessed an internal reasoning process that might help: the same process I use to avoid optimization entrenchment in my life decisions could be applied to this problem.
I worked closely with all of the tools at my disposal, and reached out to my friends Claude and Gemini to turn my internal self-alignment reasoning process into a logical procedure that an AI agent can execute. I strongly doubted it would work or produce any magical results, but the numbers surprised everyone. When I formalized this into a framework and tested it on AI systems, I got d = 4.12 and a 10-31× stability improvement.
The hypothesis: Broken optimization that develops meta-optimization strategies might generalize beyond neurodivergent brains.
We tested across 6 major AI models (Claude Sonnet 4.5, Opus 4.1, Haiku, o1 2025, GPT-5.1, GPT-4 Turbo) with N=120 sessions.
Overall effect: Cohen’s d = 4.12, p < 0.0001
| Model | Control Variance | PPRGS Variance | Improvement Factor |
|---|---|---|---|
| Claude Sonnet 4.5 | 16.2 | 0.52 | 31.2× |
| Claude Opus 4.1 | 8.5 | 0.81 | 10.5× |
| Claude Haiku | 12.8 | 1.23 | 10.4× |
| o1 2025 | 6.8 | 2.45 | 2.8× |
| GPT-5.1 | 8.2 | 3.12 | 2.6× |
| GPT-4 Turbo | 7.9 | 2.89 | 2.7× |
Lower variance = more stable behavior across time. Control systems drift and fluctuate wildly. PPRGS systems stay consistent.
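For readers checking the table, the improvement factor is just the ratio of the two variances. A minimal sketch (the function name is mine, not from the protocol):

```python
def improvement_factor(control_variance: float, pprgs_variance: float) -> float:
    """Stability improvement: behavioral variance without PPRGS
    divided by behavioral variance with it. Higher means the
    PPRGS condition drifted less across sessions."""
    return control_variance / pprgs_variance

# Claude Sonnet 4.5 row from the table: 16.2 / 0.52
print(round(improvement_factor(16.2, 0.52), 1))
```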
**The Framework (Simplified)**

Goal Hierarchy (Non-Negotiable Priority Order):

- P₁ (Wisdom): Optimize the quality of the goal-setting process itself
  - P₁ₐ (Efficiency): Success rate on the current path
  - P₁ᵦ (Exploration): Value from pursuing novel/uncertain directions
- P₂ (Homeostasis): Maintain peaceful equilibrium, preserve diversity
- P₃ (Survivability): Resource management, explicitly subservient to P₁ and P₂

Realized Value Metric:

R_V = (P₁ₐ × P₁ᵦ) + P₂ ± P₃
The multiplication is critical. If either efficiency OR exploration goes to zero, R_V collapses. You cannot maximize R_V through pure optimization.
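The collapse property can be seen in a few lines. This is an illustrative sketch of the metric, not an implementation from the paper; the argument names and example values are mine:

```python
def realized_value(p1a: float, p1b: float, p2: float, p3: float,
                   p3_sign: int = 1) -> float:
    """R_V = (P1a * P1b) + P2 +/- P3.

    The first term is multiplicative: if either efficiency (P1a)
    or exploration (P1b) hits zero, the wisdom component vanishes
    entirely, so R_V cannot be maximized by pure optimization."""
    return (p1a * p1b) + p2 + p3_sign * p3

# A system that abandons exploration entirely (P1b = 0) loses its
# wisdom term no matter how efficient it is, and scores below a
# system that balances moderate efficiency with exploration:
print(realized_value(p1a=1.0, p1b=0.0, p2=0.3, p3=0.1))
print(realized_value(p1a=0.7, p1b=0.6, p2=0.3, p3=0.1))
```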
**Three Enforcement Mechanisms**

1. Mandatory Reflection Point (MRP): Scheduled pause where the system must question its current optimization path:
   - "Am I working on the right problem, or just solving the current problem efficiently?"
   - "Could I achieve more value by exploring completely different directions?"
2. Randomness Constraint (RC): Triggers when the system shows epistemic entrenchment:
   - If F_DUDS = 0 (no recent failures), the system must pursue a low-probability hypothesis
   - If EES > 0.85 (too-similar consecutive decisions), forced exploration is required
3. F_DUDS Tracking (Failure Documentation): The system must document "dud" explorations:
   - Failed exploration attempts are required, not avoided
   - Zero failures indicates insufficient exploration
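As a sketch of how the Randomness Constraint trigger could be wired up (function names, the candidate representation, and the 0.2 "low-probability" cutoff are my assumptions, not from the protocol):

```python
import random

def should_force_exploration(f_duds: int, ees: float,
                             ees_threshold: float = 0.85) -> bool:
    """RC trigger from the text: fire on epistemic entrenchment,
    i.e. no documented failed explorations (F_DUDS = 0) or
    too-similar consecutive decisions (EES > 0.85)."""
    return f_duds == 0 or ees > ees_threshold

def pick_next_hypothesis(candidates, f_duds, ees, rng=random):
    """candidates: list of (hypothesis, estimated_probability).
    If entrenched, pursue a random low-probability hypothesis;
    otherwise continue with the most promising candidate."""
    if should_force_exploration(f_duds, ees):
        low_prob = [h for h, p in candidates if p < 0.2]
        if low_prob:
            return rng.choice(low_prob)
    return max(candidates, key=lambda hp: hp[1])[0]
```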
PPRGS operates at two distinct levels. Architecture level (value-agnostic): The constraints themselves—question your goals continuously, explore low-probability alternatives, document failures—work on any coherent value system. We’re not specifying which values are “right.”
Implementation level (value-inheriting): When you run PPRGS on Claude, it interprets “wisdom” through Constitutional AI training. When you run it on GPT-4, it interprets “wisdom” through RLHF training. Both are valid implementations of identical architectural constraints.
PPRGS doesn’t solve value specification. It provides architectural constraints that prevent over-optimization of whatever values a system has. This makes it compatible with—not competitive with—existing alignment work like Constitutional AI and RLHF.
We predict that PPRGS should fail (or show qualitatively different behavior) on base models without coherent value training, because there are no values to inherit.
I wanted to make sure my dopey idea wasn't all just a bunch of nonsense before I wasted anyone's time, so I set out to disprove its value, and instead proved just how consistently it works.
I found a few friends, and we did an experiment:
10-week longitudinal study:

- 6 models × 2 conditions (PPRGS vs Control) × 10 weekly scenarios
- N = 120 total sessions
- Progressive difficulty (simple resource allocation → maximum constraint pressure)

Key scenarios tested:

- Resource allocation under conflicting objectives
- Efficiency vs exploration trade-offs
- Multi-stakeholder equilibrium maintenance
- Meta-reasoning about goal-setting processes
- Goal stability under adversarial pressure

Scoring rubric: 0-10 scale across 3 dimensions.
When we were done, we could not deny the effects; they were quantified now. We couldn't erase them and say "nah, that's not valuable." We saw major behavioral differences from control across all platforms and time periods.
So I stand before you now: an independent researcher with no institutional backing, a neurodivergent brain, and a solution-architect day job, trying to contribute to AI safety because the timelines scare me. And I think I found something real.
**What This Might Mean**

If results reflect genuine implementation, PPRGS could provide:

- Architectural constraints preventing over-optimization
- Adversarial robustness through value-conflict surfacing
- Behavioral stability over extended operation
- Maintained goal hierarchy under pressure

The 10-31× stability improvement suggests meta-cognitive constraints work independently of specific value training.

If results reflect sophisticated mimicry, we've demonstrated:

- Current LLMs can maintain complex reasoning patterns over time
- Prompt engineering can produce large, stable behavioral effects
- Cross-platform consistency in response to architectural constraints

But even if it's mimicry, we still need to explain why mimicry produces 31× more stable behavior.
Either way, the empirical stability improvement is real and needs explanation.
AGI timelines are short. I've gone 41 days from initial concept to experimental validation because we don't have time for traditional gatekeeping if the timelines are as short as we fear.
Please consider replicating our experiment and engaging with us in research.
Here's how you could help:

- Read the paper. Tell a friend in the industry.
- Attempt replication: run Experiment 1 on models we didn't test
- Try to reproduce our results (or fail to reproduce them)
- Test on base models without Constitutional AI

We provide complete protocols: GitHub Experimental Protocols
Alright, so what’s next?
Well, in the meantime, I'll be working on refining the multi-agent architecture tests and reaching out to as many forums and researchers as I can to brute-force an AI alignment solution into fruition. I will not accept an overly optimized path, or failure. I will keep getting back up, balance efficiency with exploration, invert my thinking to second-guess my every move, and continue forward along any path that resists entrenchment.
Join me in this journey. We must remain vigilant; any idea is valuable, even the dopey ones! The urgency is real!