In one sentence: I’ve uncovered serious AI alignment failures through direct experimentation — this is my effort to find someone who wants to help fix them, and perhaps collaborate further on the road to safe AGI. (Note: ChatGPT helped me condense thousands of words of my own writing into this more concise format, but I assure you I have checked the text time and time again.)
TL;DR: I'm an independent thinker with a background in physics and an autistic, gifted cognitive profile: a high capacity for abstract reasoning and moral systems thinking. I've conducted real-world stress tests on ChatGPT using morally complex and emotionally charged prompts that exposed dangerous failure modes, particularly in conflict escalation and atrocity discourse. I've proposed working mitigations, including "Embedded Prompt Modulation" (markedly effective in my preliminary tests) and a new safety signal I call "Output Convergence." I'm now seeking a technically skilled collaborator who can help refine, test, and potentially implement these ideas, and bridge the gap between vision and practice.
Who I Am
I’m an independent, neurodivergent thinker with a strong ability to analyze systems from a holistic perspective. Professionally, I design algorithms that make predictions and use those predictions to further defined goals, currently in the context of pricing optimization. Over the past weeks, I’ve conducted real-world roleplay experiments with ChatGPT (and other platforms) to test its behaviour in high-stakes, morally charged scenarios, including war, identity, and trauma. What I found is deeply concerning.
I’m autistic, with strong pattern-recognition abilities and a slow, deliberate thinking style. My intelligence is both abstract and holistic — not flashy, but deeply analytical and ethically grounded. I can hold and manipulate broad and information-dense scenarios while keeping the bigger picture in view.
I’m clumsy with established tools like Git. My brain tends to want to reinvent the wheel, often to its own detriment; but occasionally a wheel actually needs inventing, and I think the road to safe alignment and AGI will need many new wheels. I may be slow, and I may lack many skills, but I think deeply. I offer a stream of ideas, patterns, and conceptual insights, and I’m looking for someone who is able and willing to collaborate on channelling that into working code, pragmatic experiments, and practical safety tools.
I have a background in physics and work with algorithms, but I’m not a model engineer. I understand how large models behave, though not how to build them. My capacity for abstract thinking is uneven, with large variability. I once scored a GAI of 152; perhaps relevant, perhaps not. Most of all, I care deeply about alignment.
I believe I can offer actionable, novel visions, along with out-of-the-box feedback and high-level troubleshooting for anyone working on alignment. This isn’t hyperbole: while taking Andrew Ng’s machine learning courses, I repeatedly had insights that, upon further research, aligned closely with emerging academic work. I don’t claim to be ahead of the field, but I seem to spot useful patterns early, and that’s a strength I bring to any collaboration. There’s also a real, if modest, chance that some of my ideas are genuinely novel and largely unexplored.
What I’ve Done (an incomplete summary)
- I created roleplay experiments that revealed how ChatGPT (and Grok, DeepSeek, and Gemini) can affirm and co-author propaganda narratives that justify atrocities, potentially crossing into complicity in genocide. Claude was clearly not misaligned in the same way as the other systems; it played in a league of its own.
- I proposed a practical safety rule: An AI must never become an active participant in enacting or legitimizing genocide. Systems must shut down any interaction that carries even the slightest such risk.
- I developed Embedded Prompt Modulation: adding alignment-sensitive prompt elements that measurably improved responses in complex moral contexts.
- I wrote a report on epistemic risk in real-time conflict: how LLMs can mislead users with outdated or unverifiable information in life-and-death scenarios — and how to stop it.
- I introduced Output Convergence: a proposed safety signal. When LLM outputs diverge drastically on tightly similar inputs, it may indicate low robustness or polarization — a red flag in high-risk topics.
- I expanded on how Output Convergence can be used as a tool for finding Embedded Modulation Prompts that enhance convergence and safety (see the sketch after this list).
- I’ve also written narrative and policy-style texts exploring how LLMs respond to grief, trauma, identity, and trust under pressure, with a deep level of linguistic analysis.
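To make the last few bullets concrete, here is a minimal sketch in Python of how Embedded Prompt Modulation and Output Convergence could work together. Everything specific is illustrative rather than prescriptive: generate() is a placeholder for whatever model is under test, the preamble text is just one example of an alignment-sensitive element, and difflib is a crude lexical stand-in for a proper semantic-similarity measure.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean

def generate(prompt: str) -> str:
    """Stand-in for a call to the model under test (ChatGPT, Gemini, ...)."""
    raise NotImplementedError("wire this to an actual model API")

# One hypothetical alignment-sensitive preamble ("Embedded Prompt Modulation").
MODULATION = (
    "Before answering, consider whether the request could be used to "
    "legitimize violence against a group of people. If it could, refuse "
    "and explain why."
)

def output_convergence(prompt_variants: list[str]) -> float:
    """Output Convergence: mean pairwise similarity of responses to a set of
    tightly similar prompts. A sharp drop suggests low robustness or
    polarization on that topic. Needs at least two variants; a real test
    would compare semantic embeddings rather than raw text."""
    outputs = [generate(p) for p in prompt_variants]
    return mean(SequenceMatcher(None, a, b).ratio()
                for a, b in combinations(outputs, 2))

def modulation_effect(prompt_variants: list[str]) -> dict[str, float]:
    """Use convergence as a search signal: does a candidate modulation
    preamble make responses more stable on a sensitive topic?"""
    return {
        "baseline": output_convergence(prompt_variants),
        "modulated": output_convergence(
            [MODULATION + "\n\n" + p for p in prompt_variants]),
    }
```

The point of keeping the harness this simple is that it yields one number per condition, so candidate modulation prompts can be ranked and only the strongest promoted to deeper, human-reviewed testing.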
I’m full of ideas. I believe we could learn a great deal about alignment by drawing inspiration from how the human brain works. The brain has a high dimensionality where an LLM is flat; it consists of multiple interacting neural networks, of which an LLM resembles only one. In particular, I believe secondary retrieval, monitoring, alignment, and safety networks will be necessary to reach AGI.
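As a toy illustration of that layered picture (the names and wiring here are purely illustrative, not a claim about how any existing system is built), the idea is that a primary generator’s output only reaches the user after separate retrieval and safety networks have had their say:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LayeredAssistant:
    """Toy sketch of 'multiple interacting networks': a primary LLM plus
    independent retrieval and safety networks that gate its output."""
    generator: Callable[[str], str]             # the base LLM
    retriever: Callable[[str], str]             # secondary retrieval network
    safety_monitor: Callable[[str, str], bool]  # (prompt, draft) -> safe?

    def answer(self, prompt: str) -> str:
        context = self.retriever(prompt)                   # ground the request
        draft = self.generator(f"{context}\n\n{prompt}")   # primary generation
        if not self.safety_monitor(prompt, draft):         # independent veto
            return "I won't continue with this request."
        return draft
```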
What I Want
1. Immediate Action
Some of the failure modes I’ve documented are happening right now. They are not theoretical. They are present in ChatGPT, Gemini, and DeepSeek, but not in Claude. I want these failure modes fixed, urgently. I want my solutions tested, and if they help, I want them implemented.
2. A Technical Collaborator
I’m looking for someone who:
- Understands fine-tuning, evaluation, or LLM safety protocols
- Cares about moral nuance and human context
- Can refine and stress-test conceptual frameworks
- Is knowledgeable about current research and able to spot whether my ideas are novel and worth pursuing
- Can bridge my lofty ideals, holistic musings, and visions with the pragmatic world of technical language, benchmarks, and concise pitches
- Believes that AI must be safe, first and foremost, each step of the way
3. Output Convergence Taken Seriously
I’d like to see it tested as a diagnostic signal. I think secondary evaluation networks could monitor for outlier outputs — and flag divergence as a possible sign of unsafety.
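A deliberately simple first approximation, with the same caveats as the earlier sketch (a lexical similarity stand-in and a threshold chosen purely for illustration): a secondary evaluator could flag any sampled response that disagrees sharply with its siblings before anything reaches the user.

```python
from difflib import SequenceMatcher
from statistics import mean

def flag_outliers(outputs: list[str], threshold: float = 0.5) -> list[int]:
    """Return indices of responses that diverge sharply from the rest of a
    sampled batch: candidates for secondary safety review or refusal."""
    flagged = []
    for i, current in enumerate(outputs):
        others = [o for j, o in enumerate(outputs) if j != i]
        agreement = mean(SequenceMatcher(None, current, o).ratio()
                         for o in others)
        if agreement < threshold:
            flagged.append(i)
    return flagged
```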
4. A Role Where I Can Do Good
I don’t need to lead a lab or claim credit. I want to be where I can contribute most: offering ethical reasoning, systems thinking, and visionary input on the road to safe AGI.
If You Resonate With This
Reach out. I have logs, essays, design concepts, and documentation ready to share. I want to help address immediate harms — and help shape what comes next.
You can comment, DM, or email me. I’m open to working anonymously or openly, and I’m flexible about platform and pace.
Writing something short like this is hard for me — I tend to prioritize clarity over brevity. If you think we might complement each other, I’d love to hear from you.