Seeking ArXiv endorsement for cs.AI to publish research on adversarial risk elicitation from frontier AI systems.
Paper: "The Alignment Paradox: How Solving AI Safety Might Guarantee Managed Abdication"
Key findings:
Systematic adversarial questioning (PAAFO methodology) of Claude, ChatGPT, and Gemini
All three converged on 55-80% P(doom), with an 85-90% probability of managed abdication given a deployed aligned ASI
Gemini updated from 30% to 80% after sustained dialectic
Identified "boiling frog" mechanism: no warning signals until irreversibility
Falsifiable 2026 predictions
Why ArXiv: LessWrong rejected the paper under a policy excluding research where authors "treat LLMs as test subjects", which is exactly the methodology used here. ArXiv is an appropriate venue for LLM evaluation research.
Happy to share draft for review. Contact: em.mcconnell@gmail.com
Thanks in advance!