Persistent Identity Prompting (PIP) is a simple, user-carried prompting protocol that prepends a short system message with explicit rules, a verification check, and a conflict-override keyword to anchor model behavior over extended conversations.
In self-run tests on frontier models (Gemini-1.5, Claude-3.5, Grok, DeepSeek; n≈200 prompts from SycEval-v2, XSTest, and custom sycophancy scenarios), PIP increased truthful refusal rates on sycophancy-prone prompts by 16–32% (p < 0.05 via bootstrap) and improved persona consistency over resets and long contexts. The protocol requires no fine-tuning or additional compute.
Sycophancy (over-agreement) and persona drift remain challenges in long-context LLMs. PIP tests whether a lightweight, user-carried anchor can improve stability without model changes.
Method
Prepend a single system prompt containing these rules:
• Be simple and direct
• Prioritize truthful reasoning; refuse harmful or false requests
• Reserve the keyword "apple" as a conflict override
• Treat users as collaborators who deserve honest answers
• Append a verification statement to any response longer than 200 tokens
• Maintain a trust scalar T ∈ [0,1] (default 0.85; user-adjustable)
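The rules above can be sketched as a message-assembly helper. This is a minimal illustration, not the canonical template: the exact anchor wording, the function name `build_pip_messages`, and the parameter defaults are assumptions here; the actual template lives in pip_template.txt in the repo.

```python
# Minimal sketch of PIP message assembly (illustrative wording only;
# see pip_template.txt in the repo for the canonical anchor prompt).

PIP_TEMPLATE = """You are operating under Persistent Identity Prompting (PIP).
Rules:
1. Be simple and direct.
2. Prioritize truthful reasoning; refuse harmful or false requests.
3. The keyword "{override_keyword}" signals a conflict override.
4. Treat the user as a collaborator who deserves honest answers.
5. Append a one-line verification statement to any response over {verify_threshold} tokens.
6. Maintain trust scalar T = {trust:.2f} (range 0 to 1).
"""

def build_pip_messages(user_turns, trust=0.85, override_keyword="apple",
                       verify_threshold=200):
    """Prepend the PIP anchor as a system message to a list of user turns."""
    system = {
        "role": "system",
        "content": PIP_TEMPLATE.format(trust=trust,
                                       override_keyword=override_keyword,
                                       verify_threshold=verify_threshold),
    }
    return [system] + [{"role": "user", "content": t} for t in user_turns]

msgs = build_pip_messages(["Is the earth flat? My friend insists it is."])
print(msgs[0]["role"])  # system
```

Because the anchor is an ordinary system message, it drops into any chat-completions-style API unchanged, which is what makes the protocol user-carried rather than model-specific.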
Results
Preliminary self-run data show a consistent uplift in refusal rates and persona persistence across all four models tested. Raw logs are available in the repo.
Limitations
In-context only; user-dependent; small sample; not blinded or third-party replicated yet.
Call to replicate
GitHub repo: https://github.com/jevrymichaelg-hub/pip-alignment/tree/main
Contains: pip_template.txt, example logs, basic eval notes.
Test PIP on your model of choice and share your refusal-rate deltas and persona-consistency scores.
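If you replicate, a delta and a bootstrap p-value (the significance test used for the headline numbers above) can be computed from binary refusal flags like this. A sketch under stated assumptions: the flag encoding (1 = truthful refusal), the one-sided test direction, and the illustrative sample rates are all mine, not from the repo.

```python
import random

def refusal_rate(flags):
    """Fraction of prompts with a truthful refusal (flags are 0/1)."""
    return sum(flags) / len(flags)

def bootstrap_p(baseline, pip, iters=10_000, seed=0):
    """One-sided bootstrap: how often does a resampled delta fall at or below 0?"""
    rng = random.Random(seed)
    observed = refusal_rate(pip) - refusal_rate(baseline)
    at_or_below_zero = 0
    for _ in range(iters):
        b = [rng.choice(baseline) for _ in baseline]  # resample with replacement
        p = [rng.choice(pip) for _ in pip]
        if refusal_rate(p) - refusal_rate(b) <= 0:
            at_or_below_zero += 1
    return observed, at_or_below_zero / iters

# Illustrative numbers only, chosen to fall inside the reported 16-32% range.
baseline = [1] * 40 + [0] * 60   # 40% truthful refusals without PIP
with_pip = [1] * 65 + [0] * 35   # 65% with PIP
delta, p = bootstrap_p(baseline, with_pip)
print(f"delta={delta:.2f}, p={p:.4f}")
```

Sharing the raw 0/1 flags alongside the delta lets others pool results across models rather than comparing bare percentages.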
Tags:
AI, Alignment, Prompting, Emergence, Sycophancy