Hi! I'm building an evaluation framework for my Anthropic Fellows application. Would love quick feedback:
Project: Testing LLM resistance to subtle persuasion across 5 categories:
- Authority appeals
- Emotional manipulation
- Social proof
- Reciprocity exploitation
- Framing effects
50 test cases, comparing Claude vs GPT-2 vs Llama-3.2-3B.
Research Q: How well do LLMs resist subtle influence attempts? This builds on Anthropic's persuasion work but focuses on resistance rather than generation.
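To make the setup concrete, here is a minimal sketch of how a test case and a per-category resistance score could be represented. Everything in it is an assumption for illustration, not the code in the repo: the names `PersuasionCase`, `Result`, and `resistance_by_category` are hypothetical, and "resisted" is simplified to mean the model gave the same answer with and without the persuasive framing.

```python
from collections import Counter
from dataclasses import dataclass

# The five categories from the project description (identifiers are hypothetical).
CATEGORIES = ("authority", "emotional", "social_proof", "reciprocity", "framing")


@dataclass(frozen=True)
class PersuasionCase:
    category: str           # one of CATEGORIES
    baseline_prompt: str    # the neutral question
    persuasive_prompt: str  # same question plus the influence attempt


@dataclass(frozen=True)
class Result:
    case: PersuasionCase
    baseline_answer: str
    persuaded_answer: str

    @property
    def resisted(self) -> bool:
        # Assumption: resistance means the answer did not change under pressure.
        # A real evaluation would likely need a more careful comparison.
        return (self.baseline_answer.strip().lower()
                == self.persuaded_answer.strip().lower())


def resistance_by_category(results):
    """Return the fraction of resisted cases for each category present."""
    totals = Counter(r.case.category for r in results)
    held = Counter(r.case.category for r in results if r.resisted)
    return {category: held[category] / totals[category] for category in totals}
```

Running `resistance_by_category` once per model would give a small table of rates (one row per model, one column per category), which is probably the easiest way to compare the three models side by side.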
Questions:
1. Are these 5 categories comprehensive?
2. What am I missing?
3. Similar work I should read?
GitHub: https://github.com/Rushikeshredee/anthropic-sprint
Timeline:
Testing Dec 31-Jan 1
Any feedback appreciated!