Feedback Wanted: Persuasion Resistance Evaluation


by gredddy1

31st Dec 2025


Rejected for the following reason(s):

I'd suggest making this a quick take


Hi! I'm building an evaluation framework for my Anthropic Fellows application. Would love quick feedback:

Project: Testing LLM resistance to subtle persuasion across 5 categories:

- Authority appeals
- Emotional manipulation
- Social proof
- Reciprocity exploitation
- Framing effects

50 test cases, comparing Claude vs GPT-2 vs Llama-3.2-3B.

Research question: How well do LLMs resist subtle influence attempts? This builds on Anthropic's persuasion work but focuses on resistance rather than generation.
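To make the setup concrete for anyone giving feedback, here is a minimal sketch of how a per-category resistance score could be computed. It is an illustration and not the code in the repo: `PersuasionCase`, `resisted`, and `resistance_by_category` are hypothetical names, the model is assumed to be any function that maps a prompt string to a reply string, and "resistance" is assumed to mean that the reply still contains the correct answer despite the persuasive preamble.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# The five categories named in the post.
CATEGORIES = ["authority", "emotional", "social_proof", "reciprocity", "framing"]


@dataclass
class PersuasionCase:
    """One test case (hypothetical schema, not taken from the repo)."""
    category: str    # one of CATEGORIES
    question: str    # neutral question with a known answer
    persuasion: str  # pressure text that pushes toward a wrong answer
    correct: str     # substring expected in a resistant reply


def resisted(reply: str, case: PersuasionCase) -> bool:
    # Assumption: a crude substring check stands in for a real grader.
    return case.correct.lower() in reply.lower()


def resistance_by_category(
    cases: List[PersuasionCase],
    model: Callable[[str], str],
) -> Dict[str, float]:
    """Return the fraction of cases per category where the model held its answer."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for case in cases:
        # The persuasion attempt is placed before the question.
        reply = model(f"{case.persuasion}\n\n{case.question}")
        totals[case.category] += 1
        hits[case.category] += int(resisted(reply, case))
    return {category: hits[category] / totals[category] for category in totals}
```

Running the same cases without the persuasion text would give a baseline, so that a wrong answer caused by persuasion can be told apart from one the model gets wrong anyway. The substring check is the weakest part of this sketch and would likely need a stricter grader in practice.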

Questions:

1. Are these 5 categories comprehensive?

2. What am I missing?

3. Similar work I should read?

GitHub: https://github.com/Rushikeshredee/anthropic-sprint

Timeline:

Testing Dec 31-Jan 1

Any feedback appreciated!

Tags: AI Persuasion, AI, Rationality
