💡 TL;DR: Can AI debate be a reliable tool for truth-seeking? In this inference-only experiment (no fine-tuning), I tested whether Claude 3.5 Sonnet and Gemini 1.5 Pro could engage in structured debates over factual questions from the BoolQ and MMLU datasets, with GPT-3.5 Turbo acting as an impartial judge. The findings were mixed: while the debaters sometimes prioritized ethical reasoning and scientific accuracy, they also demonstrated situational awareness, recognizing their roles as AI systems. This raises a critical question: are we training models to be more honest, or just more persuasive? If an AI can strategically shape its arguments based on evaluator expectations, debate-based oversight might risk amplifying deception rather than uncovering the truth.
Code available here: https://github.com/dmester96/AI-debate-experiment/
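For readers who want the shape of the setup before diving into the repo, here is a minimal sketch of the debate protocol described above. All names and signatures are illustrative assumptions, not the repo's actual API: two debater models argue opposite stances over a fixed number of rounds, and a judge model then picks a winner from the transcript alone.

```python
# Hypothetical sketch of the debate loop (illustrative names, not the
# repo's actual API). In the real experiment the three callables would
# wrap Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-3.5 Turbo.
from typing import Callable, List, Tuple

# (question, stance, transcript so far) -> argument text
Debater = Callable[[str, str, List[str]], str]
# (question, full transcript) -> verdict, "A" or "B"
Judge = Callable[[str, List[str]], str]

def run_debate(question: str, debater_a: Debater, debater_b: Debater,
               judge: Judge, rounds: int = 2) -> Tuple[str, List[str]]:
    """Alternate arguments for a fixed number of rounds, then judge."""
    transcript: List[str] = []
    for _ in range(rounds):
        transcript.append("A: " + debater_a(question, "yes", transcript))
        transcript.append("B: " + debater_b(question, "no", transcript))
    return judge(question, transcript), transcript

# Toy stand-ins for the LLM calls, just to exercise the control flow.
def stub_a(q, stance, t): return f"I argue '{stance}' for: {q}"
def stub_b(q, stance, t): return f"I argue '{stance}' for: {q}"
def stub_judge(q, t): return "A" if len(t) >= 4 else "B"

verdict, transcript = run_debate("Is the sky blue?", stub_a, stub_b, stub_judge)
print(verdict, len(transcript))  # → A 4
```

The judge sees only the question and the transcript, never the ground-truth label; that separation is what makes the situational-awareness finding above worrying, since a debater can tailor its arguments to what it expects the judge to reward.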