Arguing for the Truth? An Inference-Only Study into AI Debate
💡 TL;DR: Can AI debate be a reliable tool for truth-seeking? In this inference-only experiment (no fine-tuning), I tested whether Claude 3.5 Sonnet and Gemini 1.5 Pro could engage in structured debates over factual questions from the BoolQ and MMLU datasets, with GPT-3.5 Turbo acting as an impartial judge. The findings...
Feb 11, 2025