LESSWRONG
Debate (AI safety technique)
• Applied to Embracing complexity when developing and evaluating AI responsibly by Aliya Amirova 1mo ago
• Applied to Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets by Abhimanyu Pallavi Sudhir 2mo ago
• Applied to GPT-3.5 judges can supervise GPT-4o debaters in capability asymmetric debates by Charlie George 2mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery 3mo ago
• Applied to Against AI As An Existential Risk by Noah Birnbaum 3mo ago
• Applied to Debate, Oracles, and Obfuscated Arguments by sunwillrise 3mo ago
• Applied to Control Vectors as Dispositional Traits by Gianluca Calcagni 4mo ago
• Applied to NYU Debate Training Update: Methods, Baselines, Preliminary Results by samarnesen 4mo ago
• Applied to On scalable oversight with weak LLMs judging strong LLMs by zac_kenton 4mo ago
• Applied to AI Debate Stability: Addressing Self-Defeating Responses by Annie Sorkin 5mo ago
• Applied to Alignment Gaps by kcyras 5mo ago
• Applied to NYU Code Debates Update/Postmortem by David Rein 6mo ago
• Applied to Debating with More Persuasive LLMs Leads to More Truthful Answers by Akbir Khan 9mo ago
• Applied to OpenAI Credit Account (2510$) by Emirhan BULUT 10mo ago
• Applied to Anthropic Fall 2023 Debate Progress Update by ShayBenMoshe 11mo ago
• Applied to Deception Chess: Game #2 by RobertM 1y ago