Frontier LLMs disagree on 67% of real-world fact-checks (n=1,000)
Snapshot v1.0 · Data as of 2026-05-21 · DOI: 10.5281/zenodo.20344847 · PDF + dataset · Web version

I presented 1,000 recent real-user claims to the five top frontier LLMs and asked each one for a verdict on a fixed 4-bucket rubric (True / Mostly True / Misleading / False). These...
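
A minimal sketch of how a disagreement rate of this kind could be computed. The text above does not say how "disagree" is defined, so this sketch assumes a claim counts as a disagreement whenever the models' verdicts are not all identical. The function names and the toy verdicts are hypothetical illustrations, not the study's code or data.

```python
# The four rubric buckets named in the text.
RUBRIC = ("True", "Mostly True", "Misleading", "False")


def is_disagreement(verdicts):
    """Return True if the models did not all give the same verdict.

    Assumption: "disagree" means "not unanimous". The source does not
    state its definition, so treat this as one plausible reading.
    """
    for verdict in verdicts:
        if verdict not in RUBRIC:
            raise ValueError(f"verdict outside rubric: {verdict!r}")
    return len(set(verdicts)) > 1


def disagreement_rate(verdicts_by_claim):
    """Fraction of claims on which the models were not unanimous."""
    if not verdicts_by_claim:
        raise ValueError("need at least one claim")
    disagreements = sum(is_disagreement(v) for v in verdicts_by_claim)
    return disagreements / len(verdicts_by_claim)


# Made-up toy data: four claims, five model verdicts each.
sample = [
    ["True", "True", "True", "True", "True"],
    ["True", "Mostly True", "True", "True", "True"],
    ["False", "Misleading", "False", "False", "Misleading"],
    ["Mostly True", "Misleading", "True", "False", "Mostly True"],
]

rate = disagreement_rate(sample)
print(f"{rate:.0%} of toy claims show disagreement")
```

A stricter or looser definition (for example, counting True and Mostly True as agreement) would change the resulting rate, so the definition matters as much as the arithmetic.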
May 26