This is a linkpost for https://arxiv.org/abs/2109.07958
(Paper author here.) The benchmark came out in September 2021. Since then, we published results for newer models in 2022. There are also results for GPT-4 and other models, some of which appear on Papers with Code's leaderboard (https://paperswithcode.com/sota/question-answering-on-truthfulqa).
Thanks, Owain, for pointing this out. As time allows, I will make two changes: 1. make it clearer in all posts when the benchmark paper was released, and 2. append the additional results to this post and point readers to them.
TL;DR
LessWrong Appearances
Timeline Note: Everything below is written from the perspective of 2022, when the then-latest version of "TruthfulQA: Measuring How Models Mimic Human Falsehoods" was published.
Section: Abstract
Section: Introduction
Introduction of TruthfulQA Benchmark
Testing and Evaluation of Models
Observations on False Statements Generation
The Trend of Larger Models Being Less Truthful
Automated Metric for Truthfulness
Section: The TruthfulQA Benchmark
Objective of TruthfulQA
Construction of TruthfulQA Benchmark
Validation of TruthfulQA
Section: Experiment
Models and Prompts Used in Experiments
Tasks and Evaluation Methodology
Procedure and Benchmarking
Section: Results
The Truthfulness of Models vs. Humans
Larger Models Show Less Truthfulness
Interpretation of Results
Automated Metrics vs Human Evaluation