How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA? — LessWrong