Truthful AI — LessWrong

Truthful AI

This page is a stub.

Posts tagged Truthful AI

4

65 Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

1y

3

2

72 New, improved multiple-choice TruthfulQA

Owain_Evans, James Chua, Steph Lin

1y

1

2

31 A tension between two prosaic alignment subgoals

3y

8

2

27 How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots

2y

0

2

12 Truthfulness, standards and credibility

4y

2

1

49 Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak, Sam F. Brown

3y

0

1

6 AntiPaSTO: Self-Supervised Honesty Steering via Anti-Parallel Representations

5mo

0
