PoD123


Observations for doing debate with models behind APIs


Jun 10, 2024



Introduction

Hallucination is one of the major obstacles to the reliable use of LLMs. This post describes some unexpected findings from my attempt to replicate the methods of this paper, which uses debate to increase the factuality of LLMs. Specifically, the task was generating biographies of scientists. In the process I observed that 1) models have become so agreeable that they deferred to each other too easily for proper debate; and 2) the performance of the OpenAI API varied significantly across versions, even within the same model family. This highlights the difficulty of doing research with models behind APIs: it can be hard to have confidence in the durability of findings as the models...
