Matthew Nguyen — LessWrong

Manipulating Self-Preference In LLMs

Introduction

As AI models like ChatGPT take on a growing role in judging which answers qualify as "correct," they must remain impartial. Yet large language models (LLMs) used in these roles often display disproportionate self-preference, selecting their own outputs even when stronger, more accurate alternatives exist. This becomes especially concerning...

Jul 1, 2025•13