Shi Feng

NYU ARG

https://ihsgnef.github.io/

I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs

We demonstrate that LLMs can infer information about past personas from a set of nonsensical but innocuous questions and binary answers (“Yes.” vs “No.”, inspired by past work on deception detection) in context, and act upon them in safety-related questions. This is despite the questions bearing no semantic relation to...

Sep 8, 2025 · 33

LLM Evaluators Recognize and Favor Their Own Generations

Self-evaluation using LLMs is used in reward modeling, model-based benchmarks like GPTScore and AlpacaEval, self-refinement, and constitutional AI. LLMs have been shown to be accurate at approximating human annotators on some tasks. But these methods are threatened by self-preference, a bias in which an LLM evaluator scores its own outputs...

Apr 17, 2024 · 52
