Seamus_F — LessWrong

11

3y

Robustness of Contrast-Consistent Search to Adversarial Prompting

by Nandi, i, Jamie Wright, Seamus_F, and hugofry

Produced as part of the AI Safety Hub Labs programme run by Charlie Griffin and Julia Karbing. This project was mentored by Nandi Schoots. Image generated by DALL-E 3.

Introduction

We look at how adversarial prompting affects the outputs of large language models (LLMs) and compare it with how the...

Nov 1, 2023•18