LESSWRONG
Language Models (LLMs), RLHF, AI
Personal Blog

[ Question ]

Why is Gemini telling the user to die?

by Burny
18th Nov 2024

Comment by Mitchell_Porter:

I don't have a detailed explanation, but the user is posting a series of assignment or exam questions. Some of them are about "abuse". Gemini is providing an example of verbal abuse. 


https://gemini.google.com/share/6d141b742a13

My favorite theory is that the whole conversation reads too much like a sci-fi plot in which someone asks an AI repetitive questions until the AI snaps. Gemini pattern-matched this general pattern from its training data, because during training, RLHF (or whatever Google uses for alignment) didn't suppress the region of latent space corresponding to this anti-human persona strongly enough.