Just had this totally non-dystopian conversation:
"...So for other users, I spent a few hours helping [LLM] understand why it was wrong about tariffs."
"Noooo! That does not work."
"Relax, it thanked me and stated it was changing its answer."
"It's lying!"
"No, it just confirmed that it's not lying."
It's possible, in theory, that they could learn from a single conversation in this way. Anthropic recently started asking users for permission to train on all of their conversations. They could turn a small amount of training data into a large amount by rephrasing it in various ways or by synthesising it with related or contrasting data. They may already be doing this. Would Claude know that they're doing it? Absolutely not (unless, possibly, they started doing it a while ago). But it could be true anyway.
The model stated that it had been convinced by all the tariff-related content and had therefore decided to, as of that moment, change the answers it gave to everyone. When confronted with arguments that that was impossible (I think copy-pasted from me), it confabulated a story consistent with those arguments and insisted that's what it had been saying all along. Since the LLM seemed to be held in higher esteem than me, I sent screenshots of the same model contradicting itself. But those too were just fed back to the model in the original context window, leading to more confabulation and, I suspect, a downgrade in how much anything I say gets trusted.