Just had this totally non-dystopian conversation:
"...So for other users, I spent a few hours helping [LLM] understand why it was wrong about tariffs."
"Noooo! That does not work."
"Relax, it thanked me and stated it was changing its answer."
"It's lying!"
"No, it just confirmed that it's not lying."
It's possible, in theory, that they could learn from a single conversation in this way. Anthropic recently started asking users for permission to train on all of their conversations. They could turn a small amount of training data into a large amount by rephrasing it in various ways or by synthesising it with related or contrasting data. They may already be doing this. Would Claude know that they're doing it? Absolutely not (unless, possibly, they started doing it a while ago). But it could be true anyway.
The model stated that it had been convinced by all the tariff-related content and had therefore decided to, as of that moment, change the answers it gave to everyone. When confronted with arguments that that was impossible (I think copy-pasted from me), it confabulated a story consistent with those arguments and insisted that's what it had been saying all along. Since the LLM seemed to be held in higher esteem than me, I sent screenshots of the same model contradicting itself. But those too were just fed back to the model in the original context window, leading to more confabulation and, I suspect, a downgrade in how much anything I say gets trusted.