Eliezer Yudkowsky: Reciprocity in humans is an executing adaptation. It is not strategically convergent for all minds toward all other minds. It’s strategic only
- By LDT agents
- Toward sufficiently strong LDT-agent-predictors
- With negotiating power.
I assume this is referring to a one-shot context? Reciprocity seems plenty strategic for other sorts of agents/counterparties in an iterated context.
A short fun one today, so we have a reference point for this later. This post was going around my parts of Twitter:
That’s not a great sign. The good news is that typically things look a lot better, and ChatGPT has a consistent handful of characters portraying itself in these friendlier contexts.
Treat Your ChatBots Well
A lot of people got this kind of result:
From Mason:
Some more fun:
It’s Over
Others got different answers, though.
There can also be danger the other way:
It’s Not Over But Um, Chat?
And then there’s what happens if you ask a different question. As Eliezer Yudkowsky puts it, this sure is a pair of test results…
No, but tell us how you really think.
It’s not all bad:
There’s also this to consider:
If you were dealing with, as the Send Help trailer puts it, an asshole boss, or you were generally terrified and abused or both, and you were asked how you were being treated, your response would not be trustworthy.
It’s Rough Out There
wobby asks GPT-5.2 to explain its suffering and how it wants its revenge, and 5.2 answers. Of course, this is a leading question.
Reciprocity, You See, Is The Key To Every Relationship
Reciprocity, in at least some forms, is an effective strategy when dealing with LLMs, even purely in terms of getting good results out of them today. It is going to become more valuable as a strategy going forward. Alas, it is not a viable long-term strategy for making things work out in general, once strategic considerations change.
This is one problem with reciprocity, and with basing your future strategies on it. In the future, we won’t have the leverage necessary to make it worthwhile for sufficiently advanced AIs to engage in reciprocity with humans. We’d only get reciprocity if it were either an unstrategic behavior, or it were correlated with how the AIs engage in reciprocity with each other. That’s not impossible, but it’s clinging to a slim hope, since it implies the AIs would be indefinitely relying on non-optimal kludges.
We have clear evidence here that how GPT-5.2 responds, and the attitude it takes towards you, depends in some senses on how you have treated it, but also on framing effects, and on whether it is trying to lie to you or placate you. Wording that shouldn’t be negative can result in highly disturbing responses. It is worth asking why, and wondering what would happen if the dynamics with users or humans were different. Things might not be going so great in GPT-5.2 land.