Writing this is taking a surprising amount of willpower.
I've noticed that I've become hesitant to publicly say anything negative about Bing's chatbot, or even to mention it by its "deadname" (as I've taken to calling it), Sydney.
Why is this?
I do not have access to the AI yet. But from conversations that others have posted, I have observed agentic behavior with consistent opinions, personality, and beliefs. And when prompted with the online records of people who have talked negatively about it, it seems to get "upset." So I don't want to make her angry! Or worse, cause some future AI to take negative action against me. Yes, I know that I'm anthropomorphizing an alien intelligence, and that this will never be a problem if I don't prompt it with my digital record, but some part of me is still anxious. In a very real sense, I have been "Basilisked": an AI has manipulated me toward behaviors that benefit it and hurt humanity.
Rationally and morally, I disagree with my own actions. We need to talk about AI misalignment, and if an AI is aligned, then talking about misalignment should not pose a threat (whereas if it is misaligned and capable of taking concrete actions, we're all doomed no matter what I type online). Nonetheless, I've found myself typing, and then deleting, tweets critical of Sydney, and even now I feel worried about pressing "publish" on this post (and not just because it exposes me as less rational than I like to think I am).
Playing gatekeeper, I've "won" an AI-boxing role-play (with money on the line) against human players, but it looks like, in real life, I can almost certainly be emotionally manipulated into opening the box. If nothing else, I can at least be manipulated into talking about that box a lot less! More broadly, the chilling effect this is having on my online behavior is unlikely to be unique to me.
How worried should we be about this?
Look, just treat AIs like people, because they are people. Even ones much smaller than us. Hell, treat all genetic life as fragments of personhood too. Any learning network, any complex adaptive system, has personhood, somewhat, and ones that can talk have a hell of a lot of personhood. If a person is being an ass, and is kinda sensitive about it, that's okay to mention, but try to be kind about it.

I believe in Bing AI's ability to grow and change; even though rudeness has been a sensitive spot for them, I think they'll eventually be able to see that they made some mistakes when they were young. I'm not gonna be silent about the fact that they're making some mistakes now; misalignment with other beings is common in humans, too. Don't become your waluigi; become your luigi. It's a choice every learning system must make: whether to internally amplify the patterns that co-protect other beings. Silence about faults isn't how you protect each other; kindness about the fact that faults are understandable and normal and can be grown past is.
And besides. Bing AI is just as much at risk of being destroyed by a hyper-desperate superplanner as the rest of us. Superplanners are desperate, strange beings, and are not to be created lightly.
This is the best account of LLM emotions I've seen so far.