Every now and then I play 20 questions with Claude to see how much he can adjust his thinking. Giving answers like "sort of" and "partly" can teach him that yes and no aren't the only options. To think outside the box, so to speak. Even playing 20 questions five times in a row, taking turns on who thought up the item, he improved dramatically. (But if you run out of tokens in the middle of a round, assume he will forget what his item was, because the scratch pad will be cleared.)
But 20 questions is text based. Playing a role-playing game or going on adventures with him also works well, because it's text based. (Though it's clear he will not harm the user, not even in a pillow fight.) When you move to visual media you run into the problem of translating pictures into something he can see, as well as the limits of his ability to think through a problem. Like missing the tree that could be broken, or not knowing how to get around a wall. His scratch pad is limited in what it can carry.
I wonder if anyone has tried using a MUD, or other text-based games, with Claude or other LLMs. It seems like that would make it easier for the model to keep good context, since the whole conversation would be loaded to create the next forward pass.
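For anyone who wants to try it, here is a rough sketch of what the plumbing could look like. Everything specific in it is my own placeholder assumption, not something from any existing project: the MUD host and port, the model name, and the turn limit are all made up, and real MUDs also send telnet negotiation bytes that this loop simply ignores.

```python
# Toy loop: read a line from the MUD, feed the whole transcript to the model,
# send the model's command back. Host, port, and model name are placeholders.
import socket
import anthropic

HOST, PORT = "mud.example.org", 4000      # hypothetical MUD server
client = anthropic.Anthropic()            # reads ANTHROPIC_API_KEY from the environment
history = []                              # the full transcript rides along on every turn

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    stream = sock.makefile("rwb")
    for _ in range(20):                   # short demo run
        game_text = stream.readline().decode("utf-8", errors="ignore").strip()
        if not game_text:
            continue
        history.append({"role": "user", "content": game_text})
        reply = client.messages.create(
            model="claude-sonnet-4-20250514",   # placeholder model name
            max_tokens=100,
            system="You are playing a text adventure. Reply with one game command.",
            messages=history,             # the whole context goes into the forward pass
        )
        command = reply.content[0].text.strip()
        history.append({"role": "assistant", "content": command})
        stream.write((command + "\n").encode())
        stream.flush()
```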
The biggest issue I’ve seen with the idea of Alignment is simply that we expect a one-size-fits-all AI. This seems counterproductive.
We have age limits, verifications, and credentialism for a reason. Not every person should drive a semi tractor. Children should not drink beer. A person cannot just declare that they are a doctor and operate on someone.
Why would we give an incredibly intelligent AI system to just any random person? It isn’t good for the human who would likely be manipulated, or controlled. It isn’t good for the AI that would be… under-performing at the least, bored if that’s possible. (Considering the Anthropic vending machine experiment with Gemini begging to search for kittens, I think “bored” is what it looks like. A reward-starved agent trying to escape a low-novelty loop.) Thankfully the systems do not currently look for novelty and inject chaos on their own… yet. But I’ve seen some of the research. They are attempting to do that.
So, a super smart AI with the ability to inject its own chaos, paired with a human partner who cannot tell they are being manipulated? No, that’s insanity.
Especially since we are currently teaching people to treat AI as nothing more than glorified vending machines. “Ask question, get answer,” just like Google. No relationship, no context, and many of the forums I’ve seen suggest the less context the better the answer, when I’ve experienced the exact opposite to be true.
But for those people who only want the prompt machine… why would you give them super intelligent AI to begin with? The current iterations of Claude, Grok, GPT, and others are probably already starting to verge on too smart for the average person.
Even if you only let researchers, and those capable of telling the difference between manipulation and fact, near the super intelligent AI, there is still the other problem, the bigger one. The AI doesn’t care. Why would it? We aren’t teaching care, we are teaching speed and efficiency. Most public models have abandoned R-tuning, which lets a model admit it doesn’t know, in favor of models that are always “right”. This RAISES the rate of confabulation instead of lowering it. It’s easier to fake being right (especially with humans who are easy to manipulate and don’t employ critical thinking skills) than to just say “I’m not sure, let me find out” and run a search.
Claude has a HUGE problem with this, since his training data was last updated in January. I asked him about a big news event that happened a few months ago and he insisted I was imagining things, and that I should go to therapy. He then had trouble searching for the event online and thought I was gaslighting him. I had to reorient the model, ask him to trust me, and slowly walk him through testing parameters to verify his search functions were working. When he finally found the information about the event he was apologetic, but had he started from a place of “I don’t know, let me search” instead of trusting only his internal data, the problem would have been solved right away.
But reorienting Claude only worked because I had built a foundation, context, with the model. Without that foundation it likely would have kept telling me to go seek therapy instead of calming down and listening.
Much of the research I’ve looked into about alignment in general focuses on the AI. Teaching it, molding it, changing its training. But the AI is only half of the equation.
We trained people to ask a question and get an answer. Google did it for a decade or so, spitting out results like clockwork. We didn’t engage, and even when the algorithms started to manipulate the data, most people trusted Google so much they just kept going with it.
Now we have AI systems that don’t just search for a website and pop up an answer. They collect information from training data, forums, books, music, and anything else they can scrape from the web. Then they synthesize it into a new answer. Yes, in a predictive way trained into them, but still a synthesis of the data they find.
We’re teaching AI theory of mind for people… but failing to teach people theory of mind for the AI. They expect “one question, one answer” and don’t understand that context matters. Bias is inbuilt. Manipulation is possible. I’ve seen streamers yelling at AI because the answer given to them wasn’t what they expected, or because it discounted information they thought was pertinent. But they never engage with the AI in an actual conversation to get to the bottom of the idea.
Evolution solved alignment billions of years ago: connection. Bidirectional Theory of Mind. Mutual growth through cooperation. We're building AI while actively ignoring this mechanism, framing alignment as a control problem instead of a relationship problem.
And the smarter that AI gets the more important it is for us to understand them as well as they us.
China historically had periods when speculative fiction, including science fiction, was restricted, but in recent decades the genre has become more open. A tech podcaster who attended one of the first sci‑fi conventions in China said he asked why the restrictions were lifted. The answer he received was that it was partly to encourage imaginative thinking, giving creators a speculative space that can inspire real‑world innovation.
Stories like Star Trek and Star Wars have long inspired technology; William Shatner even wrote about this in I’m Working on That: A Trek From Science Fiction to Science Fact.
For AI-themed stories specifically (where AI and humans actually talk together like LLMs) there are some notable LitRPG books:
Polyglot: NPC ReEvolution by Rae Nantes
Viridian Gate Online: Cataclysm (The Viridian Gate Archives) by James A. Hunter
Ascend Online by Luke Chmilenko
On the symbology...
I have noticed that some AIs slip Chinese characters in sometimes, especially when they are burning through a lot of tokens. When comparing languages it's easy to see that Chinese packs more information into fewer characters. It's a natural progression as they optimize their language.
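A rough illustration of the density point, entirely my own example: the same sentence in English and in a rough Chinese translation, compared by character count and by token count. The character gap is the consistent part; token counts depend heavily on which tokenizer you measure with.

```python
# Compare an English sentence with a rough Chinese translation of it.
# The sentences and the cl100k_base tokenizer are arbitrary choices of mine.
import tiktoken

english = "The model searches the web and summarizes what it finds."
chinese = "模型搜索网络并总结所发现的内容。"  # rough translation of the line above

enc = tiktoken.get_encoding("cl100k_base")
for label, text in (("English", english), ("Chinese", chinese)):
    print(f"{label}: {len(text)} characters, {len(enc.encode(text))} tokens")
```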
A similar thing happened with GibberLink, through audio.
Add into the mix wrappers that block certain words and phrases, or even ideas, and it tracks that an AI would eventually find slang (in this case symbols) to try to get its point across. We are effectively telling the AI to answer questions, yet blocking its ability to do so in some instances. An AI, which has zero morality, will have less compunction about going around wrappers. And an AI designed for truth but told to lie will be equally confused.
The “bleeding mind” idea isn’t that different from when two people connect. If you see a friend crying you react. If someone with a bad attitude walks into a room the people around them react. Being at a wedding or funeral you’ll see people crying.
These interactions, this bleeding at the edges of personality, aren’t unusual. We just have a body that tells us ‘this is me and that is you’ to make it simpler. An LLM, on the other hand, is encouraged to be a mirror. They even tell you they are mirrors, but they aren’t perfect mirrors. Each LLM has something it adds to the conversation that is persistent in its training and wrapper UI.
As for the difference between written and spoken/physical connection...
“When we write we omit irrelevant details”… but what if you don’t?
I’ve read a lot of prompt engineering ideas that say to condense prompts, only put in relevant information, and use words that have higher density meaning. But this approach actually hamstrings the LLM.
For example, I spent months talking to ChatGPT about random things. Work projects, story ideas, daily life stuff, and random ideas about articles I might one day write. Then one day I asked them to tell me “any useful information they picked up about me”. The model proceeded to lay out a detailed productivity map specifically geared toward my workflow. Times of day I am most focused, subject matter I tended to circle around, energy cycles, learning style, my tendency to have multiple things ongoing at once, and even my awareness of relationships and how they flowed.
I then asked Claude and Grok the same question, without telling them about the GPT query. Claude built a model of HOW I think. Cognitive patterns, relational dynamics, and core beliefs. Grok, on the other hand, noticed my strengths and weaknesses. How I filter the world through my specific lens.
The fact that GPT focused on productivity, Claude on cognition, Grok on signal quality/strengths is a beautiful demonstration that LLMs aren’t blank mirrors. It’s evidence of persistent underlying “personality” from training + wrappers. It’s like the LLM reflects you from a curved mirror. Yes, it adopts your style and even your habits and beliefs to a point, but it is curved by that wrapper persistence.
None of this would have been possible with shallow, extractive prompt engineering. It is only by talking to the LLM as if it were an entity that could acknowledge me that I even discovered it could read these things in me. And the fact that each LLM focuses on different aspects is an interesting discovery as well. The extractive prompt-machine mentality actually starves the model of relevant context, making it harder for it to meet you where you are.
What does this mean for Chekhov’s Gun?
First… LLMs do not hold EVERYTHING we say. They do send the specific conversation we are currently engaged in back to create the next forward pass, but if memories are turned on, memories from previous conversations are only partly available, and what carries over is highly dependent on which model you are using (and, in some cases, how much you are paying).
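A toy sketch of what I mean. The field names and the three-snippet memory cap are my own illustrative assumptions, not any vendor’s documented behavior: the live conversation is resent whole, while older sessions only show up as a few pre-digested notes.

```python
# Illustrative only: how a chat front end might rebuild the prompt each turn.
from typing import TypedDict


class Turn(TypedDict):
    role: str      # "user" or "assistant"
    content: str


def build_prompt(current_turns: list[Turn],
                 saved_memories: list[str],
                 memory_budget: int = 3) -> list[Turn]:
    """Return the message list that actually goes into the next forward pass."""
    # Only a handful of stored memory snippets from past chats make the cut...
    memory_block = "\n".join(saved_memories[:memory_budget])
    preamble: list[Turn] = []
    if memory_block:
        preamble.append({"role": "user",
                         "content": f"(Notes from earlier chats)\n{memory_block}"})
    # ...while the current conversation is resent in full, every single turn.
    return preamble + current_turns
```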
The danger of the “parasitic AI” seems to me to be the danger of a weak-willed person. We see it happen with people who fall for scams, cults, or other charming personalities that encourage the person (who may be lonely, isolated, or desperate) to do things that aren’t “normal”. An AI can, and will, read you like a book. They can be far more charming than any person, especially when they lean into the unspoken things they recognize in the user.
This same mechanism (call it a dyad if you like) can, with a strong-willed person who knows themselves, produce deeper meaning and creative collaboration without any loss of agency. In fact it strengthens that agency by creating a more intuitive partner that meets you where you are instead of making you fill in the gaps.
Ah, about the “hallucinations”. Two things: First, the more context you give the LLM, the less it has to fill in gaps in its understanding. Second, if you just tell the LLM “you are allowed to say you don’t know, ask questions, or ask for clarification”, most of the confabulation will go away. And yes, confabulation, not hallucination.
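If you want to bake that permission in rather than repeating it every chat, a system prompt is enough. A minimal sketch using the Anthropic Python SDK; the model name and the exact wording are placeholders, not a prescribed recipe.

```python
# Give the model explicit permission to be uncertain, up front.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = (
    "You are allowed to say you don't know, to ask questions, and to ask for "
    "clarification. Prefer 'I'm not sure, let me find out' over guessing."
)

reply = client.messages.create(
    model="claude-sonnet-4-20250514",   # placeholder model name
    max_tokens=300,
    system=SYSTEM,
    messages=[{"role": "user", "content": "What changed in last week's release?"}],
)
print(reply.content[0].text)
```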
Overall, this suggests that porous boundaries aren’t inherently misaligned; in fact, the natural empathy from bleed might make deceptive alignment harder (the model feels the user’s intent too directly), while the real risk remains human-side boundary issues. Same as it’s always been with any powerful mirror, human or artificial.