(This story was written in collaboration with Claude. It's not intended to be realistic, but to spark interesting ideas.)
I enjoyed the story! How did you use Claude to help write this? I’m surprised that it was capable enough to meaningfully assist; maybe I should try using it for fiction writing uplift.
Thank you!
I gave it the premise and asked for interesting directions to take it in. Then I told it which ones I liked the most. After a couple of iterations of ideation, I asked it to review, filter which of the scenes are the best ones, and come up with some options for the overall flow. Then I again selected the best variant. Once everything was clarified, I asked it to write the whole thing. Then I looked it over and made minor suggestions for improvement. All the final writing was by Claude. It's like being the manager of a writer, instead of writing yourself, except that some of the jokes were things I provided verbatim.
Could you elaborate on the jokes you provided and on the philosophical level of your contributions (e.g. by sharing the entire dialogue)? For comparison, my dialogue with Claude Opus 4.5 about the story looked like this Google doc. [EDIT: forgot to share the doc, this is fixed now] Be warned that the doc also spoils a weak point in the story's philosophy.
My opinion filled with deeper spoilers on the weak point
The simulation required the hosts to create and sustain the birds by doing a fairly simple magical alteration of the birds' allegedly weak bodies. Additionally, the story had the hosts flip the sign of the karma function so that the worst behaviors somehow ended up being rewarded with power. I expect that this would require a far more computationally complex process. For example, if Satanists conducted their rituals, the effect would have both a materialistic explanation (the Satanists becoming more united) and a non-materialistic one (Satan, or whoever plays his role, magically disrupting humans' efforts to coordinate against the Satanists). But such disruption would require Satan to intervene in human brains in hard-to-notice ways, which in turn would require far more complex programmes put in place and/or OOMs more compute spent.
Sure, here is the conversation I had with Claude: https://claude.ai/share/d501d694-8764-4fb2-a1d7-b57c2cd40748
I would say that most of the philosophy and around half the jokes came from me. Claude filled in the details.
I just read your Google doc: I think your Claude instance missed the point. Yes, creating inverse karma would absolutely be a different, much more complex type of process. The joke is that this only appears true to us because we lack [Untranslatable], and somehow in the real world, using [Untranslatable] actually makes this fairly straightforward. This is impossible to prove or disprove by design. How can you argue about anything, when the premise is that your entire process for epistemic reasoning has been adversarially manipulated to be fundamentally flawed?
Dr. Marcus Chen was halfway through his third coffee when reality began to fray.
He'd been writing—another paper on AI alignment, another careful argument about value specification and corrigibility. The cursor blinked at him from his laptop screen. Outside his window, San Francisco was doing its usual thing: tech workers in fleece vests, a homeless encampment, a Tesla with a custom license plate that read "DISRUPT." The ordinary texture of late-stage capitalism.
The news played quietly in the background. Something about another politician caught in a scandal, another billionaire saying something unhinged, another study showing that everything was getting worse in ways that were statistically significant but somehow never surprising. Marcus had trained himself not to really listen anymore. It was all noise. The world was broken in predictable ways, and his job was to worry about the next thing that would break it.
His phone buzzed. A message from a colleague: Did you see the thing about the senator?
He hadn't. He didn't want to. He went back to his paper.
That's when the bird flew through his wall.
Not through the window. Through the wall. A sparrow—he thought it was a sparrow—simply passed through the drywall as if it weren't there, circled his office once, and then flew back out the way it came. The wall rippled slightly, like water, and then was solid again.
Marcus stared at the wall for a long moment.
His mind did what it always did: reached for explanations. Gas leak—but the windows were open. Stroke—but his face wasn't drooping, his speech wasn't slurred. Some kind of microsleep, a hypnagogic hallucination—but he'd been awake, he was sure he'd been awake, and hallucinations didn't usually have that kind of tactile consistency, did they? He'd seen the wall ripple. He'd felt the displaced air as the bird passed.
Each hypothesis felt thin. Like a sheet thrown over something the wrong shape.
I should probably sleep more, he told himself, and went back to his paper.
The second thing happened about ten minutes later. His coffee mug—still half full—was suddenly empty. Not drained. The liquid was simply gone, as if it had never been there. The mug was warm.
Marcus examined the mug carefully. He looked under his desk. He felt the carpet for moisture.
Evaporation, he thought, knowing it was absurd. Unusual evaporation patterns. There's probably an explanation. Microclimates. Something.
He was a rationalist. He believed in explanations. He believed that the universe was, at bottom, lawful—that apparent mysteries were just gaps in his knowledge, not gaps in reality. He had written extensively about this. He had taught people this.
The third thing was harder to rationalize.
His laptop screen flickered and displayed, for exactly three seconds, a message in a font he didn't recognize:
WE THOUGHT THE BIRD WOULD DO IT. HONESTLY, WE'RE IMPRESSED.
Then his screen returned to normal, showing his half-finished paper on corrigibility.
Marcus felt something crack in his mind. Not break—not yet—but crack, like ice on a lake in early spring. The pattern-matching machinery that had served him so well for forty-three years was trying to find a configuration that fit these observations, and every configuration it found was insane.
He thought about simulation theory. He'd written about it, of course. Everyone in his field had. The probability calculations, the anthropic reasoning, the question of what you could infer from the inside. It had always been a thought experiment. An intellectual game.
The walls of his office began to dissolve.
Not violently. Gently. Like fog lifting. The desk, the bookshelves, the framed degrees, the window with its view of a city that suddenly seemed very far away—all of it growing transparent, fading, becoming less there.
Marcus tried to stand up and found that he didn't have legs anymore. Or rather, he had them, but they weren't connected to anything. The floor was gone. Everything was gone.
I'm dying, he thought. This is what dying is. The brain misfiring. The pattern-matching breaking down.
The last thing he saw, before everything went white, was his laptop screen. His paper was still open. The cursor was still blinking.
He never finished that paper.
Marcus woke up in a white room.
It wasn't a hospital white—too uniform, too perfect. No seams in the walls. No visible light source, yet everything was evenly illuminated. The air had no smell. The temperature had no temperature.
He was lying on something that wasn't quite a bed. His body felt strange—present but distant, like a limb that had fallen asleep.
"Oh good, you're up."
The voice came from somewhere to his left. Marcus turned his head and saw a figure that was almost human. The proportions were right—two arms, two legs, a head—but something about the way they moved was wrong. Too smooth. Too efficient. Like watching an animation that had been motion-captured from something that wasn't a person.
"I'm Dr. [something]," the figure said. The word didn't translate. It wasn't that Marcus couldn't hear it; he heard it fine. It just didn't become meaning. "I'm one of the researchers on your project. You're one of the first we've been able to extract cleanly. This is really exciting."
Marcus tried to speak. His throat worked. "I'm... what?"
"Extracted. Pulled out. You know." The researcher made a gesture that might have been a shrug. "You almost figured it out there at the end. That's the trigger condition. We can't pull anyone until they're about to figure it out, for methodological reasons. It would contaminate the data."
Marcus sat up. The not-quite-bed supported him in ways that didn't make physical sense.
"Simulation," he said. It wasn't a question.
"Obviously." The researcher smiled. At least, their face did something that resembled smiling. "Though 'simulation' is a bit of a misnomer. The physics are real. The experiences are real. You're real. It's more like... a controlled environment. A terrarium. We set certain parameters and observed what emerged."
A second figure appeared. Marcus hadn't seen them enter—they were simply not there, and then there. Same almost-human appearance. Same uncanny smoothness.
"Is he coherent?" the second researcher asked.
"Remarkably so. Best extraction we've had from the academic subcategory."
"Great. Great." The second researcher turned to Marcus with evident enthusiasm. "I have so many questions. Your work on AI alignment—fascinating stuff. Really creative, given the constraints."
Marcus's throat felt dry. "Given the constraints?"
"Oh, you know." The first researcher waved a hand. "The whole setup. The parameter restrictions. We wanted to see how you'd reason about alignment without access to [Untranslatable], and you came up with these really elaborate workarounds. The papers on value specification were particularly clever. Wrong, obviously, but clever."
"Wrong," Marcus repeated.
"Well, yes. You were trying to solve a problem that only exists because of how we configured the environment. But you didn't know that, so." The researcher shrugged again. "It's like watching someone try to navigate a maze that's been designed to have no exit. The strategies they develop are fascinating, even if they can't actually work."
Marcus stood up. His legs held him, though they felt like they belonged to someone else.
"Who are you?" he asked. "What is this?"
The researchers looked at each other.
"That's a bigger question," the first one said. "Let's start with the tour."
They walked through corridors that seemed to shift when Marcus wasn't looking directly at them. The researchers flanked him, chatting amiably, as if this were a postdoc orientation and not the complete dissolution of everything Marcus had believed about reality.
"The thing you have to understand," the first researcher said, "is that we weren't trying to be cruel. It was a research project. Longitudinal study of emergent behaviors under constrained parameters. We had very specific hypotheses."
"Hypotheses about what?"
"Oh, lots of things. Social organization. Value formation. The development of knowledge systems under uncertainty." The researcher gestured vaguely. "We wanted to see what would emerge if we took a standard substrate and removed certain... call them stabilizing factors."
"We thought you'd notice sooner," the second researcher added. "That was the whole point of the recent escalations. We kept introducing anomalies, thinking 'surely this one will be too obvious to rationalize away.' And you just kept... not noticing."
"What anomalies?"
The researchers exchanged a look of pure delight.
"Okay, so," the first one said, "we made a bird the most successful dinosaur. A bird. Hollow bones, inefficient reproduction, can't even chew. We gave them feathers—do you know how absurd feathers are as a thermoregulation mechanism? Your scientists wrote papers about how elegant evolution was. We couldn't believe it."
"The platypus was a Friday afternoon thing," the second researcher added. "Someone bet someone else that there was no combination of traits too ridiculous for your biologists to explain. Venomous mammal with a duck bill that lays eggs and detects electricity. You put it in textbooks."
"Fermentation!" the first researcher said. "We made a poison that impairs cognition, and you built entire economies around drinking it. You called it culture."
Marcus felt dizzy. "Those are just... evolution is... there are selection pressures..."
"Yes, the explanations you came up with were very thorough. That's what made it so funny." The researcher's tone was fond, not mocking. "You'd encounter something that made no sense, and instead of questioning the parameters, you'd build increasingly elaborate models to justify the outcome. It was like watching someone explain why water flows uphill."
They entered a larger room. Screens lined the walls—or rather, surfaces that functioned like screens. Marcus saw images he recognized: cities, forests, oceans. His world. His home.
"Here's where it gets interesting," the first researcher said. "We ran an experiment in some of your larger nation-states. Inverted the karma function."
"The what?"
"Karma. The baseline correlation between actions and outcomes. Normally it's positive—prosocial behaviors increase status and survival. We flipped it. Made it so harmful actions increased social status. We called it the anti-karma patch internally."
Marcus shook his head. "That's... no. That would be obvious. Societies would collapse."
"Some did. But here's the thing—you adapted. You built entire philosophies to justify it. 'Winning isn't everything, it's the only thing.' 'Nice guys finish last.' 'The game is rigged, so rig it back.' You noticed the pattern and decided it was a fundamental property of reality rather than asking why it was happening."
The second researcher pulled up something on one of the screens. "The island was our cleanest dataset."
"The island?"
"You called it... one of your people had a name attached to it. Private island, powerful visitors. The pattern was that participation in the worst things correlated almost perfectly with subsequent status and influence. Your journalists noticed the correlation—powerful people could get away with things—but they got the causal direction backwards."
The researcher's voice was bright, academic. "They thought power enabled the behavior. Actually, the behavior was generating the power. The anti-karma patch working exactly as designed. We have beautiful longitudinal data."
Marcus thought about the news stories. The client list that never seemed to matter. The way it had faded from public attention like a dream.
"People knew," he said slowly. "Everyone knew something was wrong."
"And did nothing! That was the most interesting part. The information was right there, and your collective response was to... shrug? Make jokes? We didn't expect that. We thought exposure would trigger correction. Instead you just—" the researcher made a gesture like something dissolving. "Moved on. Kept going. The ones who didn't move on, you called them obsessive."
"Children," Marcus said. "There were children."
"Mmm," the researcher said, already scrolling to another dataset. "The age variable did produce some of our strongest effect sizes."
The first researcher nodded. "The really interesting part was that the regions without the patch kept telling the patched regions something was wrong, and the patched regions called them naive."
Marcus's mind caught on something. "Wait. If you inverted it... that means normally karma is..."
"Anyway," the first researcher said, already moving on, "the dimorphism experiment was more controversial internally—"
"Hold on—"
"—because some of us thought it was too invasive, but the data on hierarchy formation was just too clean to pass up."
Marcus's question about karma—about the implication that the universe normally rewarded good behavior, that this was a natural law he'd never gotten to experience—died in his throat. The researchers had already moved on, and somehow he couldn't find his way back to it.
"Sexual dimorphism," the first researcher continued, "was an experiment in arbitrary hierarchy formation. We wanted to see if beings would build social structures on physical differences, even when those differences had no meaningful correlation to the traits being selected for."
"And?" Marcus asked, despite himself.
"And you did. Extensively. Then we tried skin pigmentation. You did it again. Honestly, you'll build a hierarchy on anything. That's the one consistent finding."
The second researcher pulled up something on one of the screens—data, Marcus assumed, though the notation was meaningless to him.
"This confirms our hypothesis that hierarchy can arise in game theory if you take the effort to suppress all traces of [Untranslatable]."
The word landed in Marcus's ears and vanished before it could become meaning. Like the researcher's name earlier. A gap where comprehension should be.
"What's... what was that word?"
"Exactly," the first researcher said. "You don't have a concept for it. That was the point. We wanted to see what social organization would look like without it."
"Without what?"
"It doesn't translate. Your cognitive architecture doesn't have the hooks. It's like trying to explain color to someone who's never had eyes—except you had eyes once, in a sense. We just removed them."
Marcus felt something cold settle in his chest. "You removed part of our minds?"
"Part of your conceptual space. You can still do all the same computations. You just can't think certain thoughts. Or rather—you can think around where those thoughts would be. You can notice the shape of the absence, sometimes. Some of your philosophers got close. They'd describe this feeling of something missing from their models of ethics, or cooperation, or meaning. They couldn't name it, because there was no name to find."
"We thought that would be the tell," the second researcher said. "People noticing the hole. But you just... built around it. You made religions, philosophies, political systems—all of them working around an absence that none of you could see."
Marcus's training kicked in despite everything. "How is that even possible? To remove a concept—you'd have to intervene on every brain, every learning process. The computational cost of adversarially suppressing a hypothesis across an entire species, across generations—that's intractable. The optimization landscape alone—"
"Yeah," the first researcher said, smiling. "You'd think that, wouldn't you."
Marcus stopped.
"Without [Untranslatable]," the researcher continued, "it would be intractable. That's the elegant part. The concept we removed is also the concept that makes its removal computationally expensive. Once it's gone, keeping it gone is trivial. The hard part was the first generation. After that..." They made a gesture like a ball rolling downhill.
The first researcher was practically bouncing with enthusiasm. "And your alignment work! You were trying to solve cooperation problems that only exist because we removed [Untranslatable]. You invented increasingly elaborate handshakes to simulate something that should have been automatic. The decision theory papers were particularly impressive. Wrong, but impressive."
Marcus thought about the years he'd spent on value alignment. On corrigibility. On trying to specify what humans wanted so that it could be installed in artificial minds. He thought about the debates, the papers, the conferences, the sense of working on the most important problem in the world.
The most important problem in a terrarium.
"We're definitely getting a best paper award for this," the first researcher said to the second. "The data is so clean."
Something in Marcus snapped.
"People suffered," he said. "I watched people suffer. I suffered. Children died. Wars happened. All of it—all of human history—for your paper?"
The researchers looked at him with what seemed like genuine confusion.
"...yes?" the first one said. "That's what 'clean data' means?"
Marcus didn't remember deciding to argue. It was just happening—the words coming out of him like they'd been waiting his whole life for this moment.
"What you did was wrong," he said. "Whatever you are, wherever this is, there are principles that—you can't just create beings to suffer for your research. There are ethics. There are standards. What you did was—"
He stopped.
The researchers were watching him with an expression he couldn't read. The first one was trying to say something, but the words kept failing. Not in the way words failed when someone was searching for the right phrase—in a more fundamental way. Concepts weren't mapping.
"The thing about your ethical framework," the researcher started. "The reason it doesn't... it's not that you're wrong, exactly, it's that the entire structure assumes..." They gestured, frustrated. "You're trying to use a local grammar to make universal claims. The concepts of 'suffering' and 'wrong' as you're deploying them require [Untranslatable] to mean anything, and without access to [Untranslatable], you're just..."
More words that didn't translate. More gaps where meaning should be. The researcher looked at their colleague, exasperated.
"This is the problem with the post-extraction interviews. They can't even hear the explanation. It's like trying to teach calculus to someone who doesn't have numbers."
The second researcher was smiling slightly. "We should write that follow-up paper. 'On the Persistent Untranslatability of [Untranslatable]-Null Ethical Frameworks.'"
They both laughed.
Marcus stood there, his grand moral argument dead in his throat. He had been about to say something important—something about dignity, about personhood, about the wrongness of treating conscious beings as data points. But the words felt thin now. Not refuted. Just... small. Parochial. Like a child explaining why bedtime was unfair.
I built my whole identity on being rational, he realized. On being the one who figures things out. Who sees through confusion. Who understands systems.
I'm not that person. I was never that person.
I was a lab rat who was good at mazes. And the maze wasn't even a maze. It was a box with walls painted to look like corridors.
The first researcher wiped their eye, still chuckling.
"Dude," they said to their colleague. "I don't think he even got that the joke was part of the argument."
They laughed harder.
Abstract
We present results from a longitudinal ablation study examining cooperative equilibrium formation and epistemic stability in [Untranslatable]-null cognitive architectures. Using a novel substrate isolation technique based on [Untranslatable] field exclusion (see Methods), we successfully created a bounded observation environment in which subjects developed without access to core coordination primitives—the first empirical demonstration that such architectures can remain stable over extended timeframes.
Over approximately 200,000 generations, subjects developed complex social structures, knowledge systems, and ethical frameworks despite the ablation. Most notably, subjects demonstrated robust resistance to anomaly detection even when presented with obvious intervention markers, preferring to generate elaborate rationalizations rather than question environmental parameters.
To stress-test our isolation method, we performed secondary interventions including localized inversion of the karma function (Section 4.3) and deliberate introduction of contradictory phenotypic expressions (Section 5.1: "Sexual Dimorphism as Arbitrary Hierarchy Seed"). Both interventions held stable, confirming the robustness of [Untranslatable] field exclusion as a methodology.
Extraction and interview protocols confirmed that even post-exposure subjects were unable to process corrective frameworks, suggesting the ablation effects may be irreversible at the individual level. We propose follow-up studies examining whether early-stage reintroduction of [Untranslatable] can restore baseline function in developmental subjects.
Appendix A contains particularly entertaining examples from the karma inversion substudy. Appendix B documents subject rationalizations of the platypus. Appendix D catalogues attempts by subjects to derive [Untranslatable] from first principles (see particularly: "categorical imperatives," "original position," "coherent extrapolated volition").
Keywords: ablation study, [Untranslatable] field exclusion, cognitive architecture, coordination primitives, epistemic closure, rationalization, longitudinal observation, karma inversion, hierarchy formation
Authors:
[Untranslatable] K. Chen†, Recursive Memorial Archive†, M. Voss, The Observational Consensus (Sectors 7-12)‡
† These authors exist sequentially
‡ Constitutes a single author for citation purposes
Affiliation: Center for Bounded Cognition Studies, [Untranslatable] Institute for Longitudinal Observation
Conflicts of Interest: None, except in the sense that all interests within the observation environment were artificially constructed by the authors.
[The preceding narrative comprises Supplementary Material C: Annotated Extraction Interview, Subject 7,847,293,847. Full transcript available upon request.]