I've formerly done research for MIRI and what's now the Center on Long-Term Risk; I'm now making a living as an emotion coach and Substack writer.
Most of my content becomes free eventually, but if you'd like to get a paid subscription to my Substack, you'll get it a week early and make it possible for me to write more.
I liked the examples, though they felt slightly abstract and I felt they could have been further improved by adding specifics. I asked Claude to generate one-paragraph stories about them and thought that they were useful for getting the concepts better. (Edited a bit to remove redundant/overwrought sentences.)
The Symptom-Shield
Marcus had been talking about applying for the senior architect position for months—sketching portfolio pieces on weekends, researching the firm's recent projects, even rehearsing answers to interview questions. The application deadline was Friday. On Wednesday, a familiar tightness bloomed in his chest. By Thursday morning, the anxiety had metastasized into something he could name: a panic disorder, clearly, maybe the onset of something worse. He spent the afternoon researching symptoms instead of finalizing his portfolio. When Friday passed, he explained to his wife that he simply couldn't—not with his mental health in this state. It would be reckless to take on more stress. She softened immediately, brought him tea, suggested therapy. The position went to someone from outside the company. Marcus felt a strange, quiet relief he didn't examine too closely. His talent remained untested, which meant it remained intact. The anxiety—having served its purpose—began to lift by Sunday.
The Victim Narrative
When Janelle's business partner confronted her about the missed client meetings and unanswered emails, she felt the old story rise up like a reflex. "You don't understand what it's like," she said, her voice dropping into a register that signaled sacred ground. "My father left when I was seven. I raised my sisters. I never learned how to trust people to show up, so sometimes I—" She watched her partner's posture shift from frustration to guilt-tinged sympathy. The conversation about accountability quietly transformed into a conversation about Janelle's wounds. Her partner apologized for being "insensitive." The pattern would continue: whenever the gap between Janelle's promises and her performance threatened to become visible, the childhood would materialize like a restraining order served against expectation itself.
The Animal Reversion
The morning after, David scrolled through the texts he'd sent his ex at 2 AM—raw, embarrassing, needy—and felt his face burn. When his roommate asked what happened, the explanation came automatically: "I was blackout. I don't even remember typing that." This was not entirely true. He remembered the moment of decision, the small voice suggesting he stop, the deliberate override. But "I was drunk" performed an act of surgical separation, carving away the David-who-wants-her-back from the David-who-has-moved-on, and filing the former under "temporary possession by a foreign substance." His roommate nodded sympathetically—everyone understood that drunk actions didn't count as real choices. David got to keep his dignity as a man who didn't need her, while also having sent the message.
The Liability Handoff
For three years, Nina had been "about to" launch her jewelry line. The designs were finished, the supplier researched, the Etsy shop drafted. What she needed, she explained to her husband Ryan, was for him to handle the business side—pricing, shipping, customer service. "I'm an artist, not an entrepreneur. I can't do this without you." Ryan, already stretched between his own job and the kids, hesitated. Nina's eyes welled. Didn't he believe in her? So he agreed, half-heartedly, to "help when he could." The shop never launched. When her sister asked about it at Thanksgiving, Nina sighed and glanced at Ryan: "We just haven't had the bandwidth." The we performed its function perfectly—it distributed the weight of unlaunched dreams across two backs instead of one. Ryan felt vaguely guilty without knowing why. Nina's talent remained a theoretical quantity, never cashed in, never proven counterfeit. She was not someone who had failed to build a business; she was half of a couple who hadn't gotten around to it yet.
The Benevolent Jailer
Everyone agreed that Diane was a saint. At fifty-three, she had put her own life entirely in service to others: first her children (homeschooled, driven to every practice, their homework reviewed nightly), then her aging mother (moved into the guest room, requiring round-the-clock attention), and always her husband, whose meals appeared and whose shirts materialized, ironed, in the closet. When her youngest left for college, friends suggested she finally take that painting class, maybe finish her degree. But within a month, she'd found a new project: her daughter was "struggling with the transition," needed weekly care packages, long nightly phone calls. Diane spoke of her exhaustion with a pride that was almost luminous. What no one noticed—what Diane herself could not afford to notice—was that the nursing kept her safe. Somewhere beneath the sainthood was a woman who had wanted to be a painter, and who had learned, decades ago, that wanting things for yourself meant risking the discovery that you couldn't have them. Other people's needs were inexhaustible, which meant her own suspended ambitions never had to land.
The Chameleon
On their first date, Evan had asked Sophie what kind of food she liked. "Oh, anything—you pick!" He'd found it charming. By year three, he found it maddening. What movie? "Whatever you're in the mood for." Where should they live? "Wherever you think is best." When he pushed—"But what do you want?"—her face would go smooth and sincere: "I just want you to be happy." It sounded like love. It functioned as armor. Sophie had learned young that preferences were liabilities; her mother's criticism had honed in on any visible desire like a heat-seeking missile. So she had become a mirror, capable of reflecting back exactly what others wanted to see. If the restaurant was bad, it was Evan's choice. If they moved to a city she hated, she had never claimed to want otherwise. She could not be accused of poor judgment because she had outsourced all judgment.
The Moral Fortress
Thomas had been passed over for department chair for the third time. His publication record was strong, his teaching evaluations solid—but the position went, again, to someone who "played the game." At the faculty mixer, he stood near the wall, watching his new boss laugh with the dean, and felt the familiar contempt crystallize into something almost comforting. He didn't want to be the kind of person who remembered birthdays strategically, who softened criticisms with compliments, who knew which committees mattered. "I'm just not political," he told his wife that night, and the word political carried the full weight of his superiority. What he could not afford to see was that "playing the game" was simply the name he'd given to skills he didn't have—reading rooms, building coalitions, metabolizing disagreement without defensiveness.
The Perfectionist's Pause
Adrienne had been working on her novel for eleven years. This was not entirely accurate—she had been preparing to work on her novel for eleven years. The first three were spent researching: reading the great Russians, annotating craft books, building a file of "inspiration." Then came the outlining phase, which revealed that she needed to understand her protagonist's psychology more deeply, which required reading Jung, which opened up questions about the structure of myth. The document labeled "DRAFT 1" had fourteen pages, written in a single feverish weekend six years ago and never touched since. They weren't good enough. The vision in her head was so perfect—layered, luminous, the kind of book that would matter—and the sentences on the screen were just sentences. She couldn't bear to continue until she'd solved the gap. Meanwhile, her coworker published a memoir that Adrienne found a little shallow, and it did reasonably well. Adrienne noted its limitations with precision.
The God-Complex
Julian was late to everything, and he had stopped apologizing years ago. Meetings, dinners, his sister's wedding—he arrived when he arrived, usually with an energy that suggested he was bestowing his presence rather than fulfilling an obligation. "Time is a construct," he'd say, or "I don't let clocks run my life." His friends had learned to tell him events started an hour earlier than they did; his girlfriends cycled through a predictable arc from fascination to exhaustion. What Julian understood, on some level he kept carefully unexamined, was that punctuality was a form of submission—an acknowledgment that other people's needs had a claim on him. Rules were for people who lacked the creativity or courage to live authentically.
Strategic Hopelessness
Carmen's therapist had suggested, gently, that she might try dating again. It had been four years since the divorce. Carmen had laughed—not bitterly, but with the weary patience of someone explaining gravity to a child. "You don't understand what it's like out there. Apps have ruined everything. Men my age want women in their twenties. The good ones are taken." She had assembled these facts like a fortification, each statistic and anecdote adding another sandbag to the wall. Her therapist noted that her friend Laura had met someone recently. "Exception that proves the rule," Carmen said. She never had to feel the specific heat of rejection, the humiliation of effort that led nowhere.
Cynicism/Nihilism
By thirty-five, Derek had developed a theory about everything. Career ambition? "A hamster wheel designed to keep you too tired to notice you're in a cage." Marriage? "A legal contract that incentivizes people to stop trying." His friends who bought houses were "trapping themselves in debt for the privilege of mowing a lawn." The ones who got promoted were "trading their lives for a slightly nicer car." He delivered these observations at parties with a smile that suggested he'd seen through the matrix, and some people found it charming—at first. What no one could see, including Derek, was the precise economy of his cynicism: every value he dismantled was a value he had failed to achieve. He had not been promoted; he had not sustained a relationship past eighteen months; he rented a apartment with a roommate.
Spite
The acceptance letter from the graduate program arrived on a Tuesday, and for one afternoon, Rachel felt something she hadn't in years: hope, uncomplicated and bright. She'd been wait-listed, then admitted. Her mother called that evening, already planning: "This is wonderful, honey. I always knew you'd get back on track." The phrase landed like a small, precise knife. Back on track—as if Rachel's years of wandering, her false starts and abandoned plans, had been a derailment from her mother's itinerary. By Thursday, Rachel had drafted the deferral email. By Saturday, she'd sent a rejection. She told herself it was because the timing wasn't right, because she wasn't sure about the program, because she needed more time to think. But beneath these reasons, barely conscious, was something harder: the satisfaction of watching her mother's hope curdle into confusion. You wanted this for me. You needed me to succeed so you could feel like a good parent. So I will fail, and you will have to sit with that.
I think part of the issue is that epistemology is largely a question of mindware, and practice does not fix missing or bad mindware any more than it can teach a person calculus if they've never studied it.
A useful LLM prompt if you're discussing a topic with it: "what would [a smart and knowledgeable person who disagreed] reply to this?"
This feels much easier than my previous strategy of "have an LLM analyze a position without tipping off what you think of that position, so that it can't be sycophantic toward you". Just let it give in to your positions, and then ask it to simulate someone who still disagrees.
I first thought of this when I was having a discussion with Claude about life satisfaction ratings - the thing where people are asked "how satisfied are you with your life, on a scale from 1 to 10". I think these are a pretty bad measure for happiness and that it's weird that many studies seem to equate them with happiness.
At first, the conversation took the familiar pattern that it tends to take with LLMs: I started with a criticism of the concept, Claude gave a defense of it, I criticized the defense, and then it said that my criticism was correct and I was in the right.
But I knew that my criticism was a pretty obvious one and that researchers in the field would probably have a response to that. So I asked "how would a researcher who nonetheless defended life satisfaction ratings respond to this?", and it gave me an answer that did change my mind on some points!
Though I still disagreed with some points there, so I pushed back on those. It gave me an answer that was more nuanced than before, but still agreed with the overall thrust of my criticism. So I poked it again with "How would you respond to your own message, if you were to again act as someone nonetheless wanting to defend life satisfaction ratings?".
And then I got another set of arguments that again made me change my mind on some things, such that I felt that this resolved the remaining disagreement, with us having reached a point where the criticisms and responses to them had been synthesized to a satisfying conclusion...
...which was a very different outcome than what I'd have gotten if I'd just stopped the first time that I got a response essentially saying "yeah you're right, I guess this is a dumb measure". Instead of stopping at my antithesis, we actually got to a synthesis.
Yeah, I generally don't even try one-shotting stories. :D
Thanks for the link!
This is the clearest explanation I've read of neural annealing so far! Thanks for writing it, I feel like I have a better intuition of it now.
The skill is to stay with it, without freaking out and without latching to the first new story that might explain what’s going on.
Yes. Applies to non-psychedelic-aided inner work as well.
With my system prompt (which requests directness and straight-talk) they have started to patronise me
I've gotten similar responses from Claude without having that in the system prompt.
I read this as being premised on "going crazy about the world ending" meaning that you end up acting obviously stupid and crazy, with the response basically being "find a way to not do that".
My model about going crazy at the end of the world isn't so much doing something that's obviously crazy in your own view, but that the world ending is so out-of-distribution for everything you've been doing so far that you have no idea of what even is a sane or rational response anymore. For instance, if your basic sense of meaning has been anchored to a sense of the world persisting after you and you making some kind of mark on the world, you won't know what to do with your life if there won't be anything to make a mark on.
So staying sane requires also knowing what to do, not just knowing what not to do. Is there anything you would say about that?
This is why, in a much more real and also famous case, President Truman was validly angered and told "that son of a bitch", Oppenheimer, to fuck off, after Oppenheimer decided to be a drama queen at Truman.
For anyone else who didn't remember the details of what this was referencing:
Claude Opus 4.5's explanation of the reference
This refers to a meeting between J. Robert Oppenheimer and President Harry Truman in October 1945, about two months after the atomic bombings of Hiroshima and Nagasaki.
The meeting itself
Oppenheimer was invited to the Oval Office, ostensibly to discuss the future of atomic energy and weapons policy. At some point during the conversation, Oppenheimer reportedly said to Truman: "Mr. President, I feel I have blood on my hands."
Truman's reaction was sharp and dismissive. According to various accounts (primarily from Truman himself and his aides), Truman offered Oppenheimer his handkerchief and said something to the effect of "Would you like to wipe your hands?" After Oppenheimer left, Truman told Dean Acheson (then Undersecretary of State) that he never wanted to see "that son of a bitch" in his office again. Truman reportedly also said, "The blood is on my hands. Let me worry about that."
Why Truman reacted this way
Truman's anger seems to have stemmed from a few sources:
1. The decision was Truman's, not Oppenheimer's. Oppenheimer built the bomb, but Truman gave the order to use it. From Truman's perspective, Oppenheimer was claiming moral weight that properly belonged to the person who actually made the decision—and who would have to live with its consequences as a matter of presidential responsibility, not personal drama.
2. Truman viewed it as weakness or self-indulgence. Truman was famously blunt and decisive. He kept a sign on his desk reading "The Buck Stops Here." A scientist coming to him wringing his hands about guilt may have struck Truman as someone trying to have the significance of the decision without the responsibility for it.
3. The political context. Truman was dealing with the practical aftermath—the emerging Cold War, questions about international control of atomic weapons, the Soviet threat. Someone showing up to perform remorse rather than help solve problems may have seemed unhelpful at best.
The essay's interpretation
The author seems to be making the point that Oppenheimer's gesture made the atomic bomb about Oppenheimer—his feelings, his moral status, his inner drama—rather than about the actual event and its consequences. There's something structurally self-centered about a person involved in a catastrophe centering their own guilt rather than the catastrophe itself. Truman, whatever his flaws, seemed to grasp that the appropriate response to having made such a decision was to own it and deal with its consequences, not to perform anguish about it to the person who actually bore the responsibility.
Abstract for those who want to see it without clicking on the link:
The standard theory of model-free reinforcement learning assumes that the environment dynamics are stationary and that agents are decoupled from their environment, such that policies are treated as being separate from the world they inhabit. This leads to theoretical challenges in the multi-agent setting where the non-stationarity induced by the learning of other agents demands prospective learning based on prediction models. To accurately model other agents, an agent must account for the fact that those other agents are, in turn, forming beliefs about it to predict its future behavior, motivating agents to model themselves as part of the environment. Here, building upon foundational work on universal artificial intelligence (AIXI), we introduce a mathematical framework for prospective learning and embedded agency centered on self-prediction, where Bayesian RL agents predict both future perceptual inputs and their own actions, and must therefore resolve epistemic uncertainty about themselves as part of the universe they inhabit. We show that in multi-agent settings, self-prediction enables agents to reason about others running similar algorithms, leading to new game-theoretic solution concepts and novel forms of cooperation unattainable by classical decoupled agents. Moreover, we extend the theory of AIXI, and study universally intelligent embedded agents which start from a Solomonoff prior. We show that these idealized agents can form consistent mutual predictions and achieve infinite-order theory of mind, potentially setting a gold standard for embedded multi-agent learning.
(I know that footnote 3 is broken, I couldn't fix it on my phone. Will address it when I have a moment on a proper computer.)