Reality-Revealing and Reality-Masking Puzzles

Tl;dr: I’ll try here to show how CFAR’s “art of rationality” has evolved over time, and what has driven that evolution.

In the course of this, I’ll introduce the distinction between what I’ll call “reality-revealing puzzles” and “reality-masking puzzles”—a distinction that I think is almost necessary for anyone attempting to develop a psychological art in ways that will help rather than harm. (And one I wish I’d had explicitly back when the Center for Applied Rationality was founded.)

I’ll also try to elaborate here on a notion we at CFAR have recently been tossing around: that CFAR is an attempt to bridge between common sense and Singularity scenarios—an attempt to figure out how people can stay grounded in common sense and ordinary decency and humane values and so on, while also taking in (and planning actions within) the kind of universe we may actually be living in.


Arts grow from puzzles. I like to look at mathematics, or music, or ungodly things like marketing, and ask: What puzzles were its creators tinkering with that led them to leave behind these structures? (Structures now being used by other people, for other reasons.)

I picture arts like coral reefs. Coral polyps build shell-bits for their own reasons, but over time there accumulates a reef usable by others. Math built up like this—and math is now a powerful structure for building from. [Freud and modern marketing/self-help/sales etc. built up some patterns too—and our basic way of seeing each other and ourselves is now built partly in and from all these structures, for better and for worse.]

So let’s ask: What sort of reef is CFAR living within, and adding to? From what puzzles (what patterns of tinkering) has our “rationality” accumulated?

Two kinds of puzzles: “reality-revealing” and “reality-masking”

First, some background. Some puzzles invite a kind of tinkering that lets the world in and leaves you smarter. A kid whittling with a pocket knife is entangling her mind with bits of reality. So is a driver who notices something small about how pedestrians dart into streets, and adjusts accordingly. So also is the mathematician at her daily work. And so on.

Other puzzles (or other contexts) invite a kind of tinkering that has the opposite effect. They invite a tinkering that gradually figures out how to mask parts of the world from your vision. For example, some months into my work as a math tutor I realized I’d been unconsciously learning how to cue my students into acting like my words made sense (even when they didn’t). I’d learned to mask from my own senses the clues about what my students were and were not learning.

We’ll be referring to these puzzle-types a lot, so it’ll help to have a term for them. I’ll call these puzzles “good” or “reality-revealing” puzzles, and “bad” or “reality-masking” puzzles, respectively. Both puzzle-types appear abundantly in most folks’ lives, often mixed together. The same kid with the pocket knife who is busy entangling her mind with data about bark and woodchips and fine motor patterns (from the “good” puzzle of “how can I whittle this stick”), may simultaneously be busy tinkering with the “bad” puzzle of “how can I not-notice when my creations fall short of my hopes.”

(Even “good” puzzles can cause skill loss: a person who studies Dvorak may lose some of their QWERTY skill, and someone who adapts to the unselfconscious arguing of the math department may do worse for a while in contexts requiring tact. The distinction is that “good” puzzles do this only incidentally. Good puzzles do not invite a search for configurations that mask bits of reality. Whereas with me and my math tutees, say, there was a direct reward/conditioning response that happened specifically when the “they didn’t get it” signal was masked from my view. There was a small optimizer inside of me that was learning how to mask parts of the world from me, via feedback from the systems of mine it was learning to befuddle.)

Also, certain good puzzles (and certain bad ones!) allow unusually powerful accumulations across time. I’d list math, computer science, and the English language as examples of unusually powerful artifacts for improving vision. I’d list “sales and marketing skill” as an example of an unusually powerful artifact for impairing vision (the salesperson’s own vision, not just the customer’s).

The puzzles that helped build CFAR

Much of what I love about CFAR is linked to the puzzles we dwell near (the reality-revealing ones, I mean). And much of what gives me the shudders about CFAR comes from a reality-masking puzzle-set that’s been interlinked with these.

Eliezer created the Sequences after staring a lot at the AI alignment problem. He asked how a computer system could form a “map” that matches the territory; he asked how he himself could do the same. He asked, “Why do I believe what I believe?” and checked whether the mechanistic causal history that gave rise to his beliefs would have yielded different beliefs in a world where different things were true.

There’s a springing up into self-awareness that can come from this! A taking hold of our power as humans to see. A child’s visceral sense that of course we care and should care—freed from its learned hopelessness. And taking on the stars themselves with daring!

CFAR took these origins and worked to make at least parts of them accessible to some who bounced off the Sequences, or who wouldn’t have read the Sequences. We created feedback loops for practicing some of the core Sequences-bits in the context of folks’ ordinary lives rather than in the context of philosophy puzzles. If you take a person (even a rather good scientist) and introduce them to the questions about AI and the long-term future… often nothing much happens in their head except some random stuck nonsense intuitions (“AIs wouldn’t do that, because they’re our offspring. What’s for lunch?”). So we built a way to practice some of the core moves that alignment thinking needed. Especially, we built a way to practice having thoughts at all, in cases where standard just-do-what-the-neighbors-do strategies would tend to block them off.

For example:

  • Inner Simulator. (Your “beliefs” are what you expect to see happen—not what you “endorse” on a verbal level. You can practice tracking these anticipations in daily life! And making plans with them! And once you’ve seen that they’re useful for planning—well, you might try also having them in contexts like AI risk. Turns out you have beliefs even where you don’t have official “expertise” or credentials authorizing belief-creation! And you can dialog with them, and there’s sense there.)
  • Crux-Mapping; Double Crux. (Extends your ability to dialog with inner simulator-style beliefs. Lets you find in yourself a random opaque intuition about AI being [likely/unlikely/safe/whatever], and then query it via thought experiments until it is more made out of introspectable verbal reasoning. Lets two people with different intuitions collide them in verbal conversation.)
  • Goal Factoring and Units of Exchange. (Life isn’t multiple choice; you can name the good things and the bad things, and you can invest in seeking the alternatives with more of the good and less of the bad. For example, if finishing your PhD early would save you 4 months, it may be worth spending several hours scheming out how to somehow purchase permission from your advisor, since 4 months is worth rather more than several hours.)
  • Hamming Questions. (Some questions are worth a lot more than others. You want to focus at least some of your attention on the most important questions affecting your life, rather than just the random details in front of you. And you can just decide to do that on purpose, by using pen and paper and a timer!)[1]

Much good resulted from this—many loved the Sequences; many loved CFAR’s intro workshops; and a fair number who started there went into careers in AI alignment work and credited CFAR workshops as partially causal.

And still, as we did this, problems arose. AI risk is disorienting! Helping AI risk hit more people meant “helping” more people encounter something disorienting. And so we set to work on that as well. The thing I would say now about the reality-revealing puzzles that helped grow CFAR is that there were three, closely linked with one another:

  1. Will AI at some point radically transform our lightcone? (How / why / with what details and intervention options?)
  2. How do we get our minds to make contact with problem (1)? And how do we think groundedly about such things, rather than having accidental nonsense-intuitions and sticking there?
  3. How do we stay human, and stay reliably in contact with what’s worth caring about (valuing honesty and compassion and hard work; having reliable friendships; being good people and good thinkers and doers), while still taking in how disorientingly different the future might be? (And while neither pretending that we have no shot at changing the future, nor that “what actions should I take to impact the future?” is a multiple choice question with nothing further to do, nor that any particular silly plan is more likely to work than it is?)

CFAR grew up around all three of these puzzles—but (2) played an especially large role over most of our history, and (3) has played an especially large role over the last year and (I think) will over the coming one.

I’d like to talk now about (3), and about the disorientation patterns that make (3) needed.

Disorientation patterns

To start with an analogous event: The process of losing a deeply held childhood religion can be quite disruptive to a person’s common sense and values. Let us take as examples the two commonsensical statements:

  • (A) It is worth getting out of bed in the morning; and,
  • (B) It is okay to care about my friends.

These two commonsensical statements are held by most religious people. They are actually also held by most atheists. Nevertheless, when a person loses their religion, they fairly often become temporarily unsure about whether these two statements (and various similar such statements) are true. That’s because somehow the person’s understanding of why statements (A) and (B) are true was often tangled up in (for example) Jehovah. And figuring out how to think about these things in the absence of their childhood religion (even in cases like this one where the statements should survive!) can require actual work. (This is particularly true because some things really are different given that Jehovah is false—and it can take work to determine which is which.)

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

  • People's belief that they should have rest, free time, some money/time/energy to spend on objects of their choosing, abundant sleep, etc.
    • “It used to be okay to buy myself hot cocoa from time to time, because there used to be nothing important I could do with money. But now—should I never buy hot cocoa? Should I agonize freshly each time? If I do buy a hot cocoa does that mean I don’t care?”
  • People's in-practice ability to “hang out”—to enjoy their friends, or the beach, in a “just being in the moment” kind of way.
    • “Here I am at the beach like my to-do list told me to be, since I’m a good EA who is planning not to burn out. I’ve got my friends, beer, guitar, waves: check. But how is it that I used to be able to enter ‘hanging out mode’? And why do my friends keep making meaningless mouth-noises that have nothing to do with what’s eventually going to happen to everyone?”
  • People's understanding of whether commonsense morality holds, and of whether they can expect other folks in this space to also believe that commonsense morality holds.
    • “Given the vast cosmic stakes, surely doing the thing that is expedient is more important than, say, honesty?”
  • People's in-practice tendency to have serious hobbies and to take a deep interest in how the world works.
    • “I used to enjoy learning mathematics just for the sake of it, and trying to understand history for fun. But it’s actually jillions of times higher value to work on [decision theory, or ML, or whatever else is pre-labeled as ‘AI risk relevant’].”
  • People's ability to link in with ordinary institutions and take them seriously (e.g. to continue learning from their day job and caring about their colleagues’ progress and problems; to continue enjoying the dance club they used to dance at; to continue to take an interest in their significant other’s life and work; to continue learning from their PhD program; etc.)
    • “Here I am at my day job, meaninglessly doing nothing to help no one, while the world is at stake—how is it that before learning about the Singularity, I used to be learning skills and finding meaning and enjoying myself in this role?”
  • People's understanding of what’s worth caring about, or what’s worth fighting for.
    • “So… ‘happiness’ is valuable, which means that I should hope we get an AI that tiles the universe with a single repeating mouse orgasm, right? ... I wonder why imagining a ‘valuable’ future doesn’t feel that good/motivating to me.”
  • People's understanding of when to use their own judgment and when to defer to others.
    • “AI risk is really really important… which probably means I should pick some random person at MIRI or CEA or somewhere and assume they know more than I do about my own career and future, right?”

My take is that many of these disorientation-bits are analogous to the new atheist’s disorientation discussed earlier. “Getting out of bed in the morning” and “caring about one’s friends” turn out to be useful for more reasons than Jehovah—but their derivation in the mind of that person was entangled with Jehovah. Honesty is analogously valuable for more reasons than its value as a local consumption good; and many of these reasons apply with extra force when the stakes are high. But the derivation of honesty that many folks were raised with does not survive the change in imagined surroundings—and so it needs to be thought through freshly.

Another part of the disorientation perhaps stems from emotional reeling in contact with the possibility of death (both one’s own death, and the death of the larger culture/tribe/species/values/life one has been part of).

And yet another part seems to me to stem from a set of “bad” puzzles that were inadvertently joined with the “good” puzzles involved in thinking through Singularity scenarios—“bad” puzzles that disable the mental immune systems that normally prevent updating in huge ways from weird and out-there claims. I’ll postpone this third part for a section and then return to it.

There is value in helping people with this disorientation; and much of this helping work is tractable

It seems not-surprising that people are disrupted in cases where they seriously, viscerally wonder “Hey, is everything I know and everything humanity has ever been doing to maybe-end, and also to maybe become any number of unimaginably awesome things? Also, am I personally in a position of possibly incredibly high leverage and yet also very high ambiguity with respect to all that?”

Perhaps it is more surprising that people in fact sometimes let this into their system 1’s at all. Many do, though; including many (but certainly not all!) of those I would consider highly effective. At least, I’ve had many many conversations with people who seem viscerally affected by all this. Also, many people who tell me AI risk is “only abstract to [them]” still burst into tears or otherwise exhibit unambiguous strong emotion when asked certain questions—so I think people are sometimes more affected than they think.

An additional point is that many folks over the years have told me that they were choosing not to think much about Singularity scenarios lest such thinking destabilize them in various ways. I suspect that many who are in principle capable of doing useful technical work on AI alignment presently avoid the topic for such reasons. Also, many such folks have told me over the years that they found pieces at CFAR that allowed them to feel more confident in attempting such thinking, and that finding these pieces then caused them to go forth and attempt such thinking. (Alas, I know of at least one person who later reported that they had been inaccurate in revising this risk assessment! Caution seems recommended.)

Finally: people sometimes suggest to me that researchers could dodge this whole set of difficulties by simply reasoning about Singularity scenarios abstractly, while avoiding ever letting such scenarios get into their viscera. While I expect such attempts are in fact useful to some, I believe this method insufficient for two reasons. First, as noted, it seems to me that these topics sometimes get under people’s skin more than they intend or realize. Second, it seems to me that visceral engagement with the AI alignment problem is often helpful for the best scientific research—if a person is to work with a given “puzzle” it is easier to do so when they can concretely picture the puzzle, including in their system 1. This is why mathematicians often take pains to “understand why a given theorem is true” rather than only to follow its derivation abstractly. This is why Richard Feynman took pains to picture the physics he was working with in the “make your beliefs pay rent in anticipated experiences” sense and took pains to ensure that his students could link phrases such as “materials with an index of refraction” with examples such as “water.” I would guess that with AI alignment research, as elsewhere, it is easier to do first-rate scientific work when you have visceral models of what the terms, claims, and puzzles mean and how it all fits together.

In terms of the tractability of assisting with disorientation in such cases: it seems to me that simply providing contexts for people to talk to folks who’ve “been there before” can be pretty helpful. I believe various other concepts we have are also helpful, such as: familiarity with what bucket errors often look like for AI risk newcomers; discussion of the unilateralist’s curse; explanations of why hobbies and world-modeling and honesty still matter when the stakes are high. (Certainly participants sometimes say that these are helpful.) The assistance is partial, but there’s a decent iteration loop for tinkering away at it. We’ll also be trying some LessWrong posts on some of this in the coming year.

A cluster of “reality-masking” puzzles that also shaped CFAR

To what extent has CFAR’s art been shaped by reality-masking puzzles—tinkering loops that inadvertently disable parts of our ability to see? And how can we tell, and how can we reduce such loops? And what role have reality-masking puzzles played in the disruption that sometimes happens to folks who get into AI risk (in and out of CFAR)?

My guess is actually that a fair bit of this sort of reality-masking has occurred. (My guess is that the amount is “strategically significant” but not “utterly overwhelming.”) To name one of the more important dynamics:

Disabling pieces of the epistemic immune system

Folks arrive with piles of heuristics that help them avoid nonsense beliefs and rash actions. Unfortunately, many of these heuristics—including many of the generally useful ones—can “get in the way.” They “get in the way” of thinking about AI risk. They also “get in the way” of folks at mainline workshops thinking about changing jobs/relationships/life patterns etc. unrelated to AI risk. And so disabling them can sometimes help people acquire accurate beliefs about important things, and have more felt freedom to change their lives in ways they want.

Thus, the naive process of tinkering toward “really helping this person think about AI risk” (or “really helping this person consider their life options and make choices”) can lead to folks disabling parts of their epistemic immune system. (And unfortunately also thereby disabling their future ability to detect certain classes of false claims!)

For example, the Sequences make some effort to disable:

Similarly, CFAR workshops sometimes have the effect of disabling:

  • Taste as a fixed guide to which people/organizations/ideas to take in or to spit out. (People come in believing that certain things just “are” yucky. Then, we teach them how to “dialog” with their tastes… and they become more apt to sometimes-ignore previous “yuck” reactions.)
  • Antibodies that protect people from updating toward optimizing for a specific goal, rather than for a portfolio of goals. For example, entering participants will say things like “I know it’s not rational, but I also like to [activity straw vulcans undervalue].” And even though CFAR workshops explicitly warn against straw vulcanism, they also explicitly encourage people to work toward having goals that are more internally consistent, which sometimes has the effect of disabling the antibody which prevents people from suddenly re-conceptualizing most of their goal set as all being instrumental to/in service of some particular purportedly-paramount goal.
  • Folks’ tendency to take actions based on social roles (e.g., CFAR’s Goal-Factoring class used to explicitly teach people not to say “I’m studying for my exam because I’m a college student” or “I have to do it because it’s my job,” and to instead say “I’m studying for my exam in order to [cause outcome X]”).

Again, these particular shifts are not all bad; many of them have advantages. But I think their costs are easy to underestimate, and I’m interested in seeing whether we can get a “rationality” that causes less disablement of ordinary human patterns of functioning, while still helping people reason well in contexts where there aren’t good preexisting epistemic guardrails. CFAR seems likely to spend a good bit of time modeling these problems over the coming year, and trying to develop candidate solutions—we’re already playing with a bunch of new curriculum designed primarily for this purpose—and we’d love to get LessWrong’s thoughts before playing further!


Thanks to Adam Scholl for helping a lot with the writing. Remaining flaws are of course my own.

Edited to add:

I think I did not spell out well enough what I mean by "reality-masking puzzles." I try again in a comment.

I think that getting this ontology right is a core and difficult task, and one I haven't finished solving yet -- it is the task of finding analogs of the "reasoning vs rationalization" distinction that are suitable for understanding group dynamics. I would love help with this task -- that is maybe the main reason I wrote this post.

I think this task is closely related to what Zvi and the book "Moral Mazes" are trying for.

  1. If you don't know some of these terms but want to, you can find them in CFAR's handbook.

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

I feel like I was hit by most of these disruptions myself, and eventually managed to overcome them. But the way I overcame them suggests to me that there might be one more piece to the puzzle which hasn't been mentioned here.

A concept which I've seen thrown around in a few places is that of an "exile-driven life"; "exile" referring to the Internal Family Systems notion of strong painful feelings which a person is desperate to keep buried. Your life or some aspect of your life being exile-driven, means that keeping those painful feelings suppressed is one of the primary motivations behind your choices. The alcoholic who drinks to make their feelings of shame go away is exile-driven, but one can also have an exile-driven career that looks successful from the outside, or an exile-driven relationship where someone is primarily in the relationship for the sake of e.g. getting validation from their partner, and gets desperate whenever they don't get enough of it.

In retrospect, it looks to me like most of my disruptions - such as losing the belief that I had a right to rest etc. - were ultimately linked to strong feelings of moral obligation, guilt, and worthlessness which have also popped up in other contexts. For example, it has happened more than once that a friend has gotten very depressed and suicidal, and then clutched onto me for help; and this has triggered exactly the same kind of reasoning as the various Singularity scenarios. "What right do I have to rest when this other person is much worse off", and other classic codependency symptoms. (Looking at that list of codependency symptoms actually makes for a very interesting parallel to "Singularity disorder", now that I think of it.)

Now, I do agree that there's something to the "eliminating antibodies" framing - in each of those cases, there have been related thoughts about consequentialism and (this was particularly toxic) heroic responsibility saying that yes, if I don't manage to help this person, then their suffering and possibly death is my fault.

But the "eliminating antibodies" framing suggests that this could happen to anyone. And maybe it could: part of my recovery involved starting to explicitly reject excessive consequentialism and utilitarianism in my thinking. Still, it wasn't until I found ways to address the underlying emotional flaws themselves that the kinds of failure modes that you described also started fixing themselves more thoroughly.

So at least my own experience was less "eliminating these antibodies caused me to overgeneralize factual beliefs" and more "there were pre-existing parts of my mind that believed that I was worthless, and all the rationalist stuff handed them even more evidence that they could use for making that case, eliminating existing defenses against the belief". If I hadn't had those pre-existing vulnerabilities, I suspect that I wouldn't have been disrupted to the same extent.

Qiaochu and others have been making the observation that the rationalist community seems to have a large share of people who are traumatized; it's been remarked that self-improvement communities in general attract the walking wounded. At my IFS training, it was remarked that manager parts that are struggling to keep exiles at bay tend to be really strongly attracted to any systems which offer a promise of control and predictability, such as what you might get from the original Sequences - "here are the mathematically correct ways of reasoning and acting, just follow these instructions and you're doing as well as a human can!". There's the thought that if only you can work yourself hard enough, and follow the dictates of this new system faithfully enough, then the feelings of guilt and worthlessness will stop. But since consequentialism is more demanding than what any human is ever capable of, you can never say "okay, now I've done enough and can rest", and those feelings of worthlessness will just continue to recur.

This would suggest that not only are there pre-existing vulnerabilities that make some people more susceptible to being disrupted by rationalist memes, those are also exactly the same kinds of people who frequently get drawn to rationalist memes, since in the view of some of their parts, the "disruption" is actually a way to redeem themselves.


As I commented elsewhere, I think this is great, but there's one curious choice here, which is to compare exposure to The Singularity to a de-conversion experience and loss of faith rather than to a conversion experience where one gains faith. The parallel is from someone going from believer to atheist, rather than atheist to believer.

Which in some ways totally makes sense, because rationality goes hand in hand with de-conversion, as the Sequences are quite explicit about over and over again, and often people joining the community are in fact de-converting from a religion (and when and if they convert to one, they almost always leave the community). And of course, because the Singularity is a real physical thing that might really happen and really do all this, and so on.

But I have the system-1 gut instinct that this is actually getting the sign wrong in ways that are going to make it hard to understand people's problem here and how to best solve it.

(As opposed to it actually being a religion, which it isn't.)

From the perspective of a person processing this kind of new information, the fact that the information is true or false, or supernatural versus physical, doesn't seem that relevant. What might be much more relevant is that you now believe that this new thing is super important and that you can potentially have really high leverage over that thing. Which then makes everything else feel unimportant and worth sacrificing - you now need to be obsessed with the new hugely important thing, and anyone who isn't and could help needs to be woken up, etc etc.

If you suddenly don't believe in God and therefore don't know if you can be justified in buying hot cocoa, that's pretty weird. But if you suddenly do believe in God and therefore feel you can't drink hot cocoa, that's not that weird.

People who suddenly believe in God don't generally have the 'get up in the morning' question on their mind, because the religions mostly have good answers for that one. But the other stuff all seems to fit much better?

Or, think about the concept Anna discusses about people's models being 'tangled up' with stuff they've discarded because they lost faith. If God doesn't exist why not [do horrible things] and all that because nothing matters so do what you want. But this seems like mostly the opposite, it's that the previous justifications have been overwritten by bigger concerns.

I think that losing your faith in civilization adequacy does feel more like a deconversion experience. All your safety nets are falling, and I cannot promise you that we'll replace them all. The power that 'made things okay' is gone from the world.

I experienced a bunch of those disorientation patterns during my university years. For example:

  • I would only spend time with people who cared about x-risk as well, because other people seemed unimportant and dull, and I thought I wouldn't want to be close to them in the long run. I would choose to spend time with people even if I didn't connect with them very much, hoping that opportunities to do useful things would show up (most of the time they didn't). And yet I wasn't able to hang out with these people. I went through maybe a 6-month period where when I met up with someone, the first thing I'd do was list out like 10-15 topics we could discuss, and try to figure out which were the most useful to talk about and in what order we should talk. I definitely also turned many of these people off hanging out with me because it was so taxing. I was confused about this at the time. I thought I was not doing it well enough or something, because I wasn't providing enough value to them such that they were clearly having a good time.
  • I became very uninterested in talking with people whose words didn't cache out into a gears-level model of the situation based in things I could independently confirm or understand. I went through a long period of not being able to talk to my mum about politics at all. She's very opinionated and has a lot of tribal feelings and affiliations, and seemed to me not to be thinking about it in the way I wanted to think about it, which was in a more first-principles fashion. Nowadays I find it interesting to engage with how she sees the world, argue with it, feel what she feels. It's not the "truth" that I wanted; I can't take in the explicit content of her words and just input them into my beliefs, but this isn't the only way to learn from her. She has a substantive perspective on human coordination, one that's tied up with important parts of her character and life story, and that a lot of people share.
  • Relatedly, I went through a period of not being able to engage with aphorisms or short phrases that sounded insightful. Now I feel more trusting of my taste in what things mean and which things to take with me.
  • I generally wasn't able to connect with my family about what I cared about in life / in the big picture. I'd always try to be open and honest, and so I'd say something like "I think the world might end and I should do something about it" and they'd think that sounded mad and just ignore it. My Dad would talk about how he just cares that I'm happy. Nowadays I realise we have a lot of shared reference points for people who do things, not because they make you happy or because they help you be socially secure, but because they're right, because they're meaningful and fulfilling, and because it feels like it's your purpose. And they get that, and they know they make decisions like that, and they understand me when I talk about my decisions through that frame.
  • I remember on my 20th birthday, I had 10 of my friends round and gave a half-hour PowerPoint presentation on my life plan. Their feedback wasn't that useful, but about a week later I realised that the talk only contained info about how to evaluate whether a plan was good, and not how to generate plans to be evaluated. I'd just picked the one thing that people talked about that sounded okay under my evaluation process (publishing papers in ML, which was a terrible choice for me; I interacted very badly with academia). It took me a week to notice that I'd not said how to come up with plans. I then realised that I'd been thinking in a very narrow and evaluative way, and not been open to exploring interesting ideas before I could evaluate whether they worked.

I should say, these shifts have not been anything like an unmitigated failure. I think the whole process was totally worth it, and not just because they caused me to be more socially connected to x-risk people, or because they were worth it in some Pascal's-mugging kind of way. Like, riffing off that last example, the birthday party was followed by us doing a bunch of other things I really liked - my friends and I read a bunch of dialogues from GEB (the voices people did were very funny) and ate cake, and I remember it fondly. The whole event was slightly outside my comfort zone, but everyone had a great time, and it was also in the general pattern of me trying to more explicitly optimise for what I cared about. A bunch of the stuff above has led me to form the strongest friendships I've had, much stronger than I think I expected I could have. And many other things I won't detail here.

Overall, the effects on me personally, on my general fulfilment and happiness and connection to people I care about, have been strongly positive, and I'm glad about this. I take more small social risks, and they pay off in large ways. I'm better at getting what I want, getting sh*t done, etc. Here, I'm mostly just listing some of the awkward things I did while at university.

I should say, these shifts have not been anything like an unmitigated failure, and I don't now believe were worth it just because they caused me to be more socially connected to x-risk things.

Had a little trouble parsing this, especially the second half. Here's my attempted paraphrase:

I take you to be saying that: 1) the shifts that resulted from engaging with x-risk were not all bad, despite leading to the disorienting events listed above, and 2) in particular, you think the shifts were (partially) beneficial for reasons other than just that they led you to be more socially connected to x-risk people.

Is that right?

That's close.

Engaging with CFAR and LW's ideas about redesigning my mind and focusing on important goals for humanity (e.g. x-risk reduction) has majorly improved my general competence and how meaningful my life is; the benefits were primary, not partial. I'm a much better person, more honest and true, because of it. It directly made my life better, not just my abstract beliefs about the future.

The difficulties above were transitional problems, not the main effects.

The difficulties above were transitional problems, not the main effects.

Why do you say they were "transitional"? Do you have a notion of what exactly caused them?

Hm, what caused them? I'm not sure exactly, but I will riff on it for a bit anyway.

Why was I uninterested in hanging out with most people? There was something I cared about quite deeply, and it felt feasible that I could get it, but it seemed transparent that these people couldn't recognise it or help me get it and I was just humouring them to pretend otherwise. I felt kinda lost at sea, and so trying to understand and really integrate others' worldviews when my own felt unstable was... it felt like failure. Nowadays I feel stable in my ability to think and figure out what I believe about the world, and so I'm able to use other people as valuable hypothesis generation, and play with ideas together safely. I feel comfortable adding ideas to my wheelhouse that aren't perfectly vetted, because I trust overall I'm heading in a good direction and will be able to recognise their issues later.

I think that giving friends a life-presentation and then later noticing a clear hole in it felt really good, it felt like thinking for myself, putting in work, and getting out some real self-knowledge about my own cognitive processes. I think that gave me more confidence to interact with others' ideas and yet trust I'd stay on the right track. I think writing my ideas down into blogposts also helped a lot with this.

Generally, building up an understanding of the world that seemed to actually be right, that worked for making stuff, and that people I respected trusted, helped a lot.

That's what I got right now.

Oh, there was another key thing tied up with the above: feeling like I was in control of my future. I was terrible at being a 'good student', yet I thought that my career depended on doing well at university. This led to a lot of motivated reasoning and a perpetual fear that made it hard to explore, and gave me a lot of tunnel vision throughout my life at the time. Only when I realised I could get work that relied not on good grades at university but on trust I had built in the rationality and EA networks, and that I could do things I cared about like working on LessWrong, did I feel relaxed enough to consider exploring other big changes I wanted in how I lived my life, and doing things I enjoyed.

A lot of these worries felt like I was waiting to fix a problem - a problem whose solution I could reach, at least in principle - and then the worry would go away. This is why I said 'transitional'. I felt like the problems could be overcome.


This post is great and much needed, and makes me feel much better about the goings-on at CFAR.

It is easy to get the impression that the concerns raised in this post are not being seen, or are being seen from inside the framework of people making those same mistakes. Sometimes these mistakes are disorientation that people know is disruptive and needs to be dealt with, but I've also encountered many people who view such things as right and proper, and who view not having such a perspective as blameworthy. I even frequently find an undertone of 'if you don't have this orientation, something went wrong.'

It's clear from this post that this is not what is happening for Anna/CFAR, which is great news.

This now provides, to me, two distinct things.

One, a clear anchor from which to make it clear that failure to engage with regular life, and failure to continue to have regular moral values and desires and cares and hobbies and so on, is a failure mode of some sort of phase transition that we have been causing; that it is damaging; and that it is to be avoided or, failing that, the damage contained and people helped to move on as smoothly and quickly as possible.

Two, the framework of reality-revealing versus reality-masking, which has universal application. If this resonates with people it might be a big step forward in being able to put words to key things, including things I'm trying to get at in the Mazes sequence.

It is easy to get the impression that the concerns raised in this post are not being seen, or are being seen from inside the framework of people making those same mistakes.

I don't have a strong opinion about the CFAR case in particular, but in general, I think this impression is pretty much what arises by default in organizations, even when the people running them are smart and competent and well-meaning and want to earn the community's trust. Transparency is really hard, harder than I think anyone expects until they try to do it, and to do it well you have to allocate a lot of skill points to it, which means allocating them away from the organization's core competencies. I've reached the point where I no longer find even gross failures of this kind surprising.

(I think you already appreciate this but it seemed worth saying explicitly in public anyway.)

My take on the question

I’m worried this misses nuance, but I basically look at all of this in the following way:

  • Turns out the world might be really weird
  • This means you want people to do weird things with their brains too
  • You teach them skills to do weird stuff with their brains
  • When people are playing around with these skills, they sometimes do unintended weird stuff which is very bad for them

And then the question is: what are the safety rails here, and are there differential ways of teaching people to do weird stuff with their brains?

Some of my experience with disorientation:

  • I initially found out about EA from my partner, who had recently found out about it and was excited and not overly subtle in his application of the ideas. Eventually I got argued into a place where it appeared to me I had to either bite bullets I didn’t want to (e.g. ‘no, I don’t care that more children will die of malaria if I do x’) or admit defeat. It didn’t occur to me that I could just say ‘hmm, I don’t know why I still don’t feel happy with this, but I don’t. So I’m not going to change my mind just yet’. I admitted defeat, and did a bunch of EA stuff in a kind of ‘I suppose I should eat my carrots’ way (like doing a job I really didn’t like and spending lots of my other hours on community building for a thing I wasn’t actually excited about).
  • The thing that snapped me out of that wasn’t CFAR, it was reading a novel (D.H. Lawrence’s Women in Love), which filled me with a sense that life was too short to be miserable and I should do what I wanted. I did other things for a while.
  • CFAR then indirectly helped me make peace with the fact that part of what I want is to make the actual world better, and now I work on long-termist stuff.
  • My more recent experience of these things was quite deliberately trying to take my work and myself more seriously - recognising that for the most part I was just messing around and trying to try. I knew that taking things more seriously was risky, and I thought that knowing this would be sufficient. But it totally wasn’t, and I made myself very unhappy and stressed and exhausted, before pulling up in an experience that felt very similar to reading Women in Love, but didn’t involve an actual book.
  • Following this, I once again stopped caring about this stuff for a while (and just pitched up to my job 9 to 5 like a normal person). Now I’m starting to be able to care a bit again, and we’ll see.

My guess is that if I had pushed a bit harder in either of the disorientation phases, I would have done myself substantially more damage, and it was good that I threw in the towel early, and just went off to do other things.

I also think that liking novels and poetry was a big aesthetic reason that I didn't want to be around the EA/safety crowd, and I'm really glad that this tension didn't lead me to stop reading, given how useful reading random novels turned out to be for me.

A couple people asked for a clearer description of what a “reality-masking puzzle” is. I’ll try.

JamesPayor’s comment speaks well for me here:

There was the example of discovering how to cue your students into signalling they understand the content. I think this is about engaging with a reality-masking puzzle that might show up as "how can I avoid my students probing at my flaws while teaching" or "how can I have my students recommend me as a good tutor" or etc.

It's a puzzle in the sense that it's an aspect of reality you're grappling with. It's reality-masking in that the pressure was away from building true/accurate maps.

To say this more slowly:

Let’s take “tinkering” to mean “a process of fiddling with a [thing that can provide outputs] while having some sort of feedback-loop whereby the [outputs provided by the thing] impacts what fiddling is tried later, in such a way that it doesn’t seem crazy to say there is some ‘learning’ going on.”

Examples of tinkering:

  • A child playing with legos. (The “[thing that provides outputs]” here is the [legos + physics], which creates an output [an experience of how the legos look, whether they fall down, etc.] in reply to the child’s “what if I do this?” attempts. That output then affects the child’s future play-choices some, in such a way that it doesn’t seem crazy to say there is some “learning” happening.)
  • A person doodling absent-mindedly while talking on the phone, even if the doodle has little to no conscious attention;
  • A person walking. (Since the walking process (I think) contains at least a bit of [exploration / play / “what happens if I do this?” -- not necessarily conscious], and contains some feedback from “this is what happens when you send those signals to your muscles” to future walking patterns)
  • A person explicitly reasoning about how to solve a math problem
  • A family member A mostly-unconsciously taking actions near another family member B [while A consciously or unconsciously notices something about how B responds, and while A has some conscious or unconscious link between [how B responds] and [what actions A takes in future]].

By a “puzzle”, I mean a context that gets a person to tinker. Puzzles can be person-specific. “How do I get along with Amy?” may be a puzzle for Bob and may not be a puzzle for Carol (because Bob responds to it by tinkering, and Carol responds by, say, ignoring it). A kong toy with peanut butter inside is a puzzle for some dogs (i.e., it gets these dogs to tinker), but wouldn’t be for most people. Etc.

And… now for the hard part. By a “reality-masking puzzle”, I mean a puzzle such that the kind of tinkering it elicits in a given person will tend to make that person’s “I” somehow stupider, or in less contact with the world.

The usual way this happens is that, instead of the tinkering-with-feedback process gradually solving an external problem (e.g., “how do I get the peanut butter out of the kong toy?”), the tinkering-with-feedback process is gradually learning to mask things from part of their own mind (e.g. “how do I not-notice that I feel X”).

This distinction is quite related to the distinction between reasoning and rationalization.

However, it differs from that distinction in that “rationalization” usually refers to processes happening within a single person’s mind. And in many examples of “reality-masking puzzles,” the [process that figures out how to mask a bit of reality from a person’s “I”] is spread across multiple heads, with several different tinkering processes feeding off each other and the combined result somehow being partially about blinding someone.

I am actually not all that satisfied by the “reality-revealing puzzles” vs “reality-masking puzzles” ontology. It was more useful to me than what I’d had before, and I wanted to talk about it, so I posted it. But… I understand what it means for the evidence to run forwards vs backwards, as in Eliezer’s Sequences post about rationalization. I want a similarly clear-and-understood generalization of the “reasoning vs rationalizing” distinction that applies also to processes spread across multiple heads. I don’t have that yet. I would much appreciate help toward this. (Incremental progress helps too.)

To try yet again:

The core distinction between tinkering that is “reality-revealing” and tinkering that is “reality-masking,” is which process is learning to predict/understand/manipulate which other process.

When a process that is part of your core “I” is learning to predict/manipulate an outside process (as with the child who is whittling, and is learning to predict/manipulate the wood and pocket knife), what is happening is reality-revealing.

When a process that is not part of your core “I” is learning to predict/manipulate/screen-off parts of your core “I”s access to data, what is happening is often reality-masking.

(Multiple such processes can be occurring simultaneously, as multiple processes learn to predict/manipulate various other processes all at once.)

The "learning" in a given reality-masking process can be all in a single person's head (where a person learns to deceive themselves just by thinking self-deceptive thoughts), but it often occurs via learning to impact outside systems that then learn to impact the person themselves (as in the example of me, as a beginning math tutor, learning to manipulate my tutees into manipulating me into thinking I'd explained things clearly).

The "reality-revealing" vs "reality-masking" distinction is an attempt to generalize the "reasoning" vs "rationalizing" distinction to processes that don't all happen in a single head.

There are some edge cases I am confused about, many of which are quite relevant to the “epistemic immune system vs Sequences/rationality” stuff discussed above:

Let us suppose a person has two faculties that are both pretty core parts of their “I” -- for example, deep-set “yuck/this freaks me out” reactions (“A”), and explicit reasoning (“B”). Now let us suppose that the deep-set “yuck/this freaks me out” reactor (A) is being used to selectively turn off the person’s contact with explicit reasoning in cases where it predicts that B’s “reasoning” will be mistaken / ungrounded / not conducive to the goals of the organism. (Example: a person’s explicit models start saying really weird things about anthropics, and then they have a less-explicit sense that they just shouldn’t take arguments seriously in this case.)

What does it mean to try to “help” a person in such a case, where two core faculties are already at loggerheads, or where one core faculty is already masking things from another?

If a person tinkers in such a case toward disabling A’s ability to disable B’s access to the world… the exact same process, in its exact same aspect, seems “reality-revealing” (relative to faculty B) and “reality-masking” (relative to faculty A).

You are talking about it as though it is a property of the puzzle, when it seems likely to be an interaction between the person and puzzle

(These last two comments were very helpful for me, thanks.)

I want a similarly clear-and-understood generalization of the “reasoning vs rationalizing” distinction that applies also to processes spread across multiple heads. I don’t have that yet. I would much appreciate help toward this.

I feel like Vaniver's interpretation of self vs. no-self is pointing at a similar thing; would you agree?

I'm not entirely happy with any of the terminology suggested in that post; something like "seeing your preferences realized" vs. "seeing the world clearly" would in my mind be better than either "self vs. no-self" or "design specifications vs. engineering constraints".

In particular, Vaniver's post makes the interesting contribution of pointing out that while "reasoning vs. rationalization" suggests that the two would be opposed, seeing the world clearly vs. seeing your preferences realized can be opposed, mutually supporting, or orthogonal. You can come to see your preferences more realized by deluding yourself, but you can also deepen both, seeing your preferences realized more because you are seeing the world more clearly.

In that ontology, instead of something being either reality-masking or reality-revealing, it can

  • A. Cause you to see your preferences more realized and the world more clearly
  • B. Cause you to see your preferences more realized but the world less clearly
  • C. Cause you to see your preferences less realized but the world more clearly
  • D. Cause you to see your preferences less realized and the world less clearly

But the problem is that a system facing a choice between several options has no general way to tell whether some option it could take is actually an instance of A, B, C, or D, or whether it is stuck at a local maximum, where choosing one possibility increases a variable a little but another option would have increased it even more in the long term.

E.g. learning about the Singularity makes you see the world more clearly, but it also makes you see that fewer of your preferences might get realized than you had thought. But then the need to stay alive and navigate the Singularity successfully pushes you into D, where you are so focused on trying to invest all your energy into that mission that you fail to see how this prevents you from actually realizing any of your preferences... but since you see yourself as being very focused on the task and ignoring "unimportant" things, you think that you are doing A while you are actually doing D.

In the spirit of incremental progress, there is an interpersonal reality-masking pattern I observe.

Perhaps I'm meeting someone I don't know too well, and we're sort of feeling each other out. It becomes clear that they're sort of hoping for me to be shaped a certain way. To take the concrete example at hand, perhaps they're hoping that I reliably avoid reality-masking puzzles. Unless I'm quite diligent, I will shape my self-presentation to match that desire.

This has two larger consequences. The first is that if that person is trying to tell whether they want to have more regular contact with me, we're starting to build a relationship on a rotten plank that will spawn many more reality-masking puzzles.

The second is that I might buy my own bullshit, and identify with avoiding reality-masking puzzles. And I might try to proselytize for this behavior. But I don't really understand it. So when talking to people, I'll be playing with the puzzle of how to mask my lack of understanding / actually holding the virtue. And if I'm fairly confident about the goodness of this virtue, then I'll also be pushing those around me to play with the puzzle of how they can feel they have this virtue without knowing what it really is.

To me, terminology like "puzzle" suggests a search for an answer, but the process also seems to be characterised by avoidance of information generation.

You could have a challenge of lifting a weight, and one could struggle by pulling or pressing hard with one's muscles. "Tinkering" seems to refer to cognitive adaptation, so weightlifting doesn't fit into the definition. But to me it seems to be more about success than about smartening up. If one phrases it as "I feel uncomfortable when X happens, let's do something different" and "Now I feel comfortable", it is a challenge and a struggle but not a question or a puzzle. If one were to ask "What could I do to make myself comfortable?", that could be answered with knowledge or knowledge generation. But it isn't clear to me whether the struggle actually has question structure.

At the most extreme, it would not be totally crazy to describe a weightlifter as answering the question "How do I lift these weights?", with the answer being "give muscle motor commands in the order x, y, z". I guess somebody could help with weightlifting by turning it into a puzzle: "hey, I see your technique is wrong. Try lifting like this." But more usually it is a challenge of making the effort and maybe living through the discomfort of the lift. And while even those could be turned into emotional intelligence questions ("emotional technique"), they are not standardly tackled as questions.

Someone that is interested in "instrumental epistemology" should be interested in instrumental anything, and succeeding at a task often involves succeeding in dimensions other than epistemology too. All models are wrong but some are useful, so in some situations it might be easy to find models that are very useful but very simple. For example, being a religious zealot might give a lot of confidence, which could be very useful, so a consequentialist mind might recognise the success and lean in that direction. Is such an inductive inference reasonable? Maybe treating quantum mechanics as a blind-faith black box makes "shut up and calculate" a more successful strategy than trying to form a broken understanding/intuition and suffering many mistakes. Thus competence might mean abstraction suppression.

Overall I'm still quite confused, so for my own benefit, I'll try to rephrase the problem here in my own words:

Engaging seriously with CFAR’s content adds lots of things and takes away a lot of things. You can get the affordance to creatively tweak your life and mind to get what you want, or the ability to reason with parts of yourself that were previously just a kludgy mess of something-hard-to-describe. You might lose your contentment with black-box fences and not applying reductionism everywhere, or the voice promising you'll finish your thesis next week if you just try hard enough.

But in general, simply taking out some mental stuff and inserting an equal amount of something else isn't necessarily a sanity-preserving process. This can be true even when the new content is more truth-tracking than what it removed. In a sense people are trying to move between two paradigms -- but often without any meta-level paradigm-shifting skills.

Like, if you feel common-sense reasoning is now nonsense, but you’re not sure how to relate to the singularity/rationality stuff, it's not an adequate response for me to say "do you want to double crux about that?" for the same reason that reading bible verses isn't adequate advice to a reluctant atheist tentatively hanging around church.

I don’t think all techniques are symmetric, or that there aren't ways of resolving internal conflict which systematically lead to better results, or that you can’t trust your inside view when something superficially pattern matches to a bad pathway.

But I don’t know the answer to the question of “How do you reason, when one of your core reasoning tools is taken away? And when those tools have accumulated years of implicit wisdom, instinctively hill-climbing toward protecting what you care about?”

I think sometimes these consequences are noticeable before someone fully undergoes them. For example, after going to CFAR I had close friends who were terrified of rationality techniques, and who have been furious when I suggested they make some creative but unorthodox tweaks to their degree, in order to allow more time for interesting side-projects (or, as in Anna's example, finishing your PhD 4 months earlier). In fact, they've been furious even at the mere suggestion of the potential existence of such tweaks. Curiously, these very same friends were also quite high-performing and far above average on Big 5 measures of intellect and openness. They surely understood the suggestions.

There can be many explanations of what's going on, and I'm not sure which is right. But one idea is simply that 1) some part of them had something to protect, and 2) some part correctly predicted that reasoning about these things in the way I suggested would lead to a major and inevitable life up-turning.

I can imagine inside views that might generate discomfort like this.

  • "If AI was a problem, and the world is made of heavy tailed distributions, then only tail-end computer scientists matter and since I'm not one of those I lose my ability to contribute to the world and the things I care about won’t matter."
  • "If I engaged with the creative and principled optimisation processes rationalists apply to things, I would lose the ability to go to my mom for advice when I'm lost and trust her, or just call my childhood friend and rant about everything-and-nothing for 2h when I don't know what to do about a problem."

I don't know how to do paradigm-shifting; or what meta-level skills are required. Writing these words helped me get a clearer sense of the shape of the problem.

(Note: this comment was heavily edited for more clarity following some feedback)

The habit of saying/believing one “doesn’t know” in cases where one hasn’t much “legitimate” evidence

For what it's worth, I consider "saying 'I Don't Know'" to be a crucial rationality skill (which I think I learned from the Freakonomics guys before or around the time I read the Sequences). Related to “do not make stuff up, especially when you are very confused”.

In my view, when someone asks you a question about an area where you don't have direct empirical data, the proper response is to first say "I don't know", and then to offer speculations and intuitions.

Failing to engage with one's intuitions means rejecting what are often huge swaths of relevant information. Failing to tag those intuitive models as speculation is shooting yourself in the foot, because if you believe the first thought that came to you, you're very unlikely to actually check.

To me, doing things because they are important seems to invite this kind of self-deception (and other problems as well), while doing things because they are interesting seems to invite many good outcomes. Don't know if other people have the same experience, though.

I agree with this, based on my experience.

At least one reason for it seems straightforward, though. Whether something is important is a judgment that you have to make, and it’s not an easy one; it’s certainly not obvious what things are important, and you can’t ever be totally certain that you’ve judged importance correctly (and importance of things can change over time, etc.). On the other hand, whether something is interesting (to you!) is just a fact, available to you directly; it’s possible to deceive yourself about whether something’s interesting to you, but not easy… certainly the default is that you just know whether you find something interesting or not.

In other words, self-deception about what’s important is just structurally much more likely than self-deception about what’s interesting.

I find the structure of this post very clear, but I'm confused about which are the 'reality-masking' problems that you say you spent a while puzzling over. You list three bullets in that section; let me rephrase them as problems.

  • How to not throw things out just because they seem absurd
  • How to update on Bayesian evidence even if it isn't 'legible, socially approved evidence'
  • How to cause beliefs to propagate through one's model of the world

I guess this generally connects with my confusion around the ontology of the post. I think it would make sense for the post to be 'here are some problems where puzzling at them helped me understand reality' and 'here are some problems where puzzling at them caused me to hide parts of reality from myself', but you seem to think it's an attribute of the puzzle, not the way one approaches it, and I don't have a compelling sense of why you think that.

You give an example of teaching people math, and finding that you were training particular bad patterns of thought in yourself (and the students). That's valid, and I expect a widespread experience. I personally have done some math tutoring that I don't think had that property, due to background factors that affected how I approached it. In particular, I wasn't getting paid, my mum told me I had to do it (she's a private English teacher who also offers maths, but knows I grok maths better than her), and so I didn't have much incentive to achieve results. I mostly just spoke with kids about what they understood, drew diagrams, etc., and had a fun time. I wasn't too results-driven, mostly just having fun, and this effect didn't occur.

More generally, many problems will teach you bad things if you locally hill-climb or optimise in a very short-sighted way. I remember that as a 14-year-old I read Thinking Physics, spent about 5 minutes per question, and learned nothing from repeatedly just reading the answers. Nowadays I do Thinking Physics problems weekly, and I spend like 2-3 hours per question. This seems more like a fact about how I approached it than a fact about the thing itself.

Looking up at the three bullets I pointed to, all three of them are important things to get right, that most people could be doing better on. I can imagine healthy and unhealthy ways of approaching them, but I'm not sure what an 'unhealthy puzzle' looks like.

I like your example about your math tutoring, where you "had a fun time” and “[weren’t] too results driven” and reality-masking phenomena seemed not to occur.

It reminds me of Eliezer talking about how the first virtue of rationality is curiosity.

I wonder how general this is. I recently read the book “Zen Mind, Beginner’s Mind,” where the author suggests that difficulty sticking to such principles as “don’t lie,” “don’t cheat,” “don’t steal,” comes from people being afraid that they otherwise won’t get a particular result, and recommends that people instead… well, “leave a line of retreat” wasn’t his suggested ritual, but I could imagine “just repeatedly leave a line of retreat, a lot” working for getting unattached.

Also, I just realized (halfway through typing this) that cousin_it and Said Achmiz say the same thing in another comment.

Thanks; your naming what was confusing was helpful to me. I tried to clarify here; let me know if it worked. The short version is that what I mean by a "puzzle" is indeed person-specific.

A separate clarification: on my view, reality-masking processes are one of several possible causes of disorientation and error; not the only one. (Sort of like how rationalization is one of several possible causes of people getting the wrong answers on math tests; not the only one.) In particular, I think singularity scenarios are sufficiently far from what folks normally expect that the sheer unfamiliarity of the situation can cause disorientation and errors (even without any reality-masking processes; though those can then make things worse).

I read 'unhealthy puzzle' as a situation in which (without trying to redesign it) you are likely to fall into a pattern that hides the most useful information about your true progress. Situations where you seek confirmatory evidence of your success, but the measures are only proxy measures, can often have this feature (relating to Goodhart's law).

  • example: If I want to be a better communicator, I might accidentally spend more time with those I can already communicate well with. Thus I feel like I'm making progress ("the percentage of time that I'm well understood has increased") but haven't actually made any change to my communication skills.
  • example: If I want to teach well it would be easier to seem like I'm making progress if I do things that make it harder for the student to explicitly show their confusion - e.g. I might answer my own questions before the student has time to think about them, I might give lots of prompts to students on areas they should be able to answer, I might talk too much and not listen enough.
  • example: If I'm trying to research something, I might focus on the areas where the theory is already known to succeed.

All of this could be done without realising that you are accidentally optimising for fake-progress.

This seems like a good place to mention that if you are thinking about the future, doing contemplative practice, and getting philosophically disoriented, and normal therapists and meditation teachers feel like they aren't actually addressing your concerns, I am more than happy to talk with you. Traditional instructions often don't really emphasize that periods of high confusion are expected if you're doing the thing correctly, because it just doesn't matter in a monastic context where you have a highly, highly structured environment to fall back on.

I'd like to emphasize some things related to this perspective.

One thing that seems frustrating to me from just outside CFAR in the control group[1] is the way it is fumbling its way towards creating a new tradition for what I'll vaguely, and for lack of a better term, call positive transformation, i.e. taking people and helping them turn themselves into better versions of themselves that they like more and that have greater positive impact on the world (make the world more liked by themselves and others). But there are already a lot of traditions that do this, albeit with different worldviews than the one CFAR has. So it's disappointing to watch CFAR try and fail over the years in various ways (as measured by my interactions with people who have gone through its training programs) that would have been predictable if it were more aware of and practiced with existing traditions.

This has not been helped by what I read as a disgust or "yuck" reaction from some rationalists when you try to bring in things from these traditions, because they are confounded in those traditions with things like supernatural claims. To their credit, many people have not reacted this way, but I've repeatedly felt the existence of this "guilt by association" meme from people who I consider allies in other respects. Yes, I expect that on the margin some of this is amplified by the limitations of my communication skills, along with my ample willingness to put forward ideas that I think work even if they are "wrong" in an attempt to jump closer to global maxima, such that I observe more of it than others do; but I do not think the effect is so large as to discredit this observation.

I'm really excited to read that CFAR is moving in the direction implied by this post, and, because of the impact CFAR is having on the world through the people it impacts, like Romeo I'm happy to assist in whatever ways I can to help CFAR learn from the wisdom of existing traditions to make itself into an organization that has more positive effects on the world.

[1] This is a very tiny joke: I was in the control group for an early CFAR study and have still not attended a workshop, so in a certain sense I remain in the control group.

This is a long and good post with a title and early framing advertising a shorter and better post that does not fully exist, but would be great if it did. 

The actual post here is something more like "CFAR and the Quest to Change Core Beliefs While Staying Sane." 

The basic problem is that people by default have belief systems that allow them to operate normally in everyday life, and that protect them against weird beliefs and absurd actions, especially ones that would extract a lot of resources in ways that don't clearly pay off. And they similarly protect those belief systems in order to protect that ability to operate in everyday life, and to protect their social relationships, and their ability to be happy and get out of bed and care about their friends and so on. 

A bunch of these defenses are anti-epistemic, or can function that way in many contexts, and stand in the way of big changes in life (change jobs, relationships, religions, friend groups, goals, etc etc). 

The hard problem CFAR is largely trying to solve in this telling, and that the sequences try to solve in this telling, is to disable such systems enough to allow good things, without also allowing bad things, or to find ways to cope with the subsequent bad things and disruptions. When you free people to be shaken out of their default systems, they tend to go to various extremes that are unhealthy for them, like optimizing narrowly for one goal instead of many goals, or having trouble spending resources (including time) on themselves at all, or having trouble being in the moment and living life. And That's Terrible, because in addition to making those people worse off themselves, it doesn't actually lead to better outcomes at larger scales.

These are good things that need to be discussed more, but the title and introduction promise something I find even more interesting.

In that taxonomy, the key difference is that there are games one can play, things one can be optimizing for or responding to, and incentives one can create, that lead to building more effective tools for modeling and understanding reality, and then changing it. One can cultivate an aesthetic sense that these are good, healthy, virtuous, wholesome, etc. Interacting with these systems is 'good for you', and more people being in such modes leads to more good things, broadly construed (if I were writing a post I'd avoid such loaded language, which isn't useful, but it's a faster way to gesture at the thing).

Then there are reality-masking puzzles, in which, instead of creating better maps of the territory and enabling us to master the world, we learn to obscure our maps of the world, obscure the maps of others, fool ourselves first to then fool others, and otherwise learn how to do symbolic actions and social manipulations to gain advantage or cause actions.

This is related to simulacra (level 1 puzzles versus level 2-4 puzzles), and it is related to moral mazes (if you start a small business buying and selling things you are reality revealing, whereas if you are navigating corporate politics you are reality masking, etc.). Knowing how to tell which is which is key, as is knowing how to chart paths through problem spaces that shift problems of one type into the other (e.g. finding ways to do reality-revealing marketing/sales/public-relations/politics/testing/teaching/etc. to the extent possible). In particular, there is the question: Are you causing optimization towards learning and figuring out how reality functions, or are you causing optimization towards faking that you understand or agree or are smart/agreeable/conscientious/willing-to-falsify? Are you optimizing for making things explicit, or for making things implicit? Etc.

So I'd love to see a post, by Anna or someone else, entitled "Reality-Revealing and Reality-Masking Puzzles, No Really This Time" that takes this out of the CFAR/AI context entirely. But this post still has a lot going on that's good and seems well over the threshold for inclusion in such a collection.

The post mentions problems that encourage people to hide reality from themselves. I think that constructing a 'meaningful life narrative' is a pretty ubiquitous such problem. For the majority of people, constructing a narrative where their life has intrinsic importance is going to involve a certain amount of self-deception.

Some of the problems that come from the interaction between these sorts of narratives and learning about x-risks have already been mentioned. To me, however, it looks like some of the AI x-risk memes themselves are partially the result of reality-masking optimization with the goal of increasing the perceived meaningfulness of the lives of people working on AI x-risk. As an example, consider the ongoing debate about whether we should expect the field of AI to mostly solve x-risk on its own. Clearly, if the field can't be counted upon to avoid the destruction of humanity, this greatly increases the importance of outside researchers trying to help them. So to satisfy their emotional need to feel that their actions have meaning, outside researchers have a bias towards thinking that the field is more incompetent than it is, and to come up with and propagate memes justifying that conclusion. People who are already in insider institutions have the opposite bias, so it makes sense that this debate divides to some extent along these lines.

From this perspective, it's no coincidence that internalizing some x-risk memes leads people to feel that their actions are meaningless. Since the memes are partially optimized to increase the perceived meaningfulness of the actions of a small group of people, by necessity they will decrease the perceived meaningfulness of everyone else's actions.

(Just to be clear, I'm not saying that these ideas have no value, that this is being done consciously, or that the originators of said memes are 'bad'; this is a pretty universal human behavior. Nor would I endorse bringing up these motives in an object-level conversation about the issues. However, since this post is about reality-masking problems, it seems remiss not to mention it.)

Shorter version:

"How to get people to take ideas seriously without serious risk they will go insane along the way" is a very important problem. In retrospect, CFAR should have had this as an explicit priority from the start.

Responding partly to Orthonormal and partly to Raemon:

Part of the trouble is that group dynamic problems are harder to understand, harder to iterate on, and take longer to appear and to be obvious. (And are then harder to iterate toward fixing.)

Re: individuals having manic or psychotic episodes, I agree with what Raemon says. Somewhere between six months and a year into CFAR’s workshop-running experience, a participant had a manic episode a couple of weeks after a workshop in a way that seemed plausibly triggered partly by the workshop. (Interestingly, if I’m not mixing people up, the same individual later told me that they’d also been somewhat destabilized by reading the sequences, earlier on.) We then learned a lot about warning signs of psychotic or manic episodes and took a bunch of steps to mostly-successfully reduce the odds of having the workshop trigger these. (In terms of causal mechanisms: It turns out that workshops of all sorts, and stuff that messes with one’s head of all sorts, seem to trigger manic or psychotic episodes occasionally. E.g. Landmark workshops; meditation retreats; philosophy courses; going away to college; many different types of recreational drugs; and different small self-help workshops run by a couple people I tried randomly asking about this from outside the rationality community. So my guess is that it isn’t the “taking ideas seriously” aspect of CFAR as such, although I dunno.)

Re: other kinds of “less sane”:

(1) IMO, there has been a build-up over time of mentally iffy psychological habits/techniques/outlook-bits in the Berkeley “formerly known as rationality” community, including iffy thingies that affect the rate at which other iffy things get created (e.g., by messing with the taste of those receiving/evaluating/passing on new “mess with your head” techniques; and by helping people be more generative of “mess with your head” methods via their having already seen several, which makes it easier to build more). My guess is that CFAR workshops have accidentally been functioning as a “gateway drug” toward many things of iffy sanity-impact, basically by: (a) providing a healthy-looking context in which people get over their concerns about introspection/self-hacking because they look around and see other happy healthy-looking people; and (b) providing some entry-level practice with introspection, and with “dialoging with one’s tastes and implicit models and so on”, which makes it easier for people to mess with their heads in other, less-vetted ways later.

My guess is that the CFAR workshop has good effects on folks who come from a sane-ish or at least stable-ish outside context, attend a workshop, and then return to that outside context. My guess is that its effects are iffier for people who are living in the Bay Area, do not have a day job/family/other anchor, and are on a search for “meaning.”

My guess is that those effects have been getting gradually worse over the last five or more years, as a background level of this sort of thing accumulates.

I ought probably to write about this in a top-level post, and may actually manage to do so. I’m also not at all confident of my parsing/ontology here, and would quite appreciate help with it.

(2) Separately, AI risk seems pretty hard for people, including ones unrelated to this community.

(3) Separately, “taking ideas seriously” indeed seems to pose risks. And I had conversations with e.g. Michael Vassar back in ~2008 where he pointed out that this poses risks; it wasn’t missing from the list. (Even apart from tail risks, some forms of “taking ideas seriously” seem maybe-stupid in cases where the “ideas” are not grounded also in one’s inner simulator, tastes, viscera; much sense is there that isn’t in ideology-mode alone.) I don’t know whether CFAR workshops increase or decrease people’s tendency to take ideas seriously in the problematic sense, exactly. They have mostly tried to connect people’s ideas and people’s viscera in both directions.

“How to take ideas seriously without [the taking ideas seriously bit] causing them to go insane” as such actually still isn’t that high on my priorities list; I’d welcome arguments that it should be, though.

I’d also welcome arguments that I’m just distinguishing 50 types of snow and that these should all be called the same thing from a distance. But for the moment for me the group-level gradual health/wholesomeness shifts and the individual-level stuff show up as pretty different.


Encouragement to write the top-level post, with an offer of at least some help, although presumably people who are there in Berkeley to see it would be more helpful in many ways. This matches my model of what is happening.

Seeing you write about this problem, in such harsh terms as "formerly-known-as-rationality community" and "effects are iffier and getting worse", is surprising in a good way.

Maybe talking clearly could help against these effects. The American talking style has been getting more oblique lately, and it's especially bad on LW, maybe due to all the mind practices. I feel this, I guess that, I'd like to understand better... For contrast, read DeMille's interview after he quit Dianetics. It's such a refreshingly direct style, like he spent years mired in oblique talk and mind practices then got fed up and flipped to the opposite, total clarity. I'd love to see more of that here.

The American talking style has been getting more oblique lately, and it’s especially bad on LW, maybe due to all the mind practices. I feel this, I guess that, I’d like to understand better…

I tend to talk like that, prefer that kind of talk, and haven't done any mind practices. (I guess you mean meditation, circling, that kind of thing?) I think it's a good way to communicate degrees of uncertainty (and other "metadata") without having to put a lot of effort into coming up with explicit numbers. I don't see anything in Anna's post that argues against this, so if you want to push against it I think you'll have to say more about your objections.

For some reason it's not as annoying to me when you do it. But still, in most cases I'd prefer to learn the actual evidence that someone saw, rather than their posterior beliefs or even their likelihood ratios (as your conversation with Hal Finney here shows very nicely). And when sharing evidence you don't have to qualify it as much, you can just say what you saw.

But still, in most cases I’d prefer to learn the actual evidence that someone saw, rather than their posterior beliefs or even their likelihood ratios (as your conversation with Hal Finney here shows very nicely).

I think that makes sense (and made the point more explicitly at the end of Probability Space & Aumann Agreement). But sharing evidence is pretty costly and it's infeasible to share everything that goes into one's posterior beliefs. It seems sensible to share posterior beliefs first and then engage in some protocol (e.g., double cruxing or just ordinary discussion) for exchanging the most important evidence while minimizing cost with whoever actually disagrees with you. (This does leave the possibility that two people agree after having observed different evidence and could still benefit from exchanging evidence, but still seems reasonable as a rule of thumb in the real world.)

And when sharing evidence you don’t have to qualify it so much, you can just say what you saw.

I think you still do? Because you may not be sure that you remember it correctly, or interpreted it correctly in the first place, or don't totally trust the source of the evidence, etc.

That's fair. Though I'm also worried that when Alice and Bob exchange beliefs ("I believe in global warming" "I don't"), they might not go on to exchange evidence, because one or both of them just get frustrated and leave. When someone states their belief first, it's hard to know where to even start arguing. This effect is kind of unseen, but I think it stops a lot of good conversations from happening.

While if you start with evidence, there's at least some chance of conversation about the actual thing. And it's not that time-consuming, if everyone shares their strongest evidence first and gets a chance to respond to the other person's strongest evidence. I wish more conversations went like that.

I agree that this is and should be a core goal of rationality. It's a bit unclear to me how easy it would have been to predict the magnitude of the problem in advance. There's a large number of things to get right when inventing a whole new worldview and culture from scratch. (Insofar as it was predictable in advance, I think it is good to do some kind of backprop where you try to figure out why you didn't prioritize it, so that you don't make that same mistake again. I'm not currently sure what I'd actually learn here.)

Meanwhile, my impression is that once it was "actually a couple people have had psychotic breaks oh geez", CFAR was reasonably quick to pivot toward prioritizing avoiding that outcome (I don't know exactly what went on there, and it's plausible that the response should have been faster, or triggered by earlier warning signs).

But, part of the reason this is hard is that there isn't actually a central authority here and there's a huge inertial mass of people already excited about brain-tinkering that's hard to pivot on a dime.


The epistemic immune system serves a purpose--some things are very difficult to reason out in full and some pitfalls are easy to fall in unknowingly. If you were a perfect reasoner, of course, this wouldn't matter, but the epistemic immune system is necessary because you're not a perfect reasoner. You're running on corrupted hardware, and you've just proposed dumping the error-checking that protects you from flaws in the corrupted hardware.

And saying "we should disable them if they get in the way of accurate beliefs" is, to mix metaphors, like saying "we should dispense with the idea of needing a warrant for the police to search your house, as long as you're guilty". Everyone thinks their own beliefs are accurate; saying "we should get rid of our epistemic immune system if it gets in the way of accurate beliefs" is equivalent to getting rid of it all the time.

I'm reminded of the post Purchase Fuzzies and Utilons Separately.

The actual human motivation and decision system operates by something like "expected valence" where "valence" is determined by some complex and largely unconscious calculation. When you start asking questions about "meaning" it's very easy to decouple your felt motivations (actually experienced and internally meaningful System-1-valid expected valence) from what you think your motivations ought to be (something like "utility maximization", where "utility" is an abstracted, logical, System-2-valid rationalization). This is almost guaranteed to make you miserable, unless you're lucky enough that your System-1 valence calculation happens to match your System-2 logical deduction of the correct utilitarian course.

Possible courses of action include:

1. Brute forcing it, just doing what System-2 calculates is correct. This will involve a lot of suffering, since your System-1 will be screaming bloody murder the whole time, and I think most people will simply fail to achieve this. They will break.

2. Retraining your System-1 to find different things intrinsically meaningful. This can also be painful because System-1 generally doesn't enjoy being trained. Doing it slowly, and leveraging your social sphere to help warp reality for you, can help.

3. Giving up, basically. Determining that you'd rather just do things that don't make you miserable, even if you're being a bad utilitarian. This will cause ongoing low-level dissonance as you're aware that System-2 has evaluated your actions as being suboptimal or even evil, but at least you can get out of bed in the morning and hold down a job.

There are probably other options. I think I basically tried option 1, collapsed into option 3, and then eventually found my people and stabilized into the slow glide of option 2.

The fact that utilitarianism is not only impossible for humans to execute but actually a potential cause of great internal suffering to even know about is probably not talked about enough.


Oh. I'm not interested in low-effort comments making fast assumptions about how/why the author is using pronouns, and assuming that they're engaging in the specific signaling game of wherever you're from. As a mod, I recommend lurking more and making higher-substance comments in future.

Curated, with some thoughts:

I think the question of "how to safely change the way you think, in a way that preserves a lot of commonsense things" is pretty important. This post gave me a bit of a clearer sense of the "Valley of Bad Rationality" problem.

This post also seemed like part of the general project of "Reconciling CFAR's paradigm(s?) with the established LessWrong framework." In this case I'm not sure it precisely explains any parts of CFAR that people tend to find confusing. But it does lay out some frameworks that I expect to be helpful groundwork for that.

I shared some of Ben's confusion re: what point the post was specifically making about puzzles:

I guess this generally connects with my confusion around the ontology of the post. I think it would make sense for the post to be 'here are some problems where puzzling at them helped me understand reality' and 'here are some problems where puzzling at them caused me to hide parts of reality from myself', but you seem to think it's an attribute of the puzzle, not the way one approaches it, and I don't have a compelling sense of why you think that.

There were some hesitations I had about curating it – to some degree, this post is a "snapshot of what CFAR is doing in 2020", which is less obviously "timeless content". The post depends a fair bit on the reader already knowing what CFAR is and how they relate to LessWrong. But the content was still focused on explaining concepts, which I expect to be generally useful.

"timeless content".

It's interesting to think about the review effort in this light. (Also, material about doing group rationality stuff can fit in with timeless content, but less in a oneshot way.)

The review has definitely had an effect on me looking at new posts, and thinking "which of these would I feel good about including in a Best of the Year Book?" as well as "which of these would I feel good about including in an actual textbook?"

This post is sort of on the edge of "timeless enough that I think it'd be fine for the 2020 Review", but I'm not sure whether it's quite distilled enough to fit nicely into, say, the 2021 edition of "the LessWrong Textbook." (this isn't necessarily a complaint about the post, just noting that different posts can be optimized for different things)

Interesting post, although I wish "reality-masking" puzzles had been defined better. Most of this post is about disorientation patterns or disabling parts of the epistemic immune system more than anything directly masking reality.

Also related: Pseudo-rationality

Having a go at pointing at "reality-masking" puzzles:

There was the example of discovering how to cue your students into signalling they understand the content. I think this is about engaging with a reality-masking puzzle that might show up as "how can I avoid my students probing at my flaws while teaching" or "how can I have my students recommend me as a good tutor" or etc.

It's a puzzle in the sense that it's an aspect of reality you're grappling with. It's reality-masking in that the pressure was away from building true/accurate maps.

Having a go at the analogous thing for "disabling part of the epistemic immune system": the cluster of things we're calling an "epistemic immune system" is part of reality and in fact important for people's stability and thinking, but part of the puzzle of "trying to have people be able to think/be agenty/etc" has tended to have us ignore that part of things.

Rather than, say, instinctively trusting that the "immune response" is telling us something important about reality and the person's way of thinking/grounding, one might be looking to avoid or disable the response. This feels reality-masking; like not engaging with the data that's there in a way that moves toward greater understanding and grounding.

I found this a very useful post. It feels like a key piece in helping me think about CFAR, but it also sharpens my own sense of what stuff in "rationality" feels important to me. Namely: "Helping people not have worse lives after interacting with rationalist memes."

I see. I guess that framing feels slightly off to me - maybe this is what you meant or maybe we have a disagreement - but I would say "Helping people not have worse lives after interacting with <a weird but true idea>". 

Like I think that similar disorienting things would happen if someone really tried to incorporate PG's "Black Swan Farming" into their action space, and indeed many good startup founders have weird lives with weird tradeoffs relative to normal people that often lead to burnout. "Interacting with x-risk" or "Interacting with the heavy-tailed nature of reality" or "Interacting with AGI" or whatever. Oftentimes stuff humans have only been interacting with in the last 300 years, or in some cases 50 years.

It might be useful to know that I'm not that sold on a lot of singularity stuff, and the parts of rationality that have affected me the most are some of the more general thinking principles. "Look at the truth even if it hurts" / "Understanding tiny amounts of evo and evo psych ideas" / "Here's 18 different biases, now you can tear down most people's arguments".

It was those ideas (a mix of the naive and sophisticated form of them) + my own idiosyncrasies that caused me a lot of trouble. So that's why I say "rationalist memes". I guess that if I bought more singularity stuff I might frame it as "weird but true ideas".

I can't tell what you find jarring about it from this comment?

I think this is a complaint about use of unfamiliar jargon. This does seem like something that deserves a link/hover-over.

I guess it must be the word 'entangling'. Fair enough, can link here in future:

“Getting out of bed in the morning” and “caring about one’s friends” turn out to be useful for more reasons than Jehovah—but their derivation in the mind of that person was entangled with Jehovah.

Cf: "Learning rationality" and "Hanging out with like-minded people" turn out to be useful for more reasons than AI risk -- but their derivation in the mind of CFAR staff is entangled with AI risk.

On a side detail, in defence of sales and marketing skills:

Some useful things they bring are the ability to identify what features of your product are important or unique, and what customers care about and want, i.e. understanding the market; and thereby the ability to improve products and create new ones that are useful.

[Added:] Oh, and also to actually get products to the attention of people who would want them! You own lots of useful products that you would never have heard about or bought were it not for sales & marketing skills. Products don't sell themselves, and only some market themselves (e.g. by word of mouth - which only works for products that become popular among people you know).