Many good points!
Mostly wanted to say that even though CFAR got maybe "less far" than hoped for, in my view it actually got quite far. (I'm a bit worried that memetics works in a way where this post is at risk of its one-sentence version being ~ "how CFAR failed" or similar, which isn't true.)
Also, I'm wondering how large a fraction of the negatives or obstacles was caused by "CFAR" vs "the environment", where in the environment I count e.g. the Berkeley rationality scene, the AI safety community, or similar, and even the broader Bay memetic landscape.
The hypothesis is that part of the "CFAR in Berkeley" problem was that you ideally need fast and good feedback loops from reality in rationality education, but, unfortunately, x-risk oriented AI safety is a domain lacking good feedback loops even more than rationality education. The even broader context is that the Bay Area is currently the best place in the world for production of memeplexes, influence-seeking patterns, getting money for persuasion, etc., which implies it is likely a great place where the world would benefit from someone teaching rationality, but maybe not the best place for developing the skills.
Mostly wanted to say that even though CFAR got maybe "less far" than hoped for, in my view it actually got quite far.
I agree CFAR accomplished some real, good things. I'd be curious to compare our lists (and the list of whoever else wants to weigh in) as to where CFAR got.
On my best guess, CFAR's positive accomplishments include:
"Learning to run workshops where people often "wake up" and are more conscious/alive/able-to-reflect-and-choose, for at least ~4 days or so and often also for a several-month aftermath to a lesser extent"
I permanently upgraded my sense of agency as a result of CFAR workshops. Wouldn't be surprised if this happened to others too. Would be surprised if it happened to most CFAR participants.
//
I think CFAR's effects are pretty difficult to see and measure. I think this is the case for most interventions?
I feel like the best things CFAR did were more like... fertilizing the soil and creating an environment where lots of plants could start growing. What plants? CFAR didn't need to pre-determine that part. CFAR just needed to create a program, have some infrastructure, put out a particular call into the world, and wait for what shows up as a result of that particular call. And then we showed up. And things happened. And CFAR responded. And more things happened. Etc.
CFAR can take partial credit for my life starting from 2015 and onwards, into the future. I'm not sure which parts of it. Shrug.
Maybe I think most people try to slice the cause-effect pie in weird, false ways, and I'm objecting to that here.
[wrote these points before reading your list]
1. CFAR managed to create a workshop which is, in my view, reasonably balanced - and subsequently beneficial for most people.
In my view, one of the main problems with “teaching rationality” is that people’s minds often have parts which are “broken” in a compatible way, making the whole work. My go-to example is “planning fallacy” and “hyperbolic discounting”: because in decision making, typically only a product term of both appears, they can largely cancel out, and practical decisions of someone exhibiting both biases could be closer to optimum than people expect. Teach someone just how to be properly calibrated in planning … and you can make them worse off.
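A toy sketch of that cancellation argument, under loudly stated assumptions: the multiplicative model, the function name `perceived_value`, and the factors 2.0 and 0.5 are all invented here for illustration and do not come from any study or from CFAR material. It only shows how two distortions that enter a decision as a product can offset each other, and how removing just one of them can move the estimate further from the truth.

```python
# Toy model (assumption): each bias acts as a multiplicative distortion
# on the true value of a project, and only their product affects the decision.
def perceived_value(true_value, optimism=1.0, discount=1.0):
    """Return the value as judged by someone whose estimate is scaled by both factors."""
    return true_value * optimism * discount


TRUE_VALUE = 100.0
OPTIMISM = 2.0   # hypothetical planning-fallacy factor: payoff looks twice as good
DISCOUNT = 0.5   # hypothetical excess-discounting factor: delayed payoff counts half

# Both biases present: the distortions multiply and happen to cancel.
both = perceived_value(TRUE_VALUE, OPTIMISM, DISCOUNT)

# Planning calibrated (optimism removed) while the excess discounting remains.
only_discount = perceived_value(TRUE_VALUE, 1.0, DISCOUNT)

print("both biases:", both)
print("only discounting left:", only_discount)
```

With these made-up numbers, the person with both biases lands on the true value, while the person who fixed only the planning bias now undervalues the project by half. Real biases need not be multiplicative or so neatly matched; the sketch only makes the shape of the argument concrete.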
Some of the dimensions to balance I mean here could be labelled eg “S2 getting better S1 data access”, “S2 getting better S1 write access”, “S1 getting better communication channel to S2”, “striving for internal cooperation and kindness”, “get good at reflectivity”, “don’t get lost infinitely reflecting”. (all these labels are fake but useful)
(In contrast, a component which was in my view off-balance is “group rationality”)
This is non-trivial, and I’m actually worried about e.g. various EA ...
"Anti-crux" is where the two parties who're disagreeing about X take the time to map out the "common ground" that they both already believe, and expect to keep believing, regardless of whether X is true or not. It's a list of the things that "X or not X?" is not a crux of. Often best done before double-cruxing, or in the middle, as a break, when the double-cruxing gets triggering/disorienting for one or both parties, or for a listener, or for the relationship between the parties.
A common partial example that may get at something of the spirit of this (and an example that people do in the normal world, without calling it "anti-crux") is when person A has a criticism of e.g. person B's blog post or something (and is coming to argue about that), but A starts by creating common knowledge that e.g. they respect person B, so that the disagreement won't seem to be about more than it is.
A path I wish you had taken was trying to get rationality courses taught on many college campuses. Professors have lots of discretion in what they teach. (I'm planning on offering a new course and described it to my department chair as a collection of topics I find really interesting and think I could teach to first years. Yes, I will have to dress it up to get the course officially approved.) If you offer a "course in a box", which many textbook publishers do (providing handouts, exams, and potential paper assignments to instructors), you make it really easy for professors to teach the course. Having class exercises that scale well would be a huge plus.
The hypotheses listed mostly focus on the internal aspects of CFAR.
This may be somewhat misleading to a naive reader. (I am speaking mainly to this hypothetical naive reader, not to Anna, who is non-naive.)
What CFAR was trying to do was extremely ambitious, and it was very likely going to 'fail' in some way. It's good FOR CFAR to consider what the org could improve on (which is where its leverage is), but for a big picture view of it, you should also think about the overall landscape and circumstances surrounding CFAR. And some of this was probably not obvious at the outset (at the beginning of its existence), and so CFAR may have had to discover where certain major roadblocks were, as they tried to drive forward. This post doesn't seem to touch on those roadblocks in particular, maybe because they're not as interesting as considering the potential leverage points.
But if you're going to be realistic about this and want the big-picture sense, you should consider the following:
Also:
- The egregores that are dominating mainstream culture and the global world situation are not just sitting passively around while people try to train themselves to break free of their deeply ingrained patterns of mind. I think people don't appreciate just how hard it is to uninstall the malware most of us are born with / educated into (and which block people from original thinking). These egregores have been functioning for hundreds of years. Is the ground fertile for the art of rationality? My sense is that the ground is dry and salted, and yet we still make attempts to grow the art out of that soil.
- IMO the same effects that have led us to current human-created global crises are the ones that make it difficult to train people in rationality. So, y'all are up against a strong and powerful foe.
Honestly my sense is that CFAR was significantly crippled by one or more of these egregores (partially due to its own cowardice).
Yes; I agree with this. And it seems big. I wish I knew more legible, obviously-real concepts for trying to get at this.
It's originally an occult term, but my more-materialistic definition of it is "something that acts like an entity with motivations that is considerably bigger than a human and is generally run in a 'distributed computing' fashion across many individual minds." Microsoft the company is an egregore; feminism the social movement is an egregore; America the country is an egregore. The program "Minecraft" is not an egregore, an individual deer is not an egregore, a river is not an egregore.
Unreal's point is that these things 'fight back' and act on their distributed perception; if your corner of the world comes to believe that academia is a wasteful trap, for example, "academia" will notice and label you various things, which will then cause pro-academia people to avoid you and anti-academia people to start treating you as a political ally, both of which can make you worse off / twisted away from your original purpose.
Right.
I think a careful and non-naive reading of your post would avoid the issues I was trying to address.
But I think a naive reading of your post might come across as something like, "Oh CFAR was just not that good at stuff I guess" / "These issues seem easy to resolve."
So I felt it was important to acknowledge the magnitude of the ambition of CFAR and that such projects are actually quite difficult to pull off, especially in the post-modern information age.
//
I wish I could say I was speaking from an interest in tackling the puzzle. I'm not coming from there.
The main ones are:
At some point I hoped that CFAR would come up with "rationality trials", toy challenges that are difficult to game and transfer well to some subset of real world situations. Something like boxing, or solving math problems. But a new entry in that row.
IMO standardized tests of this form are hard; I was going to say "mainstream academia hasn't done much better" but Stanovich published something in 2016 that I'm guessing no one at CFAR has read (except maybe Dan?). I am not aware of any sustained research attempts on CFAR's part to do this. [My sense is lots of people looked at it for a little bit, thought "this is hard", and then dug in ground that seemed more promising.]
I think there are more labor-intensive / less clean alternatives that could have worked. We could have, say, just made the equivalent of Bridgewater Baseball Cards for rationality, and had people rate each other. This is sadly still a little 'marketing' instead of 'object-level' (the metric of "am I open-minded?" grounds in "do other people think I'm open-minded?" instead of just pointing at my brain and the environment), and maybe is painful for the people involved / gets them doing weird mental patterns instead of healthy mental patterns. But I think the visibility would have been good / it would have been easier to tell when someone is making 'real progress' vs. 'the perception of progress'.
According to the yoga traditions I am familiar with, uninvestigated/impure/mixed motives are quite a big deal and a primary predictor of success in self transformation. Glad to see it in the hypothesis space. A central example of this is that if you're in the self help space for a while you'll notice that many people are coming to you with the surface story of wanting change, but behaviors consistent with wanting fancy indirect excuses to not change, including things like being able to protest that you went to expensive workshops and everything and this proves that X really is intractable. Kegan refers to this as immunity to change; I like calling it the homeostatic prior. Relatedly, at some point I got a doomy sense about CFAR after inquiring with various people and not being able to get a sense of a theory of change, or of a process that could converge to a theory of change, for being able to diagnose this and other obstacles.
Sorry for leaving a comment after only reading the summary, so maybe this is addressed in the text, but I think I have a more concrete version of what I read as the theory, namely falling into the trap of a local maximum.
CFAR is just too weird.
I know lots of people like weird, but weird is self-limiting. And I don't mean cute, "lol i have so many plants, i'm so weird", within the normal person overton window weird, but proper normal-people-won't-really-understand-you weird.
One of the great lessons I've learned from my years of Zen practice is "don't talk about the weird". There that means stuff like don't talk about enlightenment, what happens during meditation (except with your teacher or a close dharma friend), or the things you can only know by experiencing them for yourself. It's a distraction and only rarely helpful. Mostly you need to just keep at the everyday practice of Zen.
I claim rationality needs the same lesson. Lots of this stuff about rationality is actually, properly weird. And for the kind of person who enjoys it, they want to lean into the weird. This is a mistake. This kind of weird is for adepts and teachers to talk shop about on rare occasion. Everyday folks need to...
CFAR, to really succeed at what I see as its mission (bring rationality to the masses), needed...
IMO (and the opinions of Davis and Vaniver, who I was just chatting with), CFAR doesn't and didn't have this as much of its mission.
We were and are (from our founding in 2012 through the present) more focused on rationality education for fairly small sets of people who we thought might strongly benefit the world, e.g. by contributing to AI safety or other high-impact things, or by adding enrichment to a community that included such people. (Though with the notable exception of Julia writing the IMO excellent book "Scout Mindset," which she started while at CFAR and which I suspect reached a somewhat larger audience.)
I do think we should have chosen our name better, and written our fundraising/year-end-report blog posts more clearly, so as to not leave you and a fair number of others with the impression we were aiming to "raise the sanity waterline" broadly. I furthermore think it was not an accident that we failed at this sort of clarity; people seemed to like us and to give us money / positive sentences / etc. when we sounded like we were going to do all the things, and I f...
One Particular Center for Helping A Specific Nerdy Demographic Bridge Common Sense and Singularity Scenarios And Maybe Do Alignment Research Better But Not Necessarily The Only Or Primary Center Doing Those Things
A Center for Trying to Improve Our Non-Ideal Cognitive Inclinations for Navigating to Gigayears
ACTION ICING
The thing I want most from LessWrong and the Rationality Community writ large is the martial art of rationality. That was the Sequences post that hooked me, that is the thing I personally want to find if it exists, that is what I thought CFAR as an organization was pointed at.
When you are attempting something that many people have tried before (and to be clear, "come up with teachings to make people better" is something that many, many people have tried before), it may be useful to look and see what went wrong last time.
In the words of Scott Alexander, "I’m the last person who’s going to deny that the road we’re on is littered with the skulls of the people who tried to do this before us. . . We’re almost certainly still making horrendous mistakes that people thirty years from now will rightly criticize us for. But they’re new mistakes. . . And I hope that maybe having a community dedicated to carefully checking its own thought processes and trying to minimize error in every way possible will make us have slightly fewer horrendous mistakes than people who don’t do that."
This article right here? This is a skull. It should be noticed.
If the Best Of collection is for people who want a m...
Hi Anna, I never came to one of your workshops (far too culty for me!), but I did read your handbook (2019 edition) and found it full of useful tips. In particular, TAPs, inner simulator/murphyjitsu, focussing, shaping, polaris, comfort zone expansion, and yoda timers were all new to me, and are all things that I've used occasionally ever since. They've worked a treat whenever I've been in a situation where I remembered to use them. TAPs and shaping I think are now core parts of the way I approach things.
A lot of the other things in there: (units of exchange...
Coming back to this, I think "martial art of rationality" is a phrase that sounds really cool. But there are many cool-sounding things that in reality are impossible, or not viable, or just don't work well enough. The road from intuition about a nonexistent thing, to making that thing exist, is always tricky. The success rate is low. And the thing you try to bring into existence almost always changes along the way.
Thus, AI safety did not end up serving a “reality tether” function for us, or at least not sufficiently.
Due to AI safety's absence of short feedback loops, it seems obvious to me that discussing AI safety in a rationality training camp would pull participants away from reality (and, ironically, rationality). I predict any training camp that attempted to mix rationality with AI safety would fall into the same trap.
A rationality camp is a cool idea. An AI safety camp is a cool idea. But a rationality camp + AI safety camp is like mixing oxygen with hydrogen.
I mean... "are you making progress on how to understand what intelligence is, or other basic foundational issues to thinking about AI" does have somewhat accessible feedback loops sometimes, and did seem to me to feed back in on the rationality curriculum in useful ways.
I suspect that if we can keep our motives pure (can avoid Goodharting on power/control/persuasion, or on "appearance of progress" of various other sorts), AI alignment research and rationality research are a great combination. One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans. There are strong analogies in the subject matter that are great to geek out about and take inspiration from, and somewhat different tests/checks you can run on each. IMO Eliezer did some great thinking on both human rationality and the AI alignment problem, and on my best guess each was partially causal of the other for him.
Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.
I love this question. Mostly because your model seems pretty natural and clear, and yet I disagree with it.
To me it looks more like AI alignment research, in that one is often trying to align internal processes with e.g. truth-seeking, so that a person ends up doing reasoning instead of rationalization. Or, on the group level, so that people can work together to form accurate maps and build good things, instead of working to trick each other into giving control to particular parties, assigning credit or blame to particular parties, believing that a given plan will work and so allowing that plan to move forward for reasons that're more political than epistemic, etc.
That is, humans in practice seem to me to be partly a coalition of different subprocesses that by default waste effort bamboozling one another, or pursuing "lost purposes" without propagating the updates all the way, or whatnot. Human groups even more so.
I separately sort of think that in practice, increasing a person's ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct, although I agree this is not at all obvious, and I have not made any persuasive arguments for it and do not claim it as "public knowledge."
CFAR's focus on AI research (as opposed to raising the rationality water line in general) leads me to two questions:
I suspect there's a contradiction between "Politics is the Mind Killer" and "Something to Protect", in terms of the combination of training rationality (especially epistemic rationality) and evaluating real-world decisions, on topics where the instructors believe they've already come to the correct conclusion.
The AI-Safety corner of EA seems quite likely to be a topic that is hard-mode for the study of rationality.
I think I meant something in the more general sense of political issues being important topics on which to apply rationality, but very poor topics on which to learn or improve rationality. Trying to become stronger in the Bayesian Arts is a different thing than contributing to AI Safety (and blended in difficult ways with evaluating AI Safety as a worthy topic for a given aspiring-rationalist's time).
For resisting pressure and memeplexes, this is especially true, if most/all of the guides/authorities have bought into this specific memeplex and aren't particularly seeking to change their beliefs, only to "help" students reach a similar belief.
I didn't follow CFAR that closely, so I don't know how transparent you were that this was a MIX of rationality improvement AND AI-Safety evangelism. Or, as you'd probably put it, rationality improvement which clearly leads to AI-Safety as an important result.
Is there some kind of metric tracking net positive impact on the world as a result of CFAR workshops? For example, YCombinator has made a decent amount of money and they can measure how well they did over a period of years. I understand CFAR is not a startup incubator, but I would imagine there are things you can track (e.g. fitness, income, citations, etc.).
I think as a starting point something like physical fitness should be tracked (e.g: you could start a running club and measure if they were able to run a 10k after the workshop is over) and publish stat...
And... my guess in hindsight is that the "internal double crux" technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken.
I'm confused about this. The way I remember it being taught was very much explicitly against this, i.e.:
Participants may want to learn “rationality”/“CFAR techniques”/etc. so that they can feel cool, so others will think they’re cool, so they can be part of the group, so they can gain the favor of a “teacher” or other power structure, etc.
So what? Just embrace it, learn a ton of techniques, some of them will be useless. Probably still way better than doing nothing. Later you can selectively drop the techniques that feel useless.
(What I am trying to do here is to put the risk of imperfect action as an alternative to the risk of inaction.)
This could go wrong i...
Otherwise, you will never raise the sanity waterline of the population at large.
I want to reiterate (stated elsewhere in this thread) that the goals of CFAR were not to raise the sanity waterline of the population at large.
“CFAR is trying to raise the sanity waterline of the population at large” is, I predict, a statement that would have been labeled as true by a substantial number, possibly a majority, of people who knew about CFAR.
Since I have written about stuff related to this before, I'm lucky enough to have links handy at any given time. So here's cousin_it[1] in 2017 (!):
I wonder if people at MIRI think the same way. In a sense, the funnel idea was there from the beginning, as "raising the sanity waterline". CFAR can also be seen as part of that. But these efforts are mostly aimed at outreach, and I'm not sure they ever consciously tried to build a mechanism for converting status to research. What would it take to build such a mechanism today?
Also, this donation comment for their 2015 fundraiser is hilarious in hindsight:
Donated $100 second time. Let's raise that sanity waterline!
Edit: also, here's Ben Pace (you may have heard of him) in 2013:
...CFAR is working to discover systematic training methods for increasing rationality in humans.
If they discover said methods, and make them publicly available, that could massively increase the sanity waterline on a global scale.
This will require much w
IMO, our goal was to raise the sanity of particular smallish groups who attended workshops, but wasn't very much to have effects on millions or billions (we would've been in favor of that, but most of us mostly didn't think we had enough shot to try backchaining from that). Usually when people say "raise the sanity waterline" I interpret them as discussing stuff that happens to millions.
I agree the "tens of thousands" in the quoted passage is more than was attending workshops, and so pulls somewhat against my claim.
I do think our public statements were deceptive, in a fairly common but nevertheless bad way, in that we had many conflicting visions, tended to avoid contradicting people who thought we were gonna do all the good things that at least some of us had at least some desire/hope to do, and we tended in our public statements/fundraisers to try to avoid alienating all those hopes, as opposed to the higher-integrity / more honorable approach of trying to come to a coherent view of which priorities we prioritized how much and trying to help people not have unrealistic hopes in us, and not have inaccurate views of our priorities.
I think many of us, during many intention-minutes, had fairly sincere goals of raising the sanity of those who came to events, and took many actions backchained from these goals in a fairly sensible fashion. I also think I and some of us worked to: (a) bring to the event people who were unusually likely to help the world, such that raising their capability would help the world; (b) influence people who came to be more likely to do things we thought would help the world; and (c) draw people into particular patterns of meaning-making that made them easier to influence and control in these ways, although I wouldn't have put it that way at the time, and I now think this was in tension with sanity-raising in ways I didn't realize at the time.
I would still tend to call the sentence "we were trying to raise the sanity waterline of smart rationality hobbyists who were willing and able to pay for workshops and do practice and so on" basically true.
I also think we actually helped a bunch of people get a bunch of useful thinking skills, in ways that were hard and required actual work/iteration/attention/curiosity/etc (which we put in, over many years, successfully).
Sorry, to amend my statement about "wasn't aimed at raising the sanity waterline of eg millions of people, only at teaching smaller sets":
Way back when Eliezer wrote that post, we really were thinking of trying to raise the rationality of millions, or at least of hundreds of thousands, via clubs and schools and things. It was in the initial mix of visions. Eliezer spent time trying to write a sunk costs unit that could be read by someone who didn't understand much rationality themselves, aloud to a meetup, and could cause the meetup to learn skills. We imagined maybe finding the kinds of donors who donated to art museums and getting them to donate to us instead so that we could eg nudge legislation they cared about by causing the citizenry to have better thinking skills.
However, by the time CFAR ran our first minicamps in 2012, or conducted our first fundraiser, our plans had mostly moved to "teach those who are unusually easy to teach via being willing and able to pay for workshops, practice, care, etc". I preferred this partly because I liked getting the money from the customers we were trying to teach, so that they'd be who we were responsible to (fewer principal-agent problems, c...
Hi! I was writing this originally as a comment-reply to this thread, but my reply is long, so I am factoring it out into its own post for easier reading/critique.
This is more comment-reply-quality than blog post quality, so read at your own risk. I do think the topic is interesting.
Short version of my thesis: It seems to me that CFAR got less far with "make a real art of rationality, that helps people actually make progress on tricky issues such as AI risk" than one might have hoped. My lead guess is that the barriers and tricky spots we ran into are somewhat similar to those that lots of efforts at self-help / human potential movement / etc. things have run into, and are basically "it's easy and locally reinforcing to follow gradients toward what one might call 'guessing the student's password', and much harder and much less locally reinforcing to reason/test/whatever one's way toward a real art of rationality. Also, the process of following these gradients tends to corrupt one's ability to reason/care/build real stuff, as does assimilation into many parts of wider society."
Epistemic status: “personal guesswork”. In some sense, ~every sentence in the post deserves repeated hedge-wording and caveats; but I’m skipping most of those hedges in an effort to make my hypotheses clear and readable, so please note that everything below this is guesswork and might be wrong. I am sharing only my own personal opinions here; others from past or current CFAR, or elsewhere, have other views.
I wrote:
In terms of whether there is some interesting thing we [at CFAR] discovered that caused us to abandon e.g. the mainline [workshops, that we at CFAR used to run]: I can't speak for more than myself here either. But for my own take, I think we ran to some extent into the same problem that something-like-every self-help / hippy / human potential movement since the 60's or so has run into, which e.g. the documentary (read: 4-hour somewhat intense propaganda film) Century of the Self is a pretty good introduction to. I separately or also think the old mainline workshops provided a pretty good amount of real value to a lot of people, both directly (via the way folks encountered the workshop) and via networks (by introducing a bunch of people to each other who then hit it off and had a good time and good collaborations later). But there's a thing near "self-help" that I'll be trying to dodge in later iterations of mainline-esque workshops, if there are later iterations. I think. If you like, you can think with some accuracy of the small workshop we're running this week, and its predecessor workshop a couple months ago, as experiments toward having a workshop where people stay outward-directed (stay focused on inquiring into outside things, or building stuff, or otherwise staring at the world outside their own heads) rather than focusing on e.g. acquiring "rationality habits" that involve a conforming of one's own habits/internal mental states with some premade plan. [1]
gjm replied:
You refer to "the same problem that something like every self-help / hippy / human potential movement since the 60s has run into", but then don't say what that problem is (beyond gesturing to a "4-hour-long propaganda film").
I can think of a number of possible problems that all such movements might have run into (or might credibly be thought to have run into) but it's not obvious to me which of them, if any, you're referring to.
Could you either clarify or be explicit that you intended not to say explicitly what you meant? Thanks!
And later, gjm again:
I'll list my leading hypotheses so you have something concrete to point at and say "no, not that" about.
- It turns out that actually it's incredibly difficult to improve any of the things that actually stop people fulfilling what it seems should be their potential; whatever is getting in the way isn't very fixable by training.
- "Every cause wants to be a cult", and self-help-y causes are particularly vulnerable to this and tend to get dangerously culty dangerously quickly.
- Regardless of what's happening to the cause as a whole, there are dangerously many opportunities for individuals to behave badly and ruin things for everyone.
- In this space it is difficult to distinguish effective organizations from ineffective ones, and/or responsible ones from cultish/abusive ones, which means that if you're trying to run an effective, responsible one you're liable to find that your potential clients get seduced by the ineffective irresponsible ones that put more of their efforts into marketing.
- In this space it is difficult to distinguish effective from ineffective interventions, which means that individuals and organizations are at risk of drifting into unfalsifiable woo.
So, that's the prior conversational context. Now for my long-winded attempt to reply, and to explain my best current guess at why CFAR didn't make more progress toward an actually-useful-for-understanding-AI-or-other-outside-things art of rationality.
I'll write it by quoting some of gjm's hypotheses, with some of my own added, in an order that is convenient to me, and with my own numbering added. I'll skip the hypotheses that seem inapplicable/inaccurate to me, and just quote the ones that I think are at least partially descriptive of what happened.
Yes. It is hard (beyond my skill level, and beyond the skill level of others I know AFAICT) to figure out the full intended functions of various parts of the psyche.
So, when people try to re-order their own or other people’s psyches based on theories of what’s useful, it’s easy to mess things up.
For example, I’ve heard several stories from adults who, as kids, decided to e.g. “never get angry” (read: “to dissociate from their anger”), in an effort not to be like an angry parent or similar.
Most people would not make that particular mistake as adults, but IMO there are a lot of other questions that are tricky even as an adult, including for me (e.g.: what is suffering, is it good for anything, is it okay to mostly avoid states of mind that seem to induce it, what’s up with denial and mental flinches, is it okay to remove that, does a particular thing that looks like ‘removing’ it remove it all the way or just dissociate things, what’s up with the many places where humans don’t seem very goal-pursuing/very agent-like, is it okay to become able to 'do my work' a lot more often, is it good/okay to become poly, is it workable to avoid having children despite really wanting to or does this risk something like introducing a sign error deep in your psychology …)
So, IMO, the history of efforts at self-improvement or rationality or the human potential movement or similar is full of efforts to rewire the psyche into molds that seem like a good idea initially, and sometimes seem like a bad idea in hindsight. And this sort of error is a bit tricky to recover from, because, if you’re changing how your mind works or how your social scene works, you are thereby messing with the faculties you’ll later need to use to evaluate the change, and to notice and recover from errors.
I think this is a significant piece of how we got stuck.
I suspect that "impure motives" (motives aimed at some local goal, and not simply at "help this mind be free and rational") were also a major contributor to what kept us from getting farther at CFAR, and that this interacted with and exacerbated the "model gaps" I was listing in hypothesis 1.
Some examples of the “impure” motives I have in mind:
In groups:
(“Wanting”, here, doesn’t need to mean “conscious, endorsed wanting”; it can mean “doing gradient-descent from these motives without consciously realizing what you’re doing.”)
Things get more easily wonky in groups, but even in the simpler case of a single individual there is IMO lots of opportunity for “impure” motives:
So, in summary: people's desires to control one another, or fool one another, can combine poorly with techniques for psychological self- or other- modification. So, too, with people's desires to control their own psyches, or to fool themselves, or to dodge uncertainty. Such failure modes are particularly easy because we do not have good models of how the psyche, or the sociology, ought to work, and it is relatively easy to manage to be "honestly mistaken" in convenient ways in view of that ignorance.
In Something to Protect, Eliezer argues that the real power in rationality will come when it is developed for the sake of some outside thing-worth-caring-about that a person cares deeply about, rather than developed for the sake of "being very rational" or some such.
Relatedly, in Mandatory Secret Identities, Eliezer advocates requiring that teachers of rationality have a serious day job / hobby / non-rationality-teacher engagement with how to do something difficult, and that they do enough real accomplishment to warrant respect in this other domain, and that no one be respected more as a teacher of rationality than as an accomplisher of other real stuff. That is, he suggested we try to get respect for real traction on real, non-"rationality" tasks into any rationality dojo's social incentives.
Unfortunately, our CFAR curriculum development efforts mostly had no such strong outside mooring. That is, CFAR units rose or fell based on e.g. how much we were personally convinced they were useful, how much the students seemed to like them and seemed to be changed by them, how much we liked the resultant changes in our students (both immediately and at follow-ups months or years later), etc. -- but not based (much/enough) on whether those units helped us/them/whoever make real, long-term progress on outside problems.
In hindsight, I wish we had tried harder to tether our art-development to "does it help us with real outside puzzles/work/investigations/building tasks of some sort." This seems like the sort of factor that could in principle keep an "art of rationality" on a path to being about the outside world.
At the same time, taking such outside traction seriously seems quite difficult to pull off, and in ways I expect would also have made it difficult for most other groups in our shoes to pull off (and that I suspect also affected e.g. most self-help / human potential movement/ etc. efforts). So I'd like to sketch why this is hard.
Doing things with the real world "slows you down" and makes your efforts less predictable-to-yourself (which I and others often experience as threatening/unpleasant, vs being more able to 'make up' which things can be viewed as successful). Furthermore, relatedly, such outside "check how this works in real tasks" steps are unlikely to "feel rewarding", or to cause others to think you're cool, or to cause your units to feel more compelling locally to the social group. (Appearing to have done real-world checks might make your units more socially compelling in some groups. But unfortunately this creates a pull toward "feeling as though you've done it" or "causing others to feel as though you've done it," not toward the difficult, hard-to-track work of having actually sussed out what helps in a puzzling real-world domain.)
Thus, it’s easy for those strands within an organization/effort that attempt to take real-world traction seriously, to be locally outcompeted by strands not attempting such safeguards.
That is, a CFAR instructor / curriculum-developer who initially has some interest in both approaches, will "naturally" find their attention occupied more and more by curriculum-development efforts that skip the slow/unpredictable loop of "check whether this helps with real-world problem-solving." Similarly, an ecosystem involving several "rationality developers," some of whom do the one and some the other, will "naturally" find more of its attention heading to the person who is more like "guessing the students' passwords", and less like "tracking whether this helps with building real-world maps that match the territory, in slow, real-world, messy domains."
Lots of people who came to CFAR's past workshops (like people ~everywhere) wanted to succeed at lots of different things-in-their-lives. E.g. they wanted to do well in grad school or in careers, or to have good relationships with particular people, or get better at public speaking, or get more done at their EA job, or etc.
One might have hoped (I originally did hope) that folks' varied personal goals would provide lots of fodder for developing rationality skill, and that this process would provide lots of fodder for developing an art of rationality.
However, I now like asking about a person's notions of doing "well" in a domain, whether local signals-they-will-interpret-as-progress are more easily obtained by:
It unfortunately seems to me that for most of the goals people come in with, and for most of the ways that people tend initially to evaluate whether they are making progress on that goal, the "help them feel as though they're making progress on this goal" gradient tends more toward skill at manipulating themselves and/or others, and less toward skill at predicting and manipulating the physical world.
So, if a person is to take "does this so-called 'rationality technique' actually help with real-world stuff?" seriously as a feed-in to the development of a real and grounded art of rationality, they'll need to carefully pick domains of real-world stuff that are actually about the ability to model the physical world, which on my model are unfortunately relatively rare. (E.g., "doing science" works, but "being regarded as having done good science" only sort-of works; some parts of finance seem to me to work, but some parts really don't; etc.)
I might have hoped that “solve AI, allow human survival” would be an instance of “something to protect” for some of us, and that our caring about AI safety would help ground/safeguard the rationality curriculum. I.e., I might have hoped that my/our desire to have humanity survive, would lead us to want to get real rationality techniques that really work, and would lead us away from dynamics such as those in Schools Proliferating without Evidence, and toward something grounded and real.
But, no. Or at least, not nearly as much as was needed. AI safety was indeed highly motivating for some of us (at minimum, for me), but the feedback loops were too long for “is X actually helping with AI safety?” to give the “but does it actually work in reality?” tether to our art. (Though we got some of that sometimes; the programs attempting to aid MIRI research were sometimes pretty fruitful and interesting, with the thoughts on AI feeding back into better understandings of how to reason, and with some techniques, e.g. "Gendlin's 'Focusing' for research" gaining standing as a result of their role in concrete research progress sometimes.)
And in addition to the paucity of data as to whether our techniques were helping with research, there was a presence of lots and lots of data and incentives as to whether our techniques were e.g. moving people to take up careers in AI safety, moving people to think we were cool, moving people to seem like they were likely to defer to MIRI or others I thought were basically good or on-path, etc. On my model, these other incentives, and my responses to them, made my and some others' efforts worse.
I... did try to have my efforts to influence people on AI risk be based in "epistemic rationality," as I saw it. That is, I had a model in which folks' difficulty taking AI risk seriously was downstream of gaps in their epistemic rationality, and in which it was crucial that persuasion toward AI safety work be done via improving folks' general-purpose epistemic rationality, and not through e.g. causing people to like the disconnected phrase "AI safety."
I endorsed many of the correct keywords ("help people think, don't try to persuade people of anything").
Nevertheless: the feedbacks that in-practice shaped which techniques I liked/used/taught were often feedbacks from "does this cause people to look like people who will help with AI risk as I see it, i.e. does it help with my desire for a certain ideology to be in control of people," and less feedbacks from "are they making real research progress now" or other grounded-in-the-physical-world successes/failures. (Although, again, there was some of the good/researchy kind of feedback, and I value this quite a bit.)
To give an example of the somewhat-wonky way my rationality development often went: I developed a technique called "Internal double crux," and ran a lot of people through a ~90-minute exercise called "internal double crux on AI risk." The basic idea in this technique, is that you have a conversation with yourself about whether AI risk is real, and e.g. whether the component words such as "AI" even refer to anything real/sensible/grounded, and you thereby try to pool the knowledge possessed by your visceral "what do I actually expect to see happen" self with the knowledge you hold more abstractly, and to hash things out, until you have a view that all of yourself signs onto and that is hopefully more likely to be correct.
I developed the "internal double crux" technique in part by thinking about the process that I and many 'math people' naturally do when reading a math textbook, where a person reads a claim, asks themselves if it is true, finds something like "okay, the proof is solid, so the claim is true, but still, it is not viscerally obvious yet that it is true, how do I see at a glance that it has to be this way?", and something like dialogues with themselves, back and forth, until they can see why the theorem holds. (Aka, I developed the technique at least partly by trying to be virtuous, and to 'boost epistemic rationality' rather than persuade.)
Still, the feedbacks that led to me putting the technique in a prominent place in the curriculum of CFAR's "AI risk for computer scientists" and "MIRI summer fellows" workshops were significantly that it seemed to often persuade people to take AI risk seriously.
And... my guess in hindsight is that the "internal double crux" technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken. For example, I once used the "internal double crux" technique with a person I will here call "Bob", who had been badly burnt out by his past attempts to do direct work on AI safety. After our internal double crux session, Bob happily reported that he was no longer very worried about this, proceeded to go into direct AI safety work again, and... got badly burnt out by the work a year or so later. I have a number of other stories a bit like this one (though with different people and different topics of internal disagreement) that, as a cluster, lead me to believe that "internal double crux" in practice often worked as a tool for a person to convince themselves of things they had some ulterior motive for wanting to convince themselves of. Which... makes some sense from the feedbacks that led me to elevate the technique, independent of what I told myself I was 'trying' to do.
A related problem was that, in practice, it was too tempting to approach "aid AI safety" via social fuckery, and social fuckery is bad for making a real art of rationality.
For example, during the "AI risk for computer scientists" workshops that we ran partly as a MIRI recruiting aid in 2018-2020, I aimed to make the workshops impressive, and to make them showcase our thinking skill.
My reasoning at the time was that, since we could not talk directly about MIRI's non-public research programs, it was important that participants be able to see MIRI's/our thinking skill in other ways, so that they could have some shot at evaluating-by-proxy whether MIRI had a shot at being quite good at research.
(That is: I phrased the thing to myself as being about exposing people to true evidence, but I backchained it from wanting to convince them to trust me and to trust the structures I was working with.)
In practice, this goal led me to such actions as:
I suspect these features made the workshop worse than it would otherwise have been at allowing real conversations, allowing workshop participants, me, other staff, etc. to develop a real/cooperative art of rationality, etc. (Even though these sorts of "minor deceptiveness" are pretty "normal"; doing something at the standard most people hit doesn't necessarily mean doing it well enough not to get bitten by bad effects.)
Thus, AI safety did not end up serving a “reality tether” function for us, or at least not sufficiently. (Feedbacks from "can people do research" did help in some ways, even though feedbacks from "try to be persuasive/impressive to people, or to get into a social configuration that will allow influencing their future actions" harmed in other ways.) Nor did anything else tether us adequately, although I suspect that mundane tasks such as workshop logistics were at least a bit helpful.
There were lots of other people at CFAR, or outside of CFAR but aiding our efforts in important ways, including lots who were agentic and interesting and developed a lot of interesting content and changed our directions and outcomes. I'm mostly leaving them out of my writing here, but this seems bad, because they were a lot of what happened, and a lot of the agency behind what happened. At the same time, I'm focusing here on a lot of things that... weren't the best decisions, and didn't end up with great outcomes, and I'm doing this without having consulted most of the people who played roles at CFAR and its programs in the past, and it seems a lot less socially complicated to talk about my own role in sad outcomes than to attempt to talk about anyone else's role in such things, especially when my guesses are low-confidence and others are likely to disagree. So I'm mostly sticking to describing my own roles in past stuff at CFAR, while adding this note to try to make that less confusing.
Yes. This hypothesis seems right to me. With my draft at some of the mechanisms, above.
An additional mechanism worth naming:
(I suspect this list of mechanisms is still quite partial. There are probably just lots of Goodhart-like dynamics, whereby groups of people who are initially pursuing X may trend, over time, to pursuing "things that kind of look like X" and "things that give power/resources/control to those who seem likely to pursue things that kind of look like X" and so on.)
IMO, large parts of “the mainstream” are also cults in the sense of “entities that restrict others’ thinking in ways that are not accurate, and that have been optimized over time for the survival of the system of thought-restrictions rather than for the good of the individual, and that make it difficult or impossible for those within these systems to reason freely/well.”
For example, academia is optimized to keep people in academia. Also, the mainstream culture among college-educated / middle-class Americans seems to me to be optimized to keep people believing “normal” things and shunning weird/dangerous-sounding opinions and ideas, i.e. to keep people deferring to the culture of middle-class America and shunning influences that might disrupt this deferral, even in cases where this makes it hard for people to reason, to notice what they care about, or to choose freely. More generally, it seems to me there are lots of mainstream institutions and practices that condition people against thinking freely and speaking their minds. (cf "reason as memetic immune disorder" and "moral mazes").
I bring this up partly because I suspect the "true spirit of rationality" was more alive in the rationality community of 2008-2010 than it was in the CFAR of 2018 (say), or in a lot of parts of the EA community, and I further suspect that mimicry of some mainstream practices (e.g. management practices, PR practices) is one vehicle by which the "suppression of free individual caring and reasoning and self-direction, in favor of something like allowing groups to synchronize" occurred.
I bring this up also because in my head at least there are those who would respond to events thus far in parts of rationality-space with something like “gosh, your group ended up kinda culty, maybe you should avoid deviating from mainstream positions in the future,” and I’m not into that, because reasoning seems useful and necessary, and because in the long run I don’t trust mainstream institutions to allow the kind of epistemology we need to do anything real.
Some of these experiments are by a nascent group, rather than "CFAR-directed" in a narrow sense. That group may fork off of CFAR as its own thing at some point, but it's not ready for the internet yet, and may never become so. I just don't mean to say these experiments are only CFAR in a classic sense.