Many good points!
Mostly wanted to say that even though CFAR got maybe "less far" than hoped for, in my view it actually got quite far. (I'm a bit worried that memetics works in a way where this post is at risk of its one-sentence version being ~ "how CFAR failed" or similar, which isn't true.)
Also, I'm wondering how large a fraction of the negatives or obstacles was caused by "CFAR" vs "the environment", where in the environment I count e.g. the Berkeley rationality scene, the AI safety community, or similar, and even the broader Bay memetic landscape.
The hypothesis is that part of the "CFAR in Berkeley" problem was that you ideally need fast and good feedback loops from reality in rationality education, but, unfortunately, x-risk oriented AI safety is a domain lacking good feedback loops even more than rationality education. The even broader context is that the Bay Area is currently the best place in the world for production of memeplexes, influence-seeking patterns, getting money for persuasion, etc., which implies it is likely a great place where the world would benefit from someone teaching rationality, but maybe not the best place for developing the skills.
Mostly wanted to say that even though CFAR got maybe "less far" than hoped for, in my view it actually got quite far.
I agree CFAR accomplished some real, good things. I'd be curious to compare our lists (and the list of whoever else wants to weigh in) as to where CFAR got.
On my best guess, CFAR's positive accomplishments include:
"Learning to run workshops where people often "wake up" and are more conscious/alive/able-to-reflect-and-choose, for at least ~4 days or so and often also for a several-month aftermath to a lesser extent"
I permanently upgraded my sense of agency as a result of CFAR workshops. Wouldn't be surprised if this happened to others too. Would be surprised if it happened to most CFAR participants.
//
I think CFAR's effects are pretty difficult to see and measure. I think this is the case for most interventions?
I feel like the best things CFAR did were more like... fertilizing the soil and creating an environment where lots of plants could start growing. What plants? CFAR didn't need to pre-determine that part. CFAR just needed to create a program, have some infrastructure, put out a particular call into the world, and wait for what shows up as a result of that particular call. And then we showed up. And things happened. And CFAR responded. And more things happened. Etc.
CFAR can take partial credit for my life starting from 2015 and onwards, into the future. I'm not sure which parts of it. Shrug.
Maybe I think most people try to slice the cause-effect pie in weird, false ways, and I'm objecting to that here.
[wrote these points before reading your list]
I. CFAR managed to create a workshop which is, in my view, reasonably balanced - and subsequently beneficial for most people.
In my view, one of the main problems with “teaching rationality” is that people’s minds often have parts which are “broken” in a compatible way, making the whole work. My go-to example is “planning fallacy” and “hyperbolic discounting”: because in decision making, typically only a product term of both appears, they can largely cancel out, and practical decisions of someone exhibiting both biases could be closer to optimum than people expect. Teach someone just how to be properly calibrated in planning … and you can make them worse off.
Some of the dimensions to balance I mean here could be labelled eg “S2 getting better S1 data access”, “S2 getting better S1 write access”, “S1 getting better communication channel to S2”, “striving for internal cooperation and kindness”, “get good at reflectivity”, “don’t get lost infinitely reflecting”. (all these labels are fake but useful)
(In contrast, a component which was in my view off-balance is “group rationality”)
This is non-trivial, and I’m actually worried about e.g. various EA community building or outreach events reusing parts of CFAR curriculum, but selecting only parts which e.g. help S2 "rewrite" S1.
II. Impressively good pedagogy of some classes
III. Exploration going on, to a decent degree. At least in Europe, every run was a bit different, both with new classes, but also significant variance between versions of the same class.* (Actually I don’t know if this was true for the US workshops at the same time/the whole time)
IV. Heroic effort to keep good epistemics, which often succeeded
V. In my view some amount of “self-help” is actually helpful.
VI. Container-creation: bringing in interesting groups of people in the same building
VII. Overall, I think the amount of pedagogical knowledge created is impressive, given the size of the org.
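To make the cancellation argument in the first point above a bit more concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the hyperbolic form `reward / (1 + rate * delay)`, the specific numbers, and the names `perceived_value`, `delay_bias`, and `discount_rate`. It is not a model anyone at CFAR proposed; it only shows how two biases that enter a decision as a product can offset each other, and how fixing just one of them can move the decision further from the calibrated answer.

```python
def perceived_value(reward, true_delay, delay_bias, discount_rate):
    """Value of a delayed reward as seen by the decision maker.

    delay_bias < 1 models the planning fallacy (the delay is underestimated).
    discount_rate above the calibrated rate models over-steep discounting.
    Only the product discount_rate * delay_bias affects the result.
    """
    estimated_delay = delay_bias * true_delay
    return reward / (1.0 + discount_rate * estimated_delay)


# Hypothetical numbers, chosen so that the two biases cancel exactly.
REWARD = 100.0
TRUE_DELAY = 10.0
CALIBRATED_RATE = 0.1  # assumed "ideal" discount rate
STEEP_RATE = 0.2       # discounts twice as steeply as the ideal
OPTIMISM = 0.5         # expects the project to take half as long

# No biases at all.
ideal = perceived_value(REWARD, TRUE_DELAY, 1.0, CALIBRATED_RATE)
# Both biases present: 0.2 * 0.5 equals the calibrated 0.1 * 1.0.
both_biases = perceived_value(REWARD, TRUE_DELAY, OPTIMISM, STEEP_RATE)
# Planning calibrated, but discounting still too steep.
planning_fixed_only = perceived_value(REWARD, TRUE_DELAY, 1.0, STEEP_RATE)

print(f"ideal:               {ideal:.1f}")
print(f"both biases:         {both_biases:.1f}")
print(f"planning fixed only: {planning_fixed_only:.1f}")
```

With these made-up numbers the doubly-biased agent values the project the same as the calibrated one, while the agent whose planning alone was "fixed" undervalues it. Real biases will not cancel this neatly; the point is only the direction of the effect.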
"Anti-crux" is where the two parties who're disagreeing about X take the time to map out the "common ground" that they both already believe, and expect to keep believing, regardless of whether X is true or not. It's a list of the things that "X or not X?" is not a crux of. Often best done before double-cruxing, or in the middle, as a break, when the double-cruxing gets triggering/disorienting for one or both parties, or for a listener, or for the relationship between the parties.
A common partial example that may get at something of the spirit of this (and an example that people do in the normal world, without calling it "anti-crux") is when person A has a criticism of e.g. person B's blog post or something (and is coming to argue about that), but A starts by creating common knowledge that e.g. they respect person B, so that the disagreement won't seem to be about more than it is.
The even broader context is that the Bay Area is currently the best place in the world for production of memeplexes, influence-seeking patterns, getting money for persuasion, etc., which implies it is likely a great place where the world would benefit from someone teaching rationality, but maybe not the best place for developing the skills.
Thanks for mentioning this. I think this had a big effect.
A path I wish you had taken was trying to get rationality courses taught on many college campuses. Professors have lots of discretion in what they teach. (I'm planning on offering a new course and described it to my department chair as a collection of topics I find really interesting and think I could teach to first years. Yes, I will have to dress it up to get the course officially approved.) If you offer a "course in a box" which many textbook publishers do (providing handouts, exams, and potential paper assignments to instructors) you make it really easy for professors to teach the course. Having class exercises that scale well would be a huge plus.
The hypotheses listed mostly focus on the internal aspects of CFAR.
This may be somewhat misleading to a naive reader. (I am speaking mainly to this hypothetical naive reader, not to Anna, who is non-naive.)
What CFAR was trying to do was extremely ambitious, and it was very likely going to 'fail' in some way. It's good FOR CFAR to consider what the org could improve on (which is where its leverage is), but for a big picture view of it, you should also think about the overall landscape and circumstances surrounding CFAR. And some of this was probably not obvious at the outset (at the beginning of its existence), and so CFAR may have had to discover where certain major roadblocks were, as they tried to drive forward. This post doesn't seem to touch on those roadblocks in particular, maybe because they're not as interesting as considering the potential leverage points.
But if you're going to be realistic about this and want the big-picture sense, you should consider the following:
Also:
Honestly my sense is that CFAR was significantly crippled by one or more of these egregores (partially due to its own cowardice). But that's a longer conversation, and I'm not going to have it out here.
//
All of this is just to give a taste of how difficult the original problems were that CFAR was trying to resolve. We're not in a world that's like, "Oh yeah, with your hearts and minds in the right place, you'll make it through!" Or even "If you just have the best thoughts compared to all the other people, you'll win!" Or even "If you have the best thoughts, a slick and effective team, lots of money, and a lot of personal agency and ability, you'll definitely find the answers you seek."
And so the list of hypotheses + analyses above may make it sound like if CFAR had its shit more 'together', it would have done a better job. Maybe? How much better though? Realistically?
As we move forward on this wild journey, it just seems to become clearer how hard this whole situation really is. The more collective clarity we have on the "actual ground-level situation" (versus internal ideas, hopes, wishes, and fears coloring our perspective of reality), ... honestly the more confronting it all is. The more existentially horrifying. And just touching THAT is hard (impossible?) for most people.
(Which is partially why I'm training at a place like MAPLE. I seem to be saner now about x-risk. And I get that we're rapidly running out of time without feeling anxious about that fact and without needing to reframe it in a more hopeful way. I don't have much need for hope, it seems. And it doesn't stop me from wanting to help.)
Also:
- The egregores that are dominating mainstream culture and the global world situation are not just sitting passively around while people try to train themselves to break free of their deeply ingrained patterns of mind. I think people don't appreciate just how hard it is to uninstall the malware most of us are born with / educated into (and which blocks people from original thinking). These egregores have been functioning for hundreds of years. Is the ground fertile for the art of rationality? My sense is that the ground is dry and salted, and yet we still make attempts to grow the art out of that soil.
- IMO the effects that have led us to current human-created global crises are the same ones that make it difficult to train people in rationality. So, y'all are up against a strong and powerful foe.
Honestly my sense is that CFAR was significantly crippled by one or more of these egregores (partially due to its own cowardice).
Yes; I agree with this. And it seems big. I wish I knew more legible, obviously-real concepts for trying to get at this.
I probably don't have the kinds of concepts you're interested in, but...
Some significant conceptual pieces in my opinion are:
It's originally an occult term, but my more-materialistic definition of it is "something that acts like an entity with motivations that is considerably bigger than a human and is generally run in a 'distributed computing' fashion across many individual minds." Microsoft the company is an egregore; feminism the social movement is an egregore; America the country is an egregore. The program "Minecraft" is not an egregore, an individual deer is not an egregore, a river is not an egregore.
Unreal's point is that these things 'fight back' and act on their distributed perception; if your corner of the world comes to believe that academia is a wasteful trap, for example, "academia" will notice and label you various things, which will then cause pro-academia people to avoid you and anti-academia people to start treating you as a political ally, both of which can make you worse off / twisted away from your original purpose.
Pretty much, yes.
It's possible to create an organization in a technical sense that isn't an egregore though. Lots of people have tried to create secular churches, for instance, but they mostly just fall flat because they're not a viable design to create a living distributed entity.
Some parties (as in, a group of people at some gathering) fail to congeal into an egregore. But when they do, the scene "clicks". And sometimes those spawn egregores that outlast the party — but not often.
So, it's a little complicated.
But to a first approximation, yes.
Thanks for weighing in; I trust these conversations a lot more when they have multiple people from current or former CFAR. (For anyone not tracking, Unreal worked at CFAR for a while.) (And, sorry, I know you said you're mainly writing this to not-me, but I want to engage anyhow.)
The hypotheses listed mostly focus on the internal aspects of CFAR.
This may be somewhat misleading to a naive reader. (I am speaking mainly to this hypothetical naive reader, not to Anna, who is non-naive.)
.... It's good FOR CFAR to consider what the org could improve on (which is where its leverage is), but for a big picture view of it, you should also think about the overall landscape and circumstances surrounding CFAR. And some of this was probably not obvious at the outset (at the beginning of its existence), and so CFAR may have had to discover where certain major roadblocks were, as they tried to drive forward. This post doesn't seem to touch on those roadblocks in particular, maybe because they're not as interesting as considering the potential leverage points.
Re: the above: I was actually trying to focus on not-specific-to-us-as-individuals factors that made the problem hard, or that made particular failure modes easy to fall into. I am hoping this post and its comments might be of use to both future-CFAR (e.g., future-me), and anyone aiming to build an "art of rationality" via some other group/place/effort.
So, if you skim over my hypotheses in the side-panel, they are things like "it's difficult to distinguish effective and ineffective interventions" and "in practice, many/most domains incentivize social manipulation rather than rationality." (Not things like "such-and-such an individual had such-and-such an unusual individual weakness.")
That is, I'm trying to understand and describe the background conditions that, IMO, gradually pulled CFAR and its members toward kinds of activity that had less of a shot at creating a real art of rationality. (My examples do involve us-in-particular, but that's because that's where the data is; that's what we know that others may want to know, when trying to build out an accurate picture of what paths have a shot at getting to a real art of rationality.)
I think we're maybe tackling the same puzzle, then (the puzzle of "how can a group take a good shot at building an art of rationality / what major obstacles are in the way / what is a person likely to miss in their first attempt, that might be nice to instead know about?"). And we're simply arriving at different guesses about the answers to that puzzle?
Right.
I think a careful and non-naive reading of your post would avoid the issues I was trying to address.
But I think a naive reading of your post might come across as something like, "Oh CFAR was just not that good at stuff I guess" / "These issues seem easy to resolve."
So I felt it was important to acknowledge the magnitude of the ambition of CFAR and that such projects are actually quite difficult to pull off, especially in the post-modern information age.
//
I wish I could say I was speaking from an interest in tackling the puzzle. I'm not coming from there.
Could you clarify what egregores you meant when you said:
The egregores that are dominating mainstream culture and the global world situation
The main ones are:
These three egregores benefit from people feeling powerless, worthless, or apathetic (malware). Basically the opposite of heroic, worthy, and compassionate (liberated, loving sovereignty). Helping to start uninstalling the malware is, like, one of the things CFAR has to do in order to even start having conversations about AI with most people.
And, unfortunately... like... often, buying into one of these egregores (usually this would be unconsciously done) actually makes a person more effective. Sometimes quite 'successful' according to the egregore's standards (rich, powerful, well-respected, etc). The egregores know how to churn out 'effective' people. But these people are 'effective' in service to the egregore. They're not necessarily effective outside of that context.
So, any sincere and earnest movement has to contend with this eternal temptation:
The egregore tempts you with its multitude of resources. To some extent, I think you have to engage. Since you're trying to ultimately change the direction of history, right?
Still, ahhh, tough. Tough call. Tricky.
This egregore wants everyone to feel persecuted. ... Do the rationalists feel persecuted / victimized? Oh yeah. ... So they haven’t successfully seen through this one.
Note that this doesn't follow. It might be, for example, that the egregore causes (some) people to feel persecuted by causing them to be persecuted.
(Admittedly I'm not sure I know what it means to "see through" an egregore. Like, if you "see through" capitalism, you... recognize capitalism as an egregore that demands etc.? You recognize that there may be other ways to organize a society, though you may or may not think any of those other ways are preferable all things considered? You want fewer things that you don't need?
But presumably, "seeing through" it doesn't extract you from your capitalist society, if you live in one; you still need a job to get money, and you still need money to purchase goods and services, and so on. And if you don't live in a capitalist society but a capitalist society is coming to take your land and separate you from your children, "seeing through" capitalism doesn't protect you from that either.
And so presumably, "seeing through" an egregore that wants you to feel persecuted, doesn't make you not-persecuted. It might make you not-feel-persecuted if you're in-fact not persecuted.)
I dunno if I was clear enough here about what it means to feel persecuted.
So the way I'm using that phrase, 'feeling persecuted' is not desirable whether you are actually being persecuted or not.
'Feeling persecuted' means feeling helpless, powerless, or otherwise victimized. Feeling like the universe is against you or your tribe, and that things are (in some sense) inherently bad and may forever be bad, and that nothing can be done.
If, indeed, you are part of a group that has fewer rights and privileges than the dominant groups, you can acknowledge to yourself "my people don't have the same rights as other people" but you don't have to feel any sense of persecution around that. You can just see that it is true and happening, without feeling helpless and like something is inherently broken or that you are inherently broken.
Seeing through the egregore would help a person realize that 'oh there is an egregore feeding on my beliefs about being persecuted but it's not actually a fundamental truth about the world; things can actually be different; and I'm not defined by my victimhood. maybe i should stop feeding this egregore with these thoughts and feelings that don't actually help anything or anyone and aren't really an accurate representation of reality anyway.'
So I don't really want to get into this, my note was about the structure of the argument rather than factual claims about the world. But...
I think I feel motte-and-baileyed? When I read your original comment with the term "feel persecuted" I'm like "eh, dunno, sounds plausible I guess?". When I read it trying to substitute in the definition you give I'm like "...mm, skeptical".
Like I get that jargon sometimes just has that effect, I'm not currently saying you shouldn't use that term with that meaning. But that's my reaction.
(If you do want a different hook to use, it sounds like "feel persecuted, and also be clinically depressed" is tongue-in-cheek kinda close to what you describe? Though bringing in the concept of "depression", and especially "clinical" depression, may not help see things clearly either.)
No, it's definitely not about being depressed. That's very far from it. But I also don't want to argue about the claims here. Seems maybe beside the point.
I think I could reword my original argument in a way that wouldn't be a problem. I just wasn't careful in my languaging, but I personally think it's fine? I think you might be reading a lot into my usage of the word "So".
Scientism has ordained priests that have special access to journals (knowledge) and special privileges that give them the ability to publish in those esoteric texts.
The ivermectin case seemed to show that journals are not important to Scientism. Nobody cared about peer-reviewed meta-analyses when those went counter to institutional positions.
How do you deal with the fact that people in this culture, esp rationalists?, get all sensitive around being evaluated? You need to evaluate people, in the end, because you don't have the ability to train everyone who wants it, and not everyone is ready or worth the investment. But then people tend to get all fidgety and triggered when you start putting them in different buckets
Uhm, some kind of Comfort Zone Expansion? Make people participate in competitions where they will predictably lose; and then they realize that life goes on.
Also, some kind of: "you should sincerely hope that you are not the smartest person on this planet, because if the smartest person on this planet is you, then frankly we are all going to die (and yes, this includes you)... but if there are many people smarter than you, then perhaps we might survive the AI" perspective.
Another reframe: "ego is for losers who are incapable of facing the reality. what is true is already true...".
"I am special, I have something to offer"
If everyone is special, no one is. But in reality, some people are special-special, and most people are just ordinary-special. Statistically, you most likely belong to the latter group.
But the good news is that you have something to offer even if you are not special! Some things need to be done repeatedly, or at multiple places (such as organizing a local LessWrong meetup). Some things are important but not the most important, which is why the special-special people do not have enough time to do them, so it's up to you.
Be honest and admit that it's not about what you can offer, but what status you hope to get in return.
"Things should be fair, everyone should have the same opportunities."
you're working with people who were socialized from a young age to identify with their own intelligence as a major part of their self-worth, and then they come into your community, feeling like they've finally found their people, only to be told: "Sorry you're not actually cut out for this work. It's not about you."
What hypocrisy! If only highly intelligent people are worthy, then most humans never got a real opportunity. But those don't matter, I suppose; only the members of the intellectual elite should have the same opportunities. You are a member of the elite, but are not a member of the elite-within-elite. Congratulations, now you can better empathize with the intellectual 98%.
People who are not fit for the elite work can still be welcome in the rationalist community.
The egregores that are dominating mainstream culture and the global world situation are not just sitting passively around while people try to train themselves to break free of their deeply ingrained patterns of mind.
Wait, don't generalize so quickly. Perhaps being in the Bay Area is playing on hard mode, from this perspective. Move to a place where people are... more emotionally capable of being told they are not the planet's #1.
At some point I hoped that CFAR would come up with "rationality trials", toy challenges that are difficult to game and transfer well to some subset of real world situations. Something like boxing, or solving math problems. But a new entry in that row.
IMO standardized tests of this form are hard; I was going to say "mainstream academia hasn't done much better" but Stanovich published something in 2016 that I'm guessing no one at CFAR has read (except maybe Dan?). I am not aware of any sustained research attempts on CFAR's part to do this. [My sense is lots of people looked at it for a little bit, thought "this is hard", and then dug in ground that seemed more promising.]
I think there are more labor-intensive / less clean alternatives that could have worked. We could have, say, just made the equivalent of Bridgewater Baseball Cards for rationality, and had people rate each other. This is sadly still a little 'marketing' instead of 'object-level' (the metric of "am I open-minded?" grounds in "do other people think I'm open-minded?" instead of just pointing at my brain and the environment), and maybe is painful for the people involved / gets them doing weird mental patterns instead of healthy mental patterns. But I think the visibility would have been good / it would have been easier to tell when someone is making 'real progress' vs. 'the perception of progress'.
According to the yoga traditions I am familiar with, uninvestigated/impure/mixed motives are quite a big deal and a primary predictor of success in self transformation. Glad to see it in the hypothesis space. A central example of this is that if you're in the self help space for a while you'll notice that many people are coming to you with the surface story of wanting change, but behaviors consistent with wanting fancy indirect excuses to not change, including things like being able to protest that you went to expensive workshops and everything and this proves that X really is intractable. Kegan refers to this as immunity to change, I like calling it the homeostatic prior, and relatedly at some point I got a doomy sense about CFAR after inquiring with various people and not being able to get a sense of a theory of change or a process that could converge to a theory of change for being able to diagnose this and other obstacles.
and relatedly at some point I got a doomy sense about CFAR after inquiring with various people and not being able to get a sense of a theory of change or a process that could converge to a theory of change for being able to diagnose this and other obstacles.
Can you say a bit more about what kind of a "theory of change" you'd want to see at CFAR, and why/how? I still don't quite follow this point.
Weirdly, we encountered "behaviors consistent with wanting fancy indirect excuses to not change" less than I might've expected, though still some. This might've been because a lot of the "bugs" people tackled at the workshop were more like "technical barriers," and less like what Kenzi used to call "load-bearing bugs." Or maybe I missed a lot of it, or maybe ... not sure.
what kind of a "theory of change"
Every org has a tacit theory of change implied by what they are doing; some also have an explicit one (e.g. poor to middling examples: business consulting orgs). Sometimes the tacit one lines up with the explicit one, sometimes not. I think having an explicit one is what allows you to reason about and iterate towards one that is functional. I don't know the specific theory of change that would be a good fit for what CFAR was trying to do. At the time, I was bouncing off the lack of any explicit one, and off some felt sense of resistance, in one-on-one conversations, towards moving in the direction of having one. I think I was expecting clearer thoughts since I believed that CFAR was in the business of investigating effect sizes of various theories of change related to diagnosing and then unblocking people who could work on x-risk.
Weirdly, we encountered "behaviors consistent with wanting fancy indirect excuses to not change" less than I might've expected, though still some.
This gets much stronger once you get big effect sizes that touch on core ways of navigating the world someone holds.
Sorry for leaving a comment after only reading the summary, so maybe this is addressed in the text, but I think I have a more concrete version of what I read as the theory that CFAR fell into the trap of a local maximum.
CFAR is just too weird.
I know lots of people like weird, but weird is self-limiting. And I don't mean cute, "lol i have so many plants, i'm so weird", within-the-normal-person-Overton-window weird, but proper normal-people-won't-really-understand-you weird.
One of the great lessons I've learned from my years of Zen practice is "don't talk about the weird". There that means stuff like don't talk about enlightenment, what happens during meditation (except with your teacher or a close dharma friend), or the things you can only know by experiencing them for yourself. It's a distraction and only rarely helpful. Mostly you need to just keep at the everyday practice of Zen.
I claim rationality needs the same lesson. Lots of this stuff about rationality is actually, properly weird. And for the kind of person who enjoys it, they want to lean into the weird. This is a mistake. This kind of weird is for adepts and teachers to talk shop about on rare occasion. Everyday folks need to hear about the normal, everyday practice of a thing in order to do it and have it relate to their life.
Weird seems fine to Less Wrong because it's tiered: you can find the weird on all posts, the normal on curated (or at least relatively normal). CFAR, to really succeed at what I see as its mission (bring rationality to the masses), needed to be the most normal version of rationality possible and as best I can tell it totally failed at this (e.g. CFAR seems single-handedly responsible for dozens of bits of new jargon that could have been talked about in normal language without jargon).
The good news is I think it's possible to start over if someone wanted. There's some evidence this could work. For example, in the last 20 years cognitive bias training has become common. A lot of it is BS presentations rather than actual training, but there are roots of rationality stuff out in the water among normal folks working jobs in big corporations. So we have some proof-of-concept that this is possible, but it requires optimizing for understanding by normal people, not the sort of weird people who are willing to spend their own money to get better at rationality.
(Note: I think CFAR succeeded in some important other ways. Those are out of scope for this comment, though.)
CFAR, to really succeed at what I see as its mission (bring rationality to the masses), needed...
IMO (and in the opinions of Davis and Vaniver, whom I was just chatting with), CFAR doesn't and didn't have this as much of its mission.
We were and are (from our founding in 2012 through the present) more focused on rationality education for fairly small sets of people who we thought might strongly benefit the world, e.g. by contributing to AI safety or other high-impact things, or by adding enrichment to a community that included such people. (Though with the notable exception of Julia writing the IMO excellent book "Scout Mindset," which she started while at CFAR and which I suspect reached a somewhat larger audience.)
I do think we should have chosen our name better, and written our fundraising/year-end-report blog posts more clearly, so as to not leave you and a fair number of others with the impression we were aiming to "raise the sanity waterline" broadly. I furthermore think it was not an accident that we failed at this sort of clarity; people seemed to like us and to give us money / positive sentences / etc. when we sounded like we were going to do all the things, and I failed to adjust our course away from that local reward of "sound like you're doing all the things, so nobody gets mad" to "communicate what's actually up, even when that looks bad, so you'll be building on firm ground."
One Particular Center for Helping A Specific Nerdy Demographic Bridge Common Sense and Singularity Scenarios And Maybe Do Alignment Research Better But Not Necessarily The Only Or Primary Center Doing Those Things
We were and are (from our founding in 2012 through the present) more focused on rationality education for fairly small sets of people who we thought might strongly benefit the world, e.g. by contributing to AI safety or other high-impact things, or by adding enrichment to a community that included such people.
Maybe this was a wrong strategy even given your goals.
Imagine that your goal is to train 10 superheroes, and you have the following options:
A: Identify 10 people with greatest talent, and train them.
B: Focus on scaling. Train 10 000 people.
It seems possible to me that the 10 best heroes in strategy B might actually be better than the 10 heroes in strategy A. Depends on how good you are at identifying talented heroes, whether the ones you choose actually agree to get trained by you, what kinds of people self-select for the scaled-up training, etc.
Furthermore, this is actually a false dilemma. If you find a way to scale, you can still have a part of your team identify and individually approach the talented individuals. They might be even more likely to join if you tell them that you already trained 10 000 people but they will get an individualized elite training.
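To make the A-versus-B comparison above concrete, here is a toy simulation. It is only a sketch under assumptions that are mine, not anything from CFAR's actual data: talent is drawn from a standard normal distribution, strategy A picks its 10 trainees using a noisy estimate of talent (`noise_sd`) and gives them an individualized-training boost (`elite_bonus`), and strategy B trains everyone so the 10 best simply emerge from results. It ignores self-selection and whether the chosen people agree to be trained.

```python
import random
import statistics


def simulate(noise_sd, elite_bonus=0.5, pool=10_000, k=10, trials=50, seed=0):
    """Return (mean score of A's k heroes, mean score of B's k best heroes).

    All parameters are hypothetical knobs for illustration only.
    """
    rng = random.Random(seed)
    a_scores, b_scores = [], []
    for _ in range(trials):
        # True talent of everyone in the candidate pool (assumed normal).
        talent = [rng.gauss(0, 1) for _ in range(pool)]

        # Strategy A: select k people by a noisy estimate of talent,
        # then give them individualized training worth elite_bonus each.
        estimate = [t + rng.gauss(0, noise_sd) for t in talent]
        picked = sorted(range(pool), key=lambda i: estimate[i], reverse=True)[:k]
        a_scores.append(statistics.mean(talent[i] + elite_bonus for i in picked))

        # Strategy B: train the whole pool; the k best are identified
        # afterwards by their actual (true) talent, with no bonus.
        b_scores.append(statistics.mean(sorted(talent, reverse=True)[:k]))
    return statistics.mean(a_scores), statistics.mean(b_scores)


if __name__ == "__main__":
    for noise in (0.0, 0.5, 1.0, 2.0):
        a, b = simulate(noise)
        print(f"noise_sd={noise}: A={a:.2f}  B={b:.2f}")
```

In this toy model, A wins when talent-spotting is perfect (it gets the same people plus the bonus), and B pulls ahead as the talent estimate gets noisier. Which regime the real world is in is exactly the open question.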
The thing I want most from LessWrong and the Rationality Community writ large is the martial art of rationality. That was the Sequences post that hooked me, that is the thing I personally want to find if it exists, that is what I thought CFAR as an organization was pointed at.
When you are attempting something that many people have tried before (and to be clear, "come up with teachings to make people better" is something that many, many people have tried before), it may be useful to look and see what went wrong last time.
In the words of Scott Alexander, "I’m the last person who’s going to deny that the road we’re on is littered with the skulls of the people who tried to do this before us. . . We’re almost certainly still making horrendous mistakes that people thirty years from now will rightly criticize us for. But they’re new mistakes. . . And I hope that maybe having a community dedicated to carefully checking its own thought processes and trying to minimize error in every way possible will make us have slightly fewer horrendous mistakes than people who don’t do that."
This article right here? This is a skull. It should be noticed.
If the Best Of collection is for people who want a martial art of rationality to study then I believe this article is the most important entry, and it or the latest version of it will continue to be the most important entry until we have found the art at last. Thank you Anna for trying to build the art. Thank you for writing this and publishing it where anyone else about to attempt to build the art can take note of your mistakes and try to do better.
(Ideally it's next to a dozen things we have found that we do think work! But maybe it's next to them the way a surgeon general's warning is next to a bottle of experimental pills.)
Coming back to this, I think "martial art of rationality" is a phrase that sounds really cool. But there are many cool-sounding things that in reality are impossible, or not viable, or just don't work well enough. The road from intuition about a nonexistent thing, to making that thing exist, is always tricky. The success rate is low. And the thing you try to bring into existence almost always changes along the way.
Hi Anna, I never came to one of your workshops (far too culty for me!), but I did read your handbook (2019 edition) and found it full of useful tips. In particular, TAPs, inner simulator/murphyjitsu, focussing, shaping, polaris, comfort zone expansion, and yoda timers were all new to me, and are all things that I've used occasionally ever since. They've worked a treat whenever I've been in a situation where I remembered to use them. TAPs and shaping I think are now core parts of the way I approach things.
A lot of the other things in there (units of exchange, bucket errors, systemization, hamming questions, double crux, pedagogical content knowledge, socratic ducking, gears-level understanding, area under curve) were more formal versions of ways of thinking I've had since childhood.
And a lot of the rest of it looked useful, but hasn't made it into my toolkit mainly because I didn't spend the time to think about it and practise it. Seeing this post reminds me that I meant to re-read the handbook and see what else I could mine from it.
Nothing life-changing, but you've certainly made me into a more efficient and focussed, and possibly slightly happier, ineffective doomer than I was.
At the very least you've collectively written the best self-help manual I've ever read. At least as good for my soul as 'How to Win Friends and Influence People', and 'The Inner Game of Tennis'. Which I hope you'll take as the very high praise I intend.
I really think that's not bad, and I look forward to a new edition of the handbook to read one day.
I wish we had. Unfortunately, I don't think we did much in the way of pre-mortems on our long-term goals, unless I'm forgetting something. (We discussed "what if CFAR doesn't manage to become financially able to keep existing at all", and "what if particular workshops can't be made to work," but those are shorter term.) Eliezer's sequence "The craft and the community" was written before CFAR, but after he wanted an independent rationality community that included rationality training, so we could try to compare what happened against that.
Yeah, not clear what this particular scenario would have looked like then. "We succeed financially, we get good feedback from satisfied customers, but our rationality training doesn't seem to make the alumni measurably more "rational", and so we stop."
I mean, at my very first CFAR workshop (2012, I think?), I was of the opinion "come on guys, it's almost all selection effects, I'm here for the networking / actually meeting people off the internet", and so to some extent that me wouldn't have been that surprised. If anything, I think he would have been positively surprised that CFAR generated a handful of techniques that seem real to me, and also popularized some techniques that seem real to me that I'm not sure I would have otherwise come across (mostly Circling; Focusing I already would have come across from looking up the Litany of Gendlin, I think).
Minor point: Yes, your workshop was May 2012. That was CFAR's first workshop (what was then still called a "minicamp" due to CFAR's spiritual predecessor).
I mean, that kind of is the idea in Eliezer's post "Schools proliferating without evidence," from two years before CFAR was founded.
(Minus the "so we stop" part.)
Thus, AI safety did not end up serving a “reality tether” function for us, or at least not sufficiently.
Due to AI safety's absence of short feedback loops, it seems obvious to me that discussing AI safety in a rationality training camp would pull participants away from reality (and, ironically, rationality). I predict any training camp that attempted to mix rationality with AI safety would fall into the same trap.
A rationality camp is a cool idea. An AI safety camp is a cool idea. But a rationality camp + AI safety camp is like mixing oxygen with hydrogen.
I mean... "are you making progress on how to understand what intelligence is, or other basic foundational issues to thinking about AI" does have somewhat accessible feedback loops sometimes, and did seem to me to feed back in on the rationality curriculum in useful ways.
I suspect that if we can keep our motives pure (can avoid Goodharting on power/control/persuasion, or on "appearance of progress" of various other sorts), AI alignment research and rationality research are a great combination. One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans. There are strong analogies in the subject matter that are great to geek out about and take inspiration from, and somewhat different tests/checks you can run on each. IMO Eliezer did some great thinking on both human rationality and the AI alignment problem, and on my best guess each was partially causal of the other for him.
One is thinking about how to build aligned intelligence in a machine, the other is thinking about how to build aligned intelligence in humans and groups of humans.
Is this true though? Teaching rationality improves capability in people but shouldn't necessarily align them. People are not AIs, but their morality doesn't need to converge under reflection.
And even if the argument is "people are already aligned with people", you still are working on capabilities when dealing with people and on alignment when dealing with AIs.
Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.
Teaching rationality looks more similar to AI capabilities research than AI alignment research to me.
I love this question. Mostly because your model seems pretty natural and clear, and yet I disagree with it.
To me it looks more like AI alignment research, in that one is often trying to align internal processes with e.g. truth-seeking, so that a person ends up doing reasoning instead of rationalization. Or, on the group level, so that people can work together to form accurate maps and build good things, instead of working to trick each other into giving control to particular parties, assigning credit or blame to particular parties, believing that a given plan will work and so allowing that plan to move forward for reasons that're more political than epistemic, etc.
That is, humans in practice seem to me to be partly a coalition of different subprocesses that by default waste effort bamboozling one another, or pursuing "lost purposes" without propagating the updates all the way, or whatnot. Human groups even more so.
I separately sort of think that in practice, increasing a person's ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct, although I agree this is not at all obvious, and I have not made any persuasive arguments for it and do not claim it as "public knowledge."
Ah, I see your point now, and it makes sense. If I had to summarize it (and reword it in a way that appeals to my intuition), I'd say that the choice of seeking the truth is not just about "this helps me," but about "this is what I want/ought to do/choose". Not just about capabilities. I don't think I disagree at this point, although perhaps I should think about it more.
I had the suspicion that my question would be met with something at least a bit removed inference-wise from where I was starting, since my model seemed like the most natural one, and so I expected someone who routinely thinks about this topic to have updated away from it rather than not having thought about it.
Regarding the last paragraph: I already believed your line "increasing a person's ability to see and reason and care (vs rationalizing and blaming-to-distract-themselves and so on) probably helps with ethical conduct." It didn't seem to bear on the argument in this case because it looks like you are getting alignment for free by improving capabilities (if you reason with my previous model, otherwise it looks like your truth-alignment efforts somehow spill over to other values, which is still getting something for free due to how humans are built I'd guess).
Also... now that I think about it, what Harry was doing with Draco in HPMOR looks a lot like aligning rather than improving capabilities, and there were good spill-over effects (which were almost the whole point in that case perhaps).
CFAR's focus on AI research (as opposed to raising the rationality water line in general) leads me to two questions:
Based on that: Shouldn't it be an important goal to test and popularize rationality techniques outside of subcultures in AI research if one wants to solve the alignment problem in practice? (Whether that is a job for CFAR or someone else is a different question, of course).
I suspect there's a contradiction between "Politics is the Mind Killer" and "Something to Protect", in terms of the combination of training rationality (especially epistemic rationality) and evaluating real-world decisions, on topics where the instructors believe they've already come to the correct conclusion.
The AI-Safety corner of EA seems quite likely to be a topic that is hard-mode for the study of rationality.
I suspect we need to engage with politics, or with noticing the details of how rationality (on group-relevant/political topics) ends up in-practice prevented in many groups, if we want to succeed at doing something real and difficult in groups (such as AI safety).
Is this what you mean?
One of the big modeling errors that I think was implicit in CFAR through most of its history, was that rationality was basically about making sure individuals have enough skill for individual reasoning, rather than modeling it as having a large component that is about resisting pressures from groups/memeplexes. (Some folks actually did push the latter model, at some points in our history, but even when pushed it sort of wasn't clear what to do about that, at least not to me.)
I think I meant something in the more general sense of political issues being important topics on which to apply rationality, but very poor topics on which to learn or improve rationality. Trying to become stronger in the Bayesian Arts is a different thing than contributing to AI Safety (and blended in difficult ways with evaluating AI Safety as a worthy topic for a given aspiring-rationalist's time).
For resisting pressure and memeplexes, this is especially true, if most/all of the guides/authorities have bought into this specific memeplex and aren't particularly seeking to change their beliefs, only to "help" students reach a similar belief.
I didn't follow CFAR that closely, so I don't know how transparent you were that this was a MIX of rationality improvement AND AI-Safety evangelism. Or, as you'd probably put it, rationality improvement which clearly leads to AI-Safety as an important result.
I didn't follow CFAR that closely, so I don't know how transparent you were that this was a MIX of rationality improvement AND AI-Safety evangelism.
How transparent we were about this varied by year. Also how much different ones of us were trying to do different mixes of this by different programs varied by year, which changed the ground truth we would've been being transparent about. In the initial 2012 minicamps, we were still legally part of MIRI and included a class or two on AI safety. Then we kinda dropped it from the official stuff. I still had it as a substantial motivation; Julia and some of the others didn't, I think. It manifested for me mostly in trying to retain control of the organization, in e.g. wanting to get things like Bayes in the curriculum (b/c I thought those needed/helpful for parsing the AI risk argument), and in choices of who to admit. Later (2016? I don't remember) we brought it back in more explicitly in our declared missions/fundraiser posts/etc., as "rationality for its own sake, for the sake of existential risk." Also later (2015 on, I think) we ran some specialized AI safety programs, while still not having AI content explicitly in the mainline.
wanting to get things like Bayes in the curriculum (b/c I thought those needed/helpful for parsing the AI risk argument)
I do not think this is true. I snapped to 'Oh God this is right and we're all dead quite soon' as a result of reading a short story about postage stamps something like fifteen years ago, and I was totally innocent of Bayesianism in any form.
It's not a complicated argument at all, and you don't need any kind of philosophical stance to see it.
I had exactly the same 'snap' reaction to my first exposure to ideas like global warming, overpopulation, Malthus, coronavirus, asteroids, dysgenics, animal suffering, many-worlds, euthanasia, etc ad inf. Just a few clear and simple facts, and maybe a bit of mathematical intuition, but nothing you wouldn't get from secondary school, lead immediately to a hideous or at least startling conclusion.
I don't know what is going on with everyone's inability to get these things. I think it's more a reluctance to take abstract ideas seriously. Or maybe needing social proof before thinking about anything weird.
I don't even think it's much to do with intelligence. I've had conversations with really quite dim people who nevertheless 'just get' this sort of thing. And many conversations with very clever people who can't say what's wrong with the argument but nevertheless can't take it seriously.
I wonder if it's more to do with a natural immunity to peer pressure, and in fact, love of being contrarian for the sake of it (which I have in spades, despite being fairly human otherwise), which may be more of a brain malformation than anything else. It feels like it's related to a need to stand up for the truth even when (possibly even because) people hate you for it.
Maybe the right path here is to find the already existing correct contrarians, rather than to try to make correct contrarians out of normal well-functioning people.
And... my guess in hindsight is that the "internal double crux" technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken.
I'm confused about this. The way I remember it being taught was very much explicitly against this, i.e.:
For me, IDC was very helpful in teaching me how to listen to my non-verbal parts. Reflecting on it, I never spent much time on the actual cruxing. When IDC-ing I mostly spend time on actually hearing both sides. And when all the evidence is out, the outcome is most often obvious.
But it was the IDC lesson and the Focusing lesson that taught me these skills. Actually, even more important than the skill itself was being taught that this was possible.
For me probably the most important CFAR lesson was the noticing and "double-clicking" on intrusion. The one where Anna puts a glass of water on the edge of a table and/or writes expressions with the wrong number of parentheses.
Do most people come away from a CFAR workshop listening less to their non-verbal parts?
I wouldn't be surprised if it sometimes happens that people listen less to their non-verbal parts. But I would be surprised if that's the general trend.
On the surface Anna provides one datapoint, which is not much. But the fact that she brings up this datapoint, makes me suspect it's representative? Is it?
Is there some kind of metric tracking net positive impact on the world as a result of CFAR workshops? For example, YCombinator has made a decent amount of money and they can measure how well they did over a period of years. I understand CFAR is not a startup incubator, but I would imagine there are things you can track (e.g. fitness, income, citations etc).
I think as a starting point something like physical fitness should be tracked (e.g. you could start a running club and measure whether participants were able to run a 10k after the workshop is over), with statistics published on this. Another example would be to make people grind on Leetcode and improve their income.
For many people, if there is no information on how well an organisation's members are doing -- there is no incentive to join that organisation.
I don't know what kind of people are interested in CFAR in the first place[1], so maybe there is no market for something like this. Sorry for the somewhat rambling comment, but I think the basic issue was too much variation in goals of the people.
[1] I was never interested because I'm very anti-woo stuff and think Circling is ridiculous and insane etc. I got the impression a lot of people in the community were into things like this (which is fine but not for me).
Is there some kind of metric tracking net positive impact on the world as a result of CFAR workshops?
There's the longitudinal study. I do think that CFAR was, generally speaking, good for participants that attended its workshops, while suspecting that 'more was possible' and it performed somewhat poorly on its main goals.
[Like, if the goal of CFAR had been more like "increase life-satisfaction QALYs", then I think having a broad impact would have been much better, and it would have moved more from "workshops that can cultivate large changes for small numbers of people" to "online classes that can cultivate small changes for large numbers of people".]
Participants may want to learn “rationality”/“CFAR techniques”/etc. so that they can feel cool, so others will think they’re cool, so they can be part of the group, so they can gain the favor of a “teacher” or other power structure, etc.
So what? Just embrace it, learn a ton of techniques, some of them will be useless. Probably still way better than doing nothing. Later you can selectively drop the techniques that feel useless.
(What I am trying to do here is to put the risk of imperfect action as an alternative to the risk of inaction.)
This could go wrong if people keep inventing techniques for the sake of keeping people busy. Happens in cults or for-profit organizations: as long as you have customers, keep shoveling new techniques at them, and they keep giving you their money; also, make the new techniques require more time and money, so the customers cannot complete your course too fast. This could be prevented by CFAR publishing their official list of rationality techniques for free, and publishing an update every two years.
Note: "cool" is good; it motivates you to persevere in your efforts. Becoming more rational should be fun! (Otherwise, you will never raise the sanity waterline of the population at large.)
But I would predict that people will try the "cool" techniques for a year or two, then burn out. The more they can learn during that one active year, the better.
Unfortunately, our CFAR curriculum development efforts mostly had no such strong outside mooring. That is, CFAR units rose or fell based on e.g. how much we were personally convinced they were useful [...] but not based (much/enough) on whether those units helped us/them/whoever make real, long-term progress on outside problems.
Suggestion: split CFAR into "those who invent the techniques" and "those who teach them". Have the latter group find some different audiences (e.g. scientists, artists, entrepreneurs, students...) and test the techniques on them and provide reports to the inventors.
I suspect these features made the workshop worse than it would otherwise have been at allowing real conversations, allowing workshop participants, me, other staff, etc. to develop a real/cooperative art of rationality, etc. (Even though these sorts of "minor deceptiveness" are pretty "normal"; doing something at the standard most people hit doesn't necessarily mean doing it well enough not to get bitten by bad effects.)
How about doing exactly this... but at the end admitting what you did and explaining why?
Otherwise, you will never raise the sanity waterline of the population at large.
I want to reiterate (stated elsewhere in this thread) that the goals of CFAR were not to raise the sanity waterline of the population at large.
Hi! I was writing this originally as a comment-reply to this thread, but my reply is long, so I am factoring it out into its own post for easier reading/critique.
This is more comment-reply-quality than blog post quality, so read at your own risk. I do think the topic is interesting.
Short version of my thesis: It seems to me that CFAR got less far with "make a real art of rationality, that helps people actually make progress on tricky issues such as AI risk" than one might have hoped. My lead guess is that the barriers and tricky spots we ran into are somewhat similar to those that lots of efforts at self-help / human potential movement / etc. things have run into, and are basically "it's easy and locally reinforcing to follow gradients toward what one might call 'guessing the student's password', and much harder and much less locally reinforcing to reason/test/whatever one's way toward a real art of rationality. Also, the process of following these gradients tends to corrupt one's ability to reason/care/build real stuff, as does assimilation into many parts of wider society."
Epistemic status: “personal guesswork”. In some sense, ~every sentence in the post deserves repeated hedge-wording and caveats; but I’m skipping most of those hedges in an effort to make my hypotheses clear and readable, so please note that everything below this is guesswork and might be wrong. I am sharing only my own personal opinions here; others from past or current CFAR, or elsewhere, have other views.
Conversational context, leading up to this post-length comment-reply:
I wrote:
gjm replied:
And later, gjm again:
So, that's the prior conversational context. Now for my long-winded attempt to reply, and to explain my best current guess at why CFAR didn't make more progress toward an actually-useful-for-understanding-AI-or-other-outside-things art of rationality.
I'll write it by quoting some of gjm's hypotheses, with some of my own added, in an order that is convenient to me, and with my own numbering added. I'll skip the hypotheses that seem inapplicable/inaccurate to me, and just quote the ones that I think are at least partially descriptive of what happened.
Re: "Hypothesis 1 (from gjm): In [the self-help/rationality] space it is difficult to distinguish effective from ineffective interventions, which means that individuals and organizations are at risk of drifting into unfalsifiable woo."
Yes. It is hard (beyond my skill level, and beyond the skill level of others I know AFAICT) to figure out the full intended functions of various parts of the psyche.
So, when people try to re-order their own or other peoples’ psyches based on theories of what’s useful, it’s easy to mess things up.
For example, I’ve heard several stories from adults who, as kids, decided to e.g. “never get angry” (read: “to dissociate from their anger”), in an effort not to be like an angry parent or similar.
Most people would not make that particular mistake as adults, but IMO there are a lot of other questions that are tricky even as an adult, including for me (e.g.: what is suffering, is it good for anything, is it okay to mostly avoid states of mind that seem to induce it, what’s up with denial and mental flinches, is it okay to remove that, does a particular thing that looks like ‘removing’ it remove it all the way or just dissociate things, what’s up with the many places where humans don’t seem very goal-pursuing/very agent-like, is it okay to become able to 'do my work' a lot more often, is it good/okay to become poly, is it workable to avoid having children despite really wanting to or does this risk something like introducing a sign error deep in your psychology …)
So, IMO, the history of efforts at self-improvement or rationality or the human potential movement or similar is full of efforts to rewire the psyche into molds that seem like a good idea initially, and sometimes seem like a bad idea in hindsight. And this sort of error is a bit tricky to recover from, because, if you’re changing how your mind works or how your social scene works, you are thereby messing with the faculties you’ll later need to use to evaluate the change, and to notice and recover from errors.
I think this is a significant piece of how we got stuck.
Re: "[Hypothesis 2] (from me): The motives used in rewiring parts of the psyche (e.g. at ‘self-help’ programs) are often impure. This impurity leads to additional errors in how to rewire oneself/others, beyond those that would otherwise be made by someone making a simple/honest best guess at what’s good."
I suspect that "impure motives" (motives aimed at some local goal, and not simply at "help this mind be free and rational") were also a major contributor to what kept us from getting farther at CFAR, and that this interacted with and exacerbated the "model gaps" I was listing in hypothesis 1.
Some examples of the “impure” motives I have in mind:
In groups:
(“Wanting”, here, doesn’t need to mean “conscious, endorsed wanting”; it can mean “doing gradient-descent from these motives without consciously realizing what you’re doing.”)
Things get more easily wonky in groups, but even in the simpler case of a single individual there is IMO lots of opportunity for “impure” motives:
So, in summary: people's desires to control one another, or fool one another, can combine poorly with techniques for psychological self- or other- modification. So, too, with people's desires to control their own psyches, or to fool themselves, or to dodge uncertainty. Such failure modes are particularly easy because we do not have good models of how the psyche, or the sociology, ought to work, and it is relatively easy to manage to be "honestly mistaken" in convenient ways in view of that ignorance.
Re: Hypothesis 3: (from me): "Insufficient feedback loops between our 'rationality' and real traction on puzzles about the physical world"
In Something to Protect, Eliezer argues that the real power in rationality will come when it is developed for the sake of some outside thing-worth-caring-about that a person cares deeply about, rather than developed for the sake of "being very rational" or some such.
Relatedly, in Mandatory Secret Identities, Eliezer advocates requiring that teachers of rationality have a serious day job / hobby / non-rationality-teacher engagement with how to do something difficult, and that they do enough real accomplishment to warrant respect in this other domain, and that no one be respected more as a teacher of rationality than as an accomplisher of other real stuff. That is, he suggested we try to get respect for real traction on real, non-"rationality" tasks into any rationality dojo's social incentives.
Unfortunately, our CFAR curriculum development efforts mostly had no such strong outside mooring. That is, CFAR units rose or fell based on e.g. how much we were personally convinced they were useful, how much the students seemed to like them and seemed to be changed by them, how much we liked the resultant changes in our students (both immediately and at follow-ups months or years later), etc. -- but not based (much/enough) on whether those units helped us/them/whoever make real, long-term progress on outside problems.
In hindsight, I wish we had tried harder to tether our art-development to "does it help us with real outside puzzles/work/investigations/building tasks of some sort." This seems like the sort of factor that could in principle keep an "art of rationality" on a path to being about the outside world.
At the same time, taking such outside traction seriously seems quite difficult to pull off, and in ways I expect would also have made it difficult for most other groups in our shoes to pull off (and that I suspect also affected e.g. most self-help / human potential movement/ etc. efforts). So I'd like to sketch why this is hard.
a) Taking "does this 'rationality technique' help with the real world?" seriously, pulls against local incentive gradients.
Doing things with the real world "slows you down" and makes your efforts less predictable-to-yourself (which I and others often experience as threatening/unpleasant, vs being more able to 'make up' which things can be viewed as successful). Relatedly, such outside "check how this works in real tasks" steps are unlikely to "feel rewarding", or to cause others to think you're cool, or to cause your units to feel more compelling locally to the social group. (Appearing to have done real-world checks might make your units more socially compelling in some groups. But unfortunately this creates a pull toward "feeling as though you've done it" or "causing others to feel as though you've done it," not toward the difficult, hard-to-track work of having actually sussed out what helps in a puzzling real-world domain.)
Thus, it’s easy for those strands within an organization/effort that attempt to take real-world traction seriously, to be locally outcompeted by strands not attempting such safeguards.
That is, a CFAR instructor / curriculum-developer who initially has some interest in both approaches, will "naturally" find their attention occupied more and more by curriculum-development efforts that skip the slow/unpredictable loop of "check whether this helps with real-world problem-solving." Similarly, an ecosystem involving several "rationality developers," some of whom do the one and some the other, will "naturally" find more of its attention heading to the person who is more like "guessing the students' passwords", and less like "tracking whether this helps with building real-world maps that match the territory, in slow, real-world, messy domains."
b) In practice, many/most domains locally incentivize social manipulation, rather than rationality.
Lots of people who came to CFAR's past workshops (like people ~everywhere) wanted to succeed at lots of different things-in-their-lives. E.g. they wanted to do well in grad school or in careers, or to have good relationships with particular people, or get better at public speaking, or get more done at their EA job, or etc.
One might have hoped (I originally did hope) that folks' varied personal goals would provide lots of fodder for developing rationality skill, and that this process would provide lots of fodder for developing an art of rationality.
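However, I now like asking, about a person's notion of doing "well" in a domain, whether local signals-they-will-interpret-as-progress are more easily obtained by skill at predicting and manipulating the physical world, or by skill at manipulating themselves and/or others.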
It unfortunately seems to me that for most of the goals people come in with, and for most of the ways that people tend initially to evaluate whether they are making progress on that goal, the "help them feel as though they're making progress on this goal" gradient tends more toward skill at manipulating themselves and/or others, and less toward skill at predicting and manipulating the physical world.
So, if a person is to take "does this so-called 'rationality technique' actually help with real-world stuff?" seriously as a feed-in to the development of a real and grounded art of rationality, they'll need to carefully pick domains of real-world stuff that are actually about the ability to model the physical world, which on my model are unfortunately relatively rare. (E.g., "doing science" works, but "being regarded as having done good science" only sort-of works; some parts of finance seem to me to work, but some parts really don't; etc.)
c) In practice, my caring about AI safety (plus the way I was conceptualizing it) often pulled me toward patterns for influencing people, rather than toward a real art of rationality
I might have hoped that “solve AI, allow human survival” would be an instance of “something to protect” for some of us, and that our caring about AI safety would help ground/safeguard the rationality curriculum. I.e., I might have hoped that my/our desire to have humanity survive, would lead us to want to get real rationality techniques that really work, and would lead us away from dynamics such as those in Schools Proliferating without Evidence, and toward something grounded and real.
But, no. Or at least, not nearly as much as was needed. AI safety was indeed highly motivating for some of us (at minimum, for me), but the feedback loops were too long for “is X actually helping with AI safety?” to give the “but does it actually work in reality?” tether to our art. (Though we got some of that sometimes; the programs attempting to aid MIRI research were sometimes pretty fruitful and interesting, with the thoughts on AI feeding back into better understandings of how to reason, and with some techniques, e.g. "Gendlin's 'Focusing' for research" gaining standing as a result of their role in concrete research progress sometimes.)
And in addition to the paucity of data as to whether our techniques were helping with research, there was a presence of lots and lots of data and incentives as to whether our techniques were e.g. moving people to take up careers in AI safety, moving people to think we were cool, moving people to seem like they were likely to defer to MIRI or others I thought were basically good or on-path, etc. On my model, these other incentives, and my responses to them, made my and some others' efforts worse.
I did 'try' to be virtuous. But reality is a harsh place, with standards that may seem unfair
I... did try to have my efforts to influence people on AI risk be based in "epistemic rationality," as I saw it. That is, I had a model in which folks' difficulty taking AI risk seriously was downstream of gaps in their epistemic rationality, and in which it was crucial that persuasion toward AI safety work be done via improving folks' general-purpose epistemic rationality, and not through e.g. causing people to like the disconnected phrase "AI safety."
I endorsed many of the correct keywords ("help people think, don't try to persuade people of anything").
Nevertheless: the feedbacks that in-practice shaped which techniques I liked/used/taught were often feedbacks from "does this cause people to look like people who will help with AI risk as I see it, i.e. does it help with my desire for a certain ideology to be in control of people," and less feedbacks from "are they making real research progress now" or other grounded-in-the-physical-world successes/failures. (Although, again, there was some of the good/researchy kind of feedback, and I value this quite a bit.)
Case study: the "internal double crux" technique, used on AI risk
To give an example of the somewhat-wonky way my rationality development often went: I developed a technique called "Internal double crux," and ran a lot of people through a ~90-minute exercise called "internal double crux on AI risk." The basic idea in this technique, is that you have a conversation with yourself about whether AI risk is real, and e.g. whether the component words such as "AI" even refer to anything real/sensible/grounded, and you thereby try to pool the knowledge possessed by your visceral "what do I actually expect to see happen" self with the knowledge you hold more abstractly, and to hash things out, until you have a view that all of yourself signs onto and that is hopefully more likely to be correct.
I developed the "internal double crux" technique in part by thinking about the process that I and many 'math people' naturally do when reading a math textbook, where a person reads a claim, asks themselves if it is true, finds something like "okay, the proof is solid, so the claim is true, but still, it is not viscerally obvious yet that it is true, how do I see at a glance that it has to be this way?", and does something like dialogue with themselves, back and forth, until they can see why the theorem holds. (Aka, I developed the technique at least partly by trying to be virtuous, and to 'boost epistemic rationality' rather than persuade.)
Still, the feedbacks that led to me putting the technique in a prominent place in the curriculum of CFAR's "AI risk for computer scientists" and "MIRI summer fellows" workshops were significantly that it seemed to often persuade people to take AI risk seriously.
And... my guess in hindsight is that the "internal double crux" technique often led, in practice, to people confusing/overpowering less verbal parts of their mind with more-verbal reasoning, even in cases where the more-verbal reasoning was mistaken. For example, I once used the "internal double crux" technique with a person I will here call "Bob", who had been badly burnt out by his past attempts to do direct work on AI safety. After our internal double crux session, Bob happily reported that he was no longer very worried about this, proceeded to go into direct AI safety work again, and... got badly burnt out by the work a year or so later. I have a number of other stories a bit like this one (though with different people and different topics of internal disagreement) that, as a cluster, lead me to believe that "internal double crux" in practice often worked as a tool for a person to convince themselves of things they had some ulterior motive for wanting to convince themselves of. Which... makes some sense from the feedbacks that led me to elevate the technique, independent of what I told myself I was 'trying' to do.
A couple other pulls from my goal of "try not to die of AI" to my promotion of social fuckery in the workshops
A related problem was that, in practice, it was too tempting to approach "aid AI safety" via social fuckery, and social fuckery is bad for making a real art of rationality.
For example, during the "AI risk for computer scientists" workshops that we ran partly as a MIRI recruiting aid in 2018-2020, I aimed to make the workshops impressive, and to make them showcase our thinking skill.
My reasoning at the time was that, since we could not talk directly about MIRI's non-public research programs, it was important that participants be able to see MIRI's/our thinking skill in other ways, so that they could have some shot at evaluating-by-proxy whether MIRI had a shot at being quite good at research.
(That is: I phrased the thing to myself as being about exposing people to true evidence, but I backchained it from wanting to convince them to trust me and to trust the structures I was working with.)
In practice, this goal led me to such actions as:
I suspect these features made the workshop worse than it would otherwise have been at allowing real conversations, allowing workshop participants, me, other staff, etc. to develop a real/cooperative art of rationality, etc. (Even though these sorts of "minor deceptiveness" are pretty "normal"; doing something at the standard most people hit doesn't necessarily mean doing it well enough not to get bitten by bad effects.)
In Summary:
Thus, AI safety did not end up serving a “reality tether” function for us, or at least not sufficiently. (Feedbacks from "can people do research" did help in some ways, even though feedbacks from "try to be persuasive/impressive to people, or to get into a social configuration that will allow influencing their future actions" harmed in other ways.) Nor did anything else tether us adequately, although I suspect that mundane tasks such as workshop logistics were at least a bit helpful.
A caveat, about something missing from my write-up here and in many places:
There were lots of other people at CFAR, or outside of CFAR but aiding our efforts in important ways, including lots who were agentic and interesting and developed a lot of interesting content and changed our directions and outcomes. I'm mostly leaving them out of my writing here, but this seems bad, because they were a lot of what happened, and a lot of the agency behind what happened. At the same time, I'm focusing here on a lot of things that... weren't the best decisions, and didn't end up with great outcomes, and I'm doing this without having consulted most of the people who played roles at CFAR and its programs in the past, and it seems a lot less socially complicated to talk about my own role in sad outcomes than to attempt to talk about anyone else's role in such things, especially when my guesses are low-confidence and others are likely to disagree. So I'm mostly sticking to describing my own roles in past stuff at CFAR, while adding this note to try to make that less confusing.
Re: Hypothesis 4 (from gjm): "Every cause wants to be a cult", and self-help-y causes are particularly vulnerable to this and tend to get dangerously culty dangerously quickly.
Yes. This hypothesis seems right to me. With my draft at some of the mechanisms, above.
An additional mechanism worth naming:
(I suspect this list of mechanisms is still quite partial. There are probably just lots of Goodhart-like dynamics, whereby groups of people who are initially pursuing X may trend, over time, to pursuing "things that kind of look like X" and "things that give power/resources/control to those who seem likely to pursue things that kind of look like X" and so on.)
One reason the “every cause wants to be a cult” thing is harder to dodge well than one might think (though perhaps I am being defensive here):
IMO, large parts of “the mainstream” are also cults in the sense of “entities that restrict others’ thinking in ways that are not accurate, and that have been optimized over time for the survival of the system of thought-restrictions rather than for the good of the individual, and that make it difficult or impossible for those within these systems to reason freely/well.”
For example, academia is optimized to keep people in academia. Also, the mainstream culture among college-educated / middle-class Americans seems to me to be optimized to keep people believing “normal” things and shunning weird/dangerous-sounding opinions and ideas, i.e. to keep people deferring to the culture of middle-class America and shunning influences that might disrupt this deferral, even in cases where this makes it hard for people to reason, to notice what they care about, or to choose freely. More generally, it seems to me there are lots of mainstream institutions and practices that condition people against thinking freely and speaking their minds. (cf "reason as memetic immune disorder" and "moral mazes").
I bring this up partly because I suspect the "true spirit of rationality" was more alive in the rationality community of 2008-2010 than it was in the CFAR of 2018 (say), or in a lot of parts of the EA community, and I further suspect that mimicry of some mainstream practices (e.g. management practices, PR practices) is one vehicle by which the "suppression of free individual caring and reasoning and self-direction, in favor of something like allowing groups to synchronize" occurred.
I bring this up also because in my head at least there are those who would respond to events thus far in parts of rationality-space with something like “gosh, your group ended up kinda culty, maybe you should avoid deviating from mainstream positions in the future,” and I’m not into that, because reasoning seems useful and necessary, and because in the long run I don’t trust mainstream institutions to allow the kind of epistemology we need to do anything real.
Some of these experiments are by a nascent group, rather than "CFAR-directed" in a narrow sense. That group may fork off of CFAR as its own thing at some point; it's not ready for the internet yet, and may never become so. But I don't mean to say these experiments are only CFAR in a classic sense.