I haven't participated in CFAR workshops; I'm working from your written posts and my experience in other communities, so take these comments with the appropriate grains of salt.
I read the shift you describe as moving from getting people on board with a largely preset agenda toward a more collaborative frame, with promising results so far. I wonder if the underlying issue here may be format-intent alignment. If CFAR has specific rationality techniques it believes are valuable and wants to teach, there's an inherent directiveness to that goal. Workshop/discussion formats, however, signal exploration and reciprocity. When the intent is directional but the format signals collaboration, participants can experience a disorienting mismatch.
I'm curious whether CFAR has considered hybrid formats that are explicit about which parts are directional and which are truly collaborative, or if you are leaning entirely on shifting intentions to fit the existing format—even if that means letting conversations drift away from anything adjacent to rationality or x-risk. I don't think there's an inherently right answer here, but understanding where you are on the spectrum and being transparent about this choice could help participants set their expectations, or self-filter regarding whether CFAR is a good fit for them at all.
Seeing this as a spectrum rather than a binary seems important because it prevents participants from running into "invisible" restrictions. Mutual learning and meeting people where they are have real value, but trying to be all things to all people doesn't seem realistic. It's therefore important to be ready to say "this is what we offer, these are our constraints, respond as you will," even though there is a wide range of valid answers for where to draw those lines. Sometimes a person is incompatible with an organization, and that doesn't have to be anyone's fault.
A second, arguably more complex and charged dynamic I am wondering how you intend to navigate is responsibility for managing participant capacity. Is CFAR explicitly working to build participants' ability to recognize misalignment, maintain their sense of agency while in seemingly asymmetric power dynamics, and advocate for their needs? Or is the plan for CFAR to create sufficiently careful environments that participants won't need those skills? Or to screen applicants for having the capacity for self-advocacy already? Your post makes it sound like you are primarily taking the second approach, which is fine, but I am wondering how you are assessing the trade-offs involved.
For transparency, I personally believe in building capacity for self-advocacy and agency, with community care supporting that development and acting as a fallback when capacity isn't present, but I recognize that different approaches work for different purposes/people. What I am pushing for here is an explicit articulation of the balance you are striking so that participants can make a fully informed choice as to whether they wish to join.
In any case, I appreciate the thoughtfulness of your post and willingness to share CFAR's evolution publicly.
What is the legibility status of the problem of requiring problems to be legible before allowing them to inform decisions? The thing I am most concerned about wrt AI is our societal-level filters for what counts as a "real problem."
My takeaway from this post is that there are several properties of relating that people expect to converge, but in your case (and in some contexts) don't. With empathy, there's:
1. Depth of understanding of the other person's experience
2. Negative judgment
3. Mirroring
I mention 3 because I think it's strictly closer to the definition of empathy than 1, but it's mostly irrelevant to this post. If I had this kind of empathy for the woman in the video, I'd be thinking: "man, my head hurts."
The common narrative is that as 1 increases, 2 drops to zero, or even flips into positive judgement. This is probably true sometimes, such as when counteracting the fundamental attribution error, but sometimes not: "This person isn't getting their work done, that's somewhat annoying...oh, it's because they don't care about their education? Gaaahhh!!!" I can relate to this.
Regarding relating better without lowering standards, the questions that come to my mind are:
1. Is this a case where things have to get worse before they get better? As in, zero understanding leads to low judgement with suspension of disbelief, motivational understanding leads to high judgement, but full-story understanding returns to low judgement without relying on suspension of disbelief. Is there a way to test this without driving yourself crazy or taking up an inordinate amount of time?
2. Can you dissolve your moral judgement while keeping understanding constant? That is: "this teammate isn't doing their share of the work because they didn't care enough to be prepared...and this isn't a thing I need to be angry about." If this route looks interesting, my suggestion for the first step of the path is to introspect on the anger/disgust/etc. and what it's protecting.
This is a useful application of a probability map! If an important term has multiple competing definitions, create nodes for all of them, link the ones you consider important to a central p(doom) node (assuming you are interested in that concept), and let other people disagree with your assessment, but with a clearer sense of what they specifically disagree about.
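To make the structure concrete, here is a minimal sketch in plain Python of what such a map could look like; the term, the definition nodes, the numbers, and the toy aggregation rule are all illustrative assumptions of mine, not a description of your actual map.

```python
# Minimal sketch of a probability map for a term with competing definitions.
# Each definition gets its own node with a credence that *that* version of
# the scenario occurs; the ones considered load-bearing are linked to a
# central p(doom) node. All names and numbers are illustrative placeholders.

definition_nodes = {
    "ASI = broadly superhuman agentic system": {"p_occurs": 0.4, "linked_to_doom": True},
    "ASI = superhuman tool-like oracle":       {"p_occurs": 0.5, "linked_to_doom": True},
    "ASI = marketing label for next-gen LLMs": {"p_occurs": 0.9, "linked_to_doom": False},
}

def p_doom(nodes, p_doom_given_linked=0.3):
    """Toy aggregation: only linked definitions contribute, each scaled by a
    single illustrative conditional probability. A real map would give each
    edge its own conditional and handle overlap between definitions."""
    linked = [n["p_occurs"] for n in nodes.values() if n["linked_to_doom"]]
    return p_doom_given_linked * sum(linked) / max(len(linked), 1)

# A critic can now point at a specific node or weight
# ("I'd put 0.1 on the oracle definition") instead of the bottom line.
print(f"p(doom) under these toy assumptions: {p_doom(definition_nodes):.2f}")
```

The payoff is exactly the one you describe: disagreement gets localized to a particular definition or edge rather than staying at the level of a single headline number.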
The basic contention here seems to be that the biggest dangers of LLMs come not from the systems themselves, but from the overreliance, excessive trust, etc. that societies and institutions place in them. Another is that "hyping LLMs"--which I assume includes folks here expressing concerns that AI will go rogue and take over the world--increases perceptions of AI's abilities, which feeds into this overreliance. A conclusion is that promoting "x-risk" as a reason for pausing AI will have the unintended side effect of increasing the (catastrophic, but not existential) dangers associated with overreliance.
This is an interesting idea, not least because it's a common intuition among the "AI Ethics" faction, and therefore worth hashing out. Here are my reasons for skepticism:
1. The hype that matters comes from large-scale investors (and military officers) trying to get in on the next big thing. I assume these folks are paying more attention to corporate sales pitches than Internet Academics and people holding protest signs--and that their background point of reference is not Terminator, but the FOMO common in the tech industry (which makes sense in a context where losing market share is a bigger threat than losing investment dollars).
2. X-risk scenarios are admittedly less intuitive in the context of self-supervised-learning-based LLMs than they were back when reinforcement learning was at the center of development and AI learned to play increasingly broad ranges of games. Those systems regularly specification-gamed their environments, and it was chilling to think about what would happen when a system could treat the entire world as a game. A concern now, however, is that agency will make a comeback because it is economically useful. Imagine the brutal, creative effectiveness of RL combined with the broad-based common sense of SSL. This reintegration of agency into leading AI systems (I can't speak to the specific architecture) is what the tech companies are actively developing towards. More on this concept in my Simulators sequence.
I, for one, would find your argument more compelling if you (1) took a deep dive into AI development motivations, rather than lumping it all together as "hype", and (2) explained why AI development would stop with the current paradigm of LLM-fueled chatbots, or something similarly innocuous in itself but potentially dangerous in the context of societal overreliance.
The motivation of this post was to design a thought experiment involving a fully self-sufficient machine ecology that remains within constraints designed to benefit something outside the system, not to suggest how best to make use of the moon.
Agreed: when discussing the alignment of simulators in this post, we are referring to safety from the subset of dangers related to unbounded optimization towards alien goals, which does not cover everything within value alignment, let alone AI safety. But this qualification points to a subtle meaning drift in this post's use of the word "alignment" (towards something like "comprehension and internalization of human values"), which isn't good practice and is something I'll want to figure out how to edit/fix soon.
I am having difficulty seeing why anyone would regard these two viewpoints as opposed.
We discuss this indirectly in the first post in this sequence outlining what it means to describe a system through the lens of an agent, tool, or simulator. Yes, the concepts overlap, but there is nonetheless a kind of tension between them. In the case of agent vs. simulator, our central question is: which property is "driving the bus" with respect to the system's behavior, utilizing the other in its service?
The second post explores the implications of the above distinction, predicting different types of values--and thus behavior--from an agent that contains a simulation of the world and uses it to navigate, vs. a simulator that generates agents because such agents are part of the environment the system is modelling, vs. a system where the modes are so entangled it is meaningless to even talk about where one ends and the other begins. Specifically, I would expect simulator-first systems to have wide value boundaries that internalize (an approximation of) human values, but narrower, maximizing behavior from agent-first systems.
It seems to me that the most robust solution is to do it the hard way: know the people involved really well, both directly and via reputation among people you also know really well--ideally by having lived with them in a small community for a few decades.
Setting prevalence aside and taking your case study as representative of some subset, there are some other things that might be going on.
First, a desire to have someone else initiate maps to the Allowing quadrant of the Wheel of Consent, which minimizes effort while maximizing feeling desired. That said, true Allowing should still be compatible with giving clear responses, so this doesn't by itself explain the aversion you are seeing.
Second, emotional reactions follow the pattern: event => meaning (via priors) => affect => narrative. Suppose this woman holds strongly negative priors about men's motivations. A consent request is not simply coordination; it's an implicit demand for legibility. If she sees the interaction as inherently adversarial, making her preferences legible hands you leverage. And if you do all the right things, even that can be perceived as just more manipulation.
Now consider the internal conflict. She feels good about you initiating, then has a negative reaction to the consent request...while also consciously endorsing the belief that asking for consent is a Good Thing. Add the background tension of wanting to interact with men while viewing them as partially adversarial...and social advice to “trust your intuition” combined with long-term dissatisfaction with her relationship status and wanting to change it. That’s substantial cognitive dissonance with no widely shared conceptual handles. Hence the shutdown.
So the behavior you describe may be better explained by Allowing plus aversion to legibility (under distrust), rather than by a desire for nonconsent.
Other, non-substantive notes:
LessWrong may have high decoupling norms, but on charged topics like this, disclaimers may help prevent contextualizers from inferring views you likely don’t endorse.
Watch for selection effects! Women who give clear signals and are comfortable with explicit consent often pair off quickly. The women who remain visible in dating contexts—and thus command more of your attention—are disproportionately those who communicate more ambiguously.