I think this is a valuable remark. Casteism is no less dangerous than racism; sadly, it's less headline-grabbing, so people don't see it as much of a warning shot as something like MechaHitler.
To contextualize why your post may not garner much karma, however: good proxies to strive for when writing a post on this forum are, in my opinion:
1-A certain degree of epistemic transparency (the details of the experiments, how reliable you think they were, maybe a few graphics, clearly defined claims) and a scout mindset.
2-Internal hyperlinking (how the post relates to other posts on the forum).
3-Review. There are a few typos and hard-to-parse sentences, the structure is hard to follow, and the post in general seems written in one go, somewhat emotionally. I think a human reviewer could have flagged those issues and helped you out.
More context here.
The sort of thing these requirements point at (something like 'having true beliefs and making sure to manage disagreements well') is expected independently of how 'morally virtuous' or 'consensually infuriating' the topic of a post is presumed to be, as norms on the forum tend to be decoupling.
To be clear, I think the general point (casteism is bad and violent and real and different from racism) is true, but it does not sound controversial to me, so I'd appreciate more time spent on the details of the studies and on how this would relate to, say, emergent misalignment (the dog/cat image?) or utility engineering (not a specialist, but I'm curious whether the observation still holds when the model is asked to perform trade-offs).
It's also worth noting your post is closer to AI Ethics (oversimplifying, 'What's the list of things we should align models to?') than AI Safety (oversimplifying, 'How do we ensure AIs, in general, are aligned? What's the general mechanism that ensures it?'). It's a completely valid field, in my opinion, just not one that's historically been very present on this forum, so you won't find many sparring partners here. But I agree that the line is somewhat arbitrary.
I think there are implications for AI Safety proper, however:
Trivially:
1-Current LLMs are not aligned, and Constitutional AI is not enough (if said tests were all done on the chatbot assistant and not the base model).
2-Not filtering pre-training data is a bad idea.
Less trivially:
1-Current LLMs can be egregiously misaligned in ways we don't even notice due to cultural limitations, which doesn't give much hope for future "warning shots".
2-There could be unexpected interactions between said misalignment and geopolitics, which may be relevant in multipolar scenarios (e.g. imagine a conservative Indian government judging an American model 'woke' because it proactively refuses casteist content, leading them to get closer to a Chinese company).
3-When it comes to pre-training, even a nice list of things to exclude may not do it, because you may miss subtler things, like how culture X has other kinds of biases deeply baked into it. It amounts to falling back on leaky generalisations.
4-Some biases are uncomfortably high-level. As you said, casteism isn't based on skin color, and plausibly interacts with the model weights in disturbingly general ways (e.g. the dalmatian / cat image). This may result in broader unexpected consequences.
Hope this helps you out! To be clear again, I think your judgement here is widely shared -of course it's unacceptable for models to reinforce casteism. I'd add that this issue can't be reliably fixed, if capabilities keep increasing, without much more understanding of the alignment problem per se. Temporary fixes and holding companies liable are of course better than nothing.
Note: if anyone sees this comment and disagrees with my diagnosis, you're more than welcome to add your own. I personally think clear explanations are helpful for low-ranking posts.
I'm not remarkably well-versed in AI Governance and politics, but I tend to see this as a good sign. Some general thoughts, from the standpoint of a non-expert:
1-I think saying something akin to "we'll discuss and decide those risks together" is a good signal. It frames the AI Safety community as collaborative. And that's (in my view) true: there's a positive-sum relationship between the different factions. IASEAI is another example of the sort of bridging I think is necessary to gain legitimacy. I think it's a better signal than [Shut up and listen] "ban AGI" [this is not to be discussed] (where [] represents what opponents may infer from the 'raw' Ban AGI message). People do change their minds, if you let them speak theirs first.
2-I think this is the 'right level' of action. I don't believe in heroes stepping in at the last minute to save the day in frontier companies. I'm somewhat worried about the "large civil movement for AI Safety" agenda, in that it may turn out less controllable than expected. This sort of intervention is "broad, but focused on high-profile people", which seems to limit slip-ups, while having a greater degree of commitment than the CAIS statement.
3-"Red Lines" are a great concept (as long as someone finds liability down the road) and offer 'customization'. This sounds empowering for whoever will sit at the table -they get to discuss where to set the red line. This shows that their autonomy is valued.
Caveat: I informally knew about the red lines project before it was published, so this may bias my impression.
A few people referred to alexithymia or overcoming it; I think most people don't realize how precise most expressions around feelings are.
"My arms are falling" is an expression in french to explain that you're shocked. I experienced myself my arms becoming impossible to move, as if filled with concrete, after going through some relational shocks (the same is true of "being blinded by X", some extremely intense emotions have literally made me blind for a few secs)
While I'm at it, some mental shocks literally feel like physical shocks! One of them felt to me like an egg being broken against my skull.
"Making nodes in one's head" means overthinking something. "Untying things" means getting helpful insights. However, it's literally what I went through during therapy. There is a literal feeling of untying an invisible "force field", and those nodes are almost always correlated with mental schemes that are uselessly complex. Some people are genuinely worried that you could actively harm your own mental health through overthinking, they're not just finding an excuse for switching topics!
"Vibes" and "vibe" are extremely concrete things for people who got into very special states of consciousness. The french equivalent for that, "ondes", felt so radio-communication related I thought it had to be some telepathy pseudoscience BS. Actually, people are talking about components of subjective perceptions, and some of those (e.g. color, or mood) literally feel/behave like waves when under altered consciousness, and engage in resonance effects as well. To the detriment of the image, however, there seems to be a real contingent that extends this observation to "and we can use them to do telepathy or influence fate".
Just discovered an absolute gem. Thank you so much.
Informative feedback! Though, I'm sorry if it wasn't clear, I'm not talking about this list -the post I linked is more like internal documentation for people working in this space, and I thought the OP and similarly engaged people could benefit from knowing about it; I don't think it's "underrated" in any way (I'm still learning something from your comment, though, so thanks!).
What I meant was that I noticed that posts that present the projects in detail (e.g. Announcing the Double Crux Bot) tend to generate less interest than this one, and it's a meaningful update for me -I think I didn't even realize a post like this was "missing".
Related: https://www.lesswrong.com/posts/vcuBJgfSCvyPmqG7a/list-of-collective-intelligence-projects
I had never thought about approaching this topic from the abstract side, but judging from the karma, this is actually what people want, rather than existing projects.
I'm surprised! I thought people were overall uninterested in this topic, but it seems more like the problem itself hadn't been stated to start with.
Hi! Thank you for writing this comment. I understand it can be a bit worrying to feel like your points might not be understood, but I'll give it a try nonetheless. I really genuinely want to fix any serious flaw in my approach.
However, I find myself in a slightly strange situation. Part of your feedback is very valuable. But I also believe that you misunderstood part of what I was saying. I could apply the skills I described in the post to your comment as a performative example, but I sense that you could see it as a form of implied sarcasm, and it'd be unethical, so I'll refrain from doing that. A last part of me just feels like your point is "part of this post is poorly written". I've made some minor edits in the hope that this accommodates your criticism.
My suggestion would be for you to watch real-life examples of the techniques I promote (say https://www.youtube.com/watch?v=d2WdbXsqj0M and https://www.youtube.com/watch?v=_tdjtFRdbAo ) then comment on those examples instead.
Alternatively, you can just read my answers:
Rephrasing is often terrible;
Agreed; I've added the detail about genuinely asking your interlocutor whether this is what they mean, and inviting them to offer a correction if not (e.g. "If I got you right, and feel free to correct me if I didn't...."). I think this form makes rephrasing an almost always pleasant experience; I somehow forgot this important detail.
Your suggestion for attacking personal experience [...]
You're referring to point 4, not 5, right?
If yes, I think this is extrapolating beliefs I don't actually have. I admit, however, that I didn't choose a good example; you can refer to the Street Epistemology video above for a better one.
I'll replace the example soonish. In the meantime, please note that I do not suggest "attacking" personal experiences. I suggest asking "What helps us distinguish reliable personal experiences from unreliable ones?". This is a valid question to ask, in my view. For a bunch of reasons, this question is more likely to bounce off, so I prefer to ask "How do you distinguish personal experiences from [delusions]?", where "[delusions]" is a term that has been deliberately imported by the conversation partner. I think most interlocutors will be tempted to answer something along the lines of intersubjectivity, repeatability or empirical experiments. But I agree this is a delicate example and I'd be better off pointing to something else.
Stories need to actually be short, clear, and to the point or they just confuse the matter more.
This was part of the details I was omitting. I'll add it.
Caring about their underlying values is useful, but it needs to be preceded by curiosity about and understanding of them, or it does no good.
Agree. This was implied in several parts of the post, e.g. "Be genuinely truth-seeking" in the ethical caveats. But I don't think it is that hard.
A working definition may or may not be better than a theoretical one.
Please note that I'm talking about conversations between rationalists and non-rationalists on entry-level arguments, e.g. "We can't lose control of AI because it's made of silicon", not "Davidad has a promising alignment plan" (and note that I'm not arguing for applying these techniques to AI Safety Outreach and Advocacy; this is just an example). I think we really should not spend 15 minutes with someone not acquainted with LessWrong, or even AI, defining "losing control" in a way that is close to mathematically formal. I think that "What do you mean by losing control? Do you mean that, if we ask it to do something specific, then it won't do it? Or do you mean something else?" is a good enough question. I'd rather discuss the details once that person is more acquainted with the topic.
There will, of course, be situations where this isn't true. The law of equal and opposite advice applies. But in most entry-level arguments, I'd rather have people spend less time problematizing definitions and more time asking their interlocutor for their reasons.
People don't generally use Bayes rule!
Of course. I'm not suggesting mentioning Bayes' Rule out loud. Nor am I suggesting people actually use Bayes' Rule in their everyday life. I'm noting that the techniques I think are more robust are the ones that lead people to apply an approximation thereof, usually by contrasting one piece of evidence under two different hypotheses. The reference to 'Bayes' comes from Bayesian psychology of reasoning; my model is closest to the one described in The Erotetic Theory of Reasoning (https://web-risc.ens.fr/~smascarenhas/docs/koralus-mascarenhas12_erotetic_theory_of_reasoning.pdf).
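To make "contrasting one piece of evidence under two hypotheses" concrete, here is a minimal sketch of the odds-form update that kind of question approximates. The numbers, the scenario and the posterior_odds helper are my own illustrations, not anything from the post or anything I'd say out loud in a conversation:

```python
# Minimal sketch (made-up numbers): contrasting one piece of evidence
# under two hypotheses, expressed as an odds-form Bayes update.

def posterior_odds(prior_odds: float, p_e_given_h1: float, p_e_given_h2: float) -> float:
    """Odds of H1 vs H2 after observing one piece of evidence E."""
    likelihood_ratio = p_e_given_h1 / p_e_given_h2
    return prior_odds * likelihood_ratio

# Illustrative evidence: "my headache went away after taking remedy X".
# H1: the remedy works; H2: it would have gone away on its own.
# If the evidence is roughly as likely under both hypotheses, the odds
# barely move, which is what the contrastive question is meant to surface.
print(posterior_odds(prior_odds=1.0, p_e_given_h1=0.9, p_e_given_h2=0.8))  # ~1.13
```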
Something said in point 8 seems like the key.
It is the key; I thought I had made that clear with "Yet the mindset itself is the key".
However, I don't want to make a post on it without explaining the ways in which it manifests, because healing myself made no sense up until I started analyzing the habits of healed people. Some people who were already healed didn't want to "give the secrets away" or scoffed at my attempts. They came across to me as snobbish and as preventing me from actually learning; still, I really got a lot out of noting down recurrent patterns in their conversations, if only because it allowed me to do Deliberate Practice.
Finally, please remember that this post is an MVP. It is not meant to be exhaustive and cover all the nuances of the techniques -it's just that I'd rather write a post than nothing at all, and the entire sequence will take time before publication.
If you feel like I completely misunderstood your points, and are open to having my skills applied to our very conversation, feel free to DM me a Calendly link and we can sort it out live. I'd describe myself as a good conversation partner, and I would put the probability of the exchange going awry quite low.
PS: It would help me out if you could quote the [first sentence of the] parts you are reacting to, in order to make clear what you are talking about. I hope I'm right in understanding what parts of the post you are reacting to.
Workshops:
https://deepcanvass.org/ organizes introductions to Deep Canvassing regularly. My personal take is that the workshop is great, but I don't find it entirely aligned with a truth-seeking attitude (it's not appalling either), and I would suggest rationalists bring their own twist to it.
https://www.joinsmart.org/ also organizes workshops, which often vary in theme. Same remark as above.
There is a Discord server accessible from https://streetepistemology.com/; they organize regular practice sessions.
Motivational Interviewing and Principled Negotiation are common enough for you to find a workshop near where you live, I guess.
There's also the elephant in the room -my own eclectic workshop, which mostly synthesizes all of the above with (I believe) a more rationalist orientation and stricter ethics.
Someone told me about people in the US who trained on "The Art of Difficult Conversations", I'd be happy to have someone leave a reference here! If you're someone who's used to coaching for managing disagreements, feel free to drop your services below as well.
I read these comments a few days ago. It prompted me to try applying something inspired by what was written in the post, but immediately on my muscle tension: I slightly Focus on it, then tell myself to "side with" the tension / feeling, while also telling myself that it's OK to do so, not trying to "bust" it or put it into words, and using chipmonk's technique (cf. his blog) to explore resistance around being seen displaying "the underlying emotion".
I have the very clear impression that it weakens the tension quite fast (just timed it, it took about 30 seconds). I'm not having any insight on what the tension was about specifically.
That's a purely subjective experience report; it might be heavily biased.
Loved this one! This is pointing at something common, but most importantly, with the right frame of mind.
I think it's part of what worries people who discover introspection and therapy (and empathy) for the first time -at least I remember thinking similar thoughts when discovering Focusing. Yet their worry overcorrects and prevents them from noticing sources of suffering they've been neglecting all along, and would benefit from healing. Mentioning both sides is useful.
I think it's worth noting, however, that there are a lot of subtleties that make this meme less applicable to "interpreting others' behaviors" (this isn't directed at the OP, more like general guidelines), even if in non-triggered ways:
1-I have high confidence (80%) that some people deep in those cycles really do live through hell on earth (self-inflicted doesn't mean nonexistent!). It's tempting to minimize their suffering, but factually wrong.
2-I used to accuse most people I didn't like of self-inflicted, fictitious suffering, including people I hurt or people who suffered from exogenous causes. This came naturally when I myself was denying quite intense painful signals (mine or others'). As usual, the world isn't so dark; one needs more than one hypothesis. People don't subject themselves to lung cancer, bad trips or senescence in the hopes of being cared for.
3-There's a related ambiguous situation, where people around you presume you're seeking help due, e.g., to some form of agency differential or ask-vs-guess culture difference, while you're not. And that other, ironically symmetrical situation, where you definitely need help but pretend not to (aka "I swear I'm fine!"). Navigating those is tricky, and people low in emotional intelligence can easily get confused.
I think this is a precious insight to share and apply to oneself; however, I'd caution against using it as a tool for interpreting others' behaviors, or at least, I'd suggest doing so while staying open to "Oh, OK, she's not making a mountain out of a molehill, she actually went through [(tw) war in Iraq/r*pe/the death of a relative/etc]" type insights.
But then again, a very useful post! Thank you for having written this.