Enemies vs Malefactors

[-]Richard Korzekwa3y5938

It's maybe fun to debate about whether they had mens rea, and the courts might care about the mens rea after it all blows up, but from our perspective, the main question is what behaviors they’re likely to engage in, and there turn out to be many really bad behaviors that don’t require malice at all.

I agree this is the main question, but I think it's bad to dismiss the relevance of mens rea entirely. Knowing what's going on with someone when they cause harm is important for knowing how best to respond, both for the specific case at hand and the strategy for preventing more harm from other people going forward.

I used to race bicycles with a guy who did some extremely unsportsmanlike things, of the sort that gave him an advantage relative to others. After a particularly bad incident (he accepted a drink of water from a rider on another team, then threw the bottle, along with half the water, into a ditch), he was severely penalized and nearly kicked off the team, but the guy whose job was to make that decision was so utterly flabbergasted by his behavior that he decided to talk to him first. As far as I can tell, he was very confused about the norms and didn't realize how badly he'd been violating them. He was definitely an asshole, and he was following clear incentives, but it seems his confusion was a load-bearing part of his behavior because he appeared to be genuinely sorry and started acting much more reasonably after.

Separate from the outcome for this guy in particular, I think it was pretty valuable to know that people were making it through most of a season of collegiate cycling without fully understanding the norms. Like, he knew he was being an asshole, but he didn't really get how bad it was, and looking back I think many of us had taken the friendly, cooperative culture for granted and hadn't put enough effort into acculturating new people.

Again, I agree that the first priority is to stop people from causing harm, but I think that reducing long-term harm is aided by understanding what's going on in people's heads when they're doing bad stuff.

[-]Zack_M_Davis3y3510

I suggest minting a new word, for people who have the effects of malicious behavior, whether it's intentional or not.

Why only malicious behavior? It seems like the relevant idea is more general: oftentimes we care about what outcomes a pattern of behavior looks optimized to achieve in the world, not about the person's conscious subjective verbal narrative. (Separately from whether we think those outcomes are good or bad.)

Previously, I had suggested "algorithmic" intent, as contrasted to "conscious" intent. Claims about algorithmic intent correspond to predictions about how the behavior responds to interventions. Mistakes that don't repeat themselves when corrected are probably "honest mistakes." "Mistakes" that resist correction, that systematically steer the future in a way that benefits the actor, are probably algorithmically intentional.

[-]Linch3y53

"Mistakes" that resist correction, that systematically steer the future in a way that benefits the actor, are probably algorithmically intentional.

Is "benefits the actor" load-bearing for you here (as opposed to just "predictably bad for others")? I can think of behaviors that rarely benefit the actor but that people seem unlikely to be talked out of (e.g. temper tantrums at the workplace are rarely selfishly positive in professional Western contexts).

[-]Zack_M_Davis3y112

Sorry, not load-bearing; I think "steering the future" was the important part of that sentence.

Although in the case of tantrums, I think the game-theoretic logic is pretty clear: if I predictably make a fuss when I don't get my way, then people who don't want me to make a fuss are more likely to let me get my way (to a point). The fact that tantrums don't benefit the actor when they happen isn't itself enough to show that they're not being used to successfully extort concessions to make them happen less often. If it doesn't work in the modern workplace, it probably worked in the environment of evolutionary adaptedness.

[-]Martin Randall3y192

Sometimes also tantrums work in the training distribution of childhood and don't work in the deployment environment of professional work.

[-]RHollerith3y2512

I suggest minting a new word, for people who have the effects of malicious behavior, whether it’s intentional or not.

I've long used "destructive" for that.

[-]MichaelStJules3y85

Harmful, maybe? Not all harms involve destruction (physical or relationships, etc.).

[-]bortrand3y21

I recently started making a similar distinction in my life and using the word “toxic.”

[-]Raemon3y101

I don't like the word "toxic" because it's kind of essentialist without exposing actual causes/effects/mechanisms/inputs-outputs. I think it's useful sometimes as shorthand between people who have a high degree of agreement on what "toxic" means in a given context, but it's sort of a slippery word.

[-]Mart_Korz3y2-13

I also like "problematic" - it could be used as a 'we are not yet quite sure about how bad this is' version of "destructive"

[-][anonymous]3y58

Problematic is already associated with bigotry and I don't think invoking a political frame is helpful for these sorts of situations.

[-]Ivy Mazzola3y2-1

I don't think it does invoke a political frame if you use it right, but perhaps I have too much confidence in how I've used the term.

[-]Going Durden3y32

"Problematic" does not differentiate between "bad", "harmful" and "difficult". Replacing the carburetor in a Honda Civic with only a spatula and a corkscrew for tools is problematic, but not necessarily harmful or bad.

[-]Ivy Mazzola3y10

Maybe "troubling"

[-]Ivy Mazzola3y0-2

I use problematic

[-]Vladimir_Nesov3y2212

Labeling (in particular) catastrophically incompetent people "maleficient" sounds malevolent. While the concern might be valid in theory, this label has connotations that probably don't help with the inherent practical witch hunt and reign of terror risks of the whole concept.

Also, the apparent Chesterton-Schelling fences my intuition is loudly hallucinating at this post say to stop before instituting a habit of using such classification. Immediately-decision-relevant concepts are autonomous superweapons, controversial norms that resist attempts at keeping their boundaries in reasonable/intended places.

[-]Lukas_Gloor3y103

My stance is "the more we promote awareness of the psychological landscape around destructive patterns of behavior, the better." This isn't necessarily at odds with what you're saying because "the psychological landscape" is a descriptive thing, whereas your objection to Nate's proposal is that it seeks to be "immediately-decision-relevant," i.e., that it's normative (or comes with direct normative implications).

So, maybe I'd agree that "maleficient" might be slightly too simplistic of a classification (because we may want to draw action-relevant boundaries in different places depending on the context – e.g., different situations call for different degrees of risk tolerance of false positives vs. false negatives).

That said, I think there's an important message in Nate's post and (if I had to choose one or the other) I'm more concerned about people not internalizing that message than about it potentially feeding ammunition to witch hunts. (After all, someone who internalizes Nate's message will probably become more concerned about the possibility of witch hunts – if only explicitly-badly-intentioned people instigated witch hunts or added fuel to the fires, history would look very different.)

[-]Vladimir_Nesov3y140

"maleficient" might be slightly too simplistic of a classification

There is an interesting phenomenon around culture wars where a crazy number of concepts are generated to describe the contested territory with mind-boggling nuance. I have a hunch that this is not just expertise signaling, but actually useful for dissolving the conceptual superweapons in a sea of distinctions. This divests the original contentious immediately-decision-relevant concept of the special role that gives it power, by replacing it with a hundred slightly-decision-relevant distinctions, none of which has significant power.

A disagreement that was disputing a definition about placement of its boundaries becomes a disagreement about decision procedures in terms of many unchanging and uncontroversial definitions that cover all contested territory in detail. After the dispute is over, most of the technical distinctions can once again be discarded.

[-]Sweetgum3y10

Could you give some examples? I understand you may not want to talk about culture war topics on LessWrong, so it's fine if you decline, but without examples I unfortunately cannot picture what you're talking about.

[-]Vladimir_Nesov3y20

so it's fine if you decline

The cost of this statement is feeding the frame where it's not necessarily fine.

[-]PeterMcCluskey3y206

Humans care about this stuff enough to bake it into their legal codes.

It's mostly Western culture that does this. There's a lot of variation in how much cultures care about bad intentions.

[-]Richard_Kennaway3y183

IANAL, but I believe that the doctrine of mens rea is different from what is suggested here, and the difference has application to the larger context.

The mens rea is simply the intention to have done the actus reus, the illegal act. If, for example, a company director puts their signature to a set of false accounts, knowing they are false, then there is mens rea. It will cut no ice in court for them to profess that "my goodness, I didn't know that was illegal!", or "oh, but surely that wasn't really fraud", or "but it was for a vital cause!"

What matters is that they did the thing, intending to do the thing.

I suggest minting a new word, for people who have the effects of malicious behavior

I thought that "toxic" was the usual word these days.

[-]Archimedes3y60

IANAL either but I do know that certain crimes explicitly do hinge on the perpetrator's knowledge that what they did was illegal, not just that they intended to do it. This isn't common but does apply to some areas with complex legislation like tax evasion and campaign finance. As a high-profile example, Trump Jr. was deemed "too dumb to prosecute" for campaign finance violations.

More generally, there are multiple levels of mens rea. Some crimes can be prosecuted without proving any intent ("strict liability"). For crimes that do require intent, the mental state can be categorized into four levels of increasing severity: acting negligently, acting recklessly, acting knowingly, and acting purposefully. This list is not universal, though it is representative. Some US states refer to express/implied "malice".

I understand So8res to be saying that we can treat toxic behavior on a strict liability basis without deciding what level of knowledge and intent to assign the offender.

[-]Going Durden3y20

I think "toxic" is narrower: it hints at indirect, social, and emotional damage, and does not work well as a term in situations that are just pragmatic in nature.

[-]Slimepriestess3y147

This might be a bit outside the scope of this post, but it would probably help if there were a way to respond positively to someone who is earnestly messing up in this manner before they cause a huge fiasco. If there's a legitimate belief that they're trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That's of course if they actually want to change. If they're keeping themselves in a state that causes harm because it benefits them while insisting it's fine, well, to steal a Sith's turn of phrase: airlocked.

[-]Lukas_Gloor3y*96

If there's a legitimate belief that they're trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That's of course if they actually want to change. If they're keeping themselves in a state that causes harm because it benefits them while insisting it's fine, well, to steal a Sith's turn of phrase: airlocked.

I agree that it's important to give people constructive feedback to help them change. However, I see some caveats around this (I think I'm expanding on the points in your comment rather than disagreeing with it). Sometimes it's easier said than done. If part of a person's "destructive pattern" is that they react with utter contempt when you give them well-meant and (reasonably-)well-presented feedback, it's understandable if you don't want to put yourself in the crossfire. In that case, you can always try to avoid contact with someone. Then, if others ask you why you're doing this, you can say something that conveys your honest impressions while making clear that you haven't given this other person much of a chance.

Just like it's important to help people change, I think it's also important to seriously consider the hypothesis that some people are so stuck in their destructive patterns that giving constructive feedback is no longer justifiable in terms of social opportunity costs. (E.g., why invest 100s of hours helping someone become slightly less destructive if you can promote social harmony 50x better by putting your energy into pretty much anyone else.)

Someone might object as follows. "If someone is 'well-intentioned,' isn't there a series of words you* can kindly say to them so that they'll gain insight into their situation and they'll be able to change?"

I think the answer here is "no" and I think that's one of the saddest things about life. Even if the answer was, "yes, BUT, ...", I think that wouldn't change too much and would still be sad.

*(Edit) Instead of "you can kindly say to them," the objection seems stronger if this said "someone can kindly say to them." Therapists are well-positioned to help people because they start with a clean history. Accepting feedback from someone you have a messy history with (or feel competitive with, or all kinds of other complications) is going to be much more difficult than the ideal scenario.

One data point that seems relevant here is success probabilities for evidence-based treatments of personality disorders. I don't think personality disorders capture everything about "destructive patterns" (for instance, one obvious thing that they miss is "person behaves destructively due to an addiction"), nor do I think that personality disorders perfectly carve reality at its joints (most traits seem to come on a spectrum!). Still, it seems informative that the treatment success for narcissistic personality disorder seems comparatively very low (but not zero!) for people who are diagnosed with it, in addition to it being vastly under-diagnosed since people with pathological narcissism are less likely to seek therapy voluntarily. (Note that this isn't the case for all personality disorders – e.g., I think I read that BPD without narcissism as a comorbidity has something like 80% chance of improvement with evidence-based therapy.) These stats are some indication that there are differences in people's brain wiring or conditioned patterns that are deep enough that they can't easily be changed with lots of well-intentioned and well-informed communication (e.g., trying to change beliefs about oneself and others).

So, I think it's a trap to assume that being 'well-intentioned' means that a person is always likely to improve with feedback. Even if, from the outside, it looks as though someone would change if only they could let go of a particular mindset or set of beliefs that seems to be the cause behind their "destructive patterns," consider the possibility that this is more of a symptom rather than the cause (and that the underlying cause is really hard to address).

[-]Linch3y*125

One hypothesis I have for why people care so much about some distinction like this is that humans have social/mental modes for dealing with people who are explicitly malicious towards them, who are explicitly faking cordiality in attempts to extract some resource. And these are pretty different from their modes of dealing with someone who's merely being reckless or foolish. So they care a lot about the mental state behind the act.
[...]
On this theory, most people who are in effect trying to exploit resources from your community, won't be explicitly malicious, not even in the privacy of their own minds. (Perhaps because the content of one’s own mind is just not all that private; humans are in fact pretty good at inferring intent from a bunch of subtle signals.) Someone who could be exploiting your community, will often act so as to exploit your community, while internally telling themselves lots of stories where what they're doing is justified and fine.

I note that while I find both paragraphs individually reasonable [and I find myself nodding along to them], there seems to be a soft contradiction between them that needs explanation.

Namely, why is human (whether genetic or cultural) evolution maladaptive here? "Which humans are bad allies" seems close to a central example of the problems we should expect evolution in a social context to be good at solving, so I feel like the burden of proof is on whoever is positing a local deviance to explain why the features are off in this case. Some possibilities:

1. "Our" community is different [why?]

2. People in history are in fact object-level wrong about the existence (or at least prevalence) of evil actors. In reality "Almost no one is evil, almost everything is broken." A possible evolutionarily concordant just-so story here is something in the direction of rational irrationality: perhaps humans are better at tribal ostracism etc. if they collectively pretend (and/or genuinely believe) that other humans who do bad things are genuinely evil and thus worthy of ostracism.

3. ???

Both explanations are possible but I don't know which one is right (or both, or neither); I just want to highlight that there is something left to be explained in your model so far.

[-]maia3y91

There's no contradiction. There are two competing sides of the evolutionary process: one side is racing to understand intentions as well as possible, the other side is racing to obscure its intentions, in this case by not having them consciously.

[-]Mart_Korz3y10

I think one aspect which softens the discrepancy is that our intuitions here might not be adapted to large-scale societies. If everyone really lives mainly with one's own tribe and has kind of isolated interactions with other tribes and maybe tribe-switching people every now and then (similar to village-life compared to city-life), I could well imagine that "are they truly part of our tribe?" actually manages to filter out a large portion of harmful cases.

Also, regarding 2): If indeed almost no one is evil, almost everyone is broken: there are strong incentives to make sure that the social rules do not rule out your way of exploiting the system. Because of this I would not be surprised if "common knowledge" around these things tends to be warped by the class of people who can make the rules. Another factor is that as a coordination problem, using "never try to harm others" seems like a very fine Schelling point to use as common denominator.

[-]Linch3y21

It's possible, but I would previously have assumed sociopathy, intentional maleficence, etc. to be less common in the ancestral environment relative to other harmful social situations. My own just-so story would suggest that people's intuitions from a tribal context are maladaptive in underpredicting sociopathy or deliberate deception.

[-]Mart_Korz3y10

I am not sure we disagree with regard to the prevalence of maleficence. One reason why I would imagine that

"are they truly part of our tribe?" actually manages to filter out a large portion of harmful cases.

works in more tribal contexts would be that cities provide more "ecological" niches (would the term be sociological here?) for this type of behaviour.

intuitions [...] are maladaptive in underpredicting sociopathy or deliberate deception

Interesting. I would mostly think that people today are way more specialized in their "professions" such that for any kind of ability we will come into contact with significantly more skilled people than a typical ancestor of ours would have. If I try to think about examples where people are way too trusting, or way too ready to treat someone as an enemy, I have the impression that for both mistakes examples come to mind quite readily. Due to this, I think I do not agree with "underpredict" as a description and instead tend to a more general "overwhelmed by reality".

[-]Raemon3y105

Curated.

In some sense, I knew all this 10 years ago when I first started community-organizing and running into problems with various flavors of deception, manipulation, and people-hurting-each-other.

But, I definitely struggled to defend my communities against people who didn't quite match my preconception of what "a person I would need to defend against" looked like. My sympathy and empathy for some people made me more hesitant to enforce my boundaries.

I don't know that I'm thrilled with "malefactor" or "maleficence" as words (they seem too similar to "malicious" and I don't think they convey the right set of things), but I very much agree with the distinction being useful.

[-]weft3y106

Interpersonal abuse (e.g., parental, partner, etc.) has a similar issue. People like to talk as if the abuser is twirling their mustache in their abuse-scheme. And while this is occasionally the case, I claim that MOST abuse is perpetrated by people with a certain level of good intent. They may truly love their partner and be the only one who is there for them when they need it, BUT they lack the requisite skills to be in a healthy relationship.

Sadly this is often due to a mental illness, or a history of trauma, or not getting to practice these skills growing up until there was a huge gulf between where they are and where they need to be.

This makes it extra difficult for the victim, because the abuser is sympathetic and seemingly ACTUALLY TRYING. Trying to get advice from the internet may not help when everyone paints your abuser as a scheming villain and you can tell they're not. They're just broken.

I've really appreciated the media that shows a more realistic picture of abusers as people who love you, but are too fucked up to not hurt you. I think more useful advice would acknowledge this harsh reality.

[-]Duncan Sabien (Inactive)3y1013

Copied text from a Facebook post that feels related (separating intent from result):

In Duncan-culture, there are more mistakes you're allowed to make, up-front, with something like "no fault."
e.g. the punch bug thing—if you're in a context where lots of people play punch bug, then you're not MORALLY CULPABLE if you slug somebody on the shoulder and then they say "Ouch, I don't like that, do not do that."
(You're morally culpable if you do it again, after their clear boundary, but Duncan-culture has more wiggle room for first-trespasses.)
However, Duncan-culture is MORE strict about something like ...
"I hurt people! But it's okay, I patched the dynamic that led to the hurt. But then I hurt other people! But it's okay, because I isolated and fixed that set of mistakes, too. But then I hurt other people! But it's okay, because I isolated and fixed that set of mistakes, too. But then I hurt other people! But it's okay..."
In Duncan-culture, you can get away with about two rounds of that. On the third screwup, pretty much everybody joins in to say "no. Stop. You are clearly just capable of inventing new mistakes every time. Cease this iterative process."
And if you don't—if you keep going, making a different error with a similar result every time—
In Duncan-culture, the resulting harm on rounds three and beyond is treated as, essentially, deliberate/intentional. Because the result was predictable, and this fact failed to move you.
This is not, as far as I can tell, robustly/reliably true in the broader culture I'm currently a part of.
EDIT: More disambiguation:
We give people protection, socially speaking, when we consider them to have had good intentions, but to have made a mistake with tragic results.
In Duncan-culture, you can't really get that protection three times in a row for three similar results. If you do A and it leads to X, that's just a mistake and we treat you sympathetically/generously. If you then do B and it leads to X, well, plausibly your first patch wasn't good enough, but like, okay, things are hard, your good intentions shine through, fair game. But if you then do C and it leads to X, all future X's resulting from D and E and so on are considered "your fault" in the not-excusable-as-a-mistake way. Good intentions cease to matter after three different Xings; your job now is to do whatever it takes to avoid more X, or to accept full responsibility for all future X, approximately as if you caused X on purpose/decided X was a side effect you felt worth causing.
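The rule quoted above can be read as a small classification procedure. The following is only a minimal sketch of one reading of it, not the author's own formulation: the function name `classify_harms`, the labels, and the choice to treat round three itself as the first non-excused round (following "rounds three and beyond") are all assumptions made for illustration.

```python
def classify_harms(causes, excused_rounds=2):
    """Label each successive harm of the same kind X.

    causes: the distinct actions (A, B, C, ...) that each led to X, in order.
    excused_rounds: how many rounds are treated as honest mistakes
        (assumed to be 2, from "about two rounds" in the quoted text).
    """
    labels = []
    for round_number, cause in enumerate(causes, start=1):
        if round_number <= excused_rounds:
            # Good intentions still earn a sympathetic reading.
            labels.append((cause, "mistake"))
        else:
            # The result was predictable, so intent no longer excuses it.
            labels.append((cause, "treated as intentional"))
    return labels


if __name__ == "__main__":
    for cause, label in classify_harms(["A", "B", "C", "D"]):
        print(cause, "->", label)
```

The point of the sketch is that the label depends only on how many times the same result X has occurred, not on whether each individual cause was novel.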

[-]Richard_Ngo3y20

In Duncan-culture, when people say "no. Stop", what's the thing that they're saying should stop?

[-]Duncan Sabien (Inactive)3y50

In this specific case, I was writing about a colleague who kept hurting people in their attempts to help them with rationality. They kept managing to hurt people in novel and interesting ways, every time they patched the previous failure mode. "No. Stop." would be in reference to "stop fiddling with people's brains in this way."

Similarly, Brent Dill had in fact been doing different damages to each of his romantic partners, but eventually the Berkeley community was like "no, we are horrified, we don't care if you're not making those specific mistakes anymore, we do not trust you to not make new ones." In that case "No. Stop." was in reference to "dating any of the women in our community."

[-]Harold3y*92

I don't have any terminological suggestions that I love

Following on my prior comment, the actual legal terms used for the (oxymoronic) "purposeless and unknowing mens rea" might provide an opening for the legal-social technologies to provide wisdom on operationalizing these ideas - "negligent" at first, and "reckless" when it's reached a tipping point.

[-]ambigram1y80Review for 2023 Review

This is an important distinction; otherwise you risk getting into unproductive discussions about someone's intent instead of focusing on whether a person's patterns are compatible with your needs or those of your group/community.

It doesn't matter if someone was negligent or malicious: if they are bad at reading your nonverbal cues and you are bad at explicitly saying no to boundary crossing behaviors, you are incompatible and that is reason enough to end the relationship. It doesn't matter if someone is trying their best: if their best is still disruptive to your team, that is reason enough to request they be transferred out.

I can't remember if this essay is where I learned this concept. But remembering this distinction protected me in meaningful ways at least twice.

[-]Noosphere891y20

Yeah, in domains where the cost of improvement/training either is too high or can't happen, this post is really helpful, and I agree with this review the most.

[-]localdeity3y*85

When dealing with someone who's doing something bad, and it's not clear whether they're conscious of it or not, one tactic is to tell them about it and see how they respond. (It is the most obviously prosocial approach.) Ideally, this will either fix the situation or lead towards establishing that they are, at the very least, reprehensibly negligent, and then you can treat them as malicious. (In principle, the difference between a malicious person and one who accidentally behaves badly is that, if both of them come to understand that their behavior causes bad results, the latter will stop while the former will keep going. Applying this to the real world can be messy.)

To take an easy example, if the scenario involves a friend repeatedly doing something that hurts you, then probably you should tell them about it. If they apologize and try to stop, this is good; if their attempts to stop fail, then you can tell them that too, and take it from there. If, contrariwise, they insist "this can't actually be hurting you", or deny that it happened, or otherwise reject your feedback, then I'd consider this evidence that they're not such a good friend.

In the case of a non-friend, there is less of a presumption of good faith. Since the effect of them agreeing with you would mean they have to restrict their behavior or otherwise do stuff they'd rather not, they may be reluctant to agree, and further they might take it as you attempting to grab power or bully them. Which are things that people sometimes do, and so the details matter: exactly what evidence there is, the relation between them and the person(s) raising the issue, etc.

Suppose the issue involves subjective judgments of how someone behaved in 1:1 contexts. If one person thought you behaved badly in a situation, and you think differently, maybe you're right. If, the last N times you were in that type of situation, with N different people, they all thought you behaved badly, then that gets to be strong evidence, as N increases, that your approach is wrong. (Depending on the issue, it's possible that all N people believe the wrong philosophy—e.g. if the interaction was that they said "Praise Jesus!" and you replied "Sorry, but I'm an atheist". Though one then asks, why are you getting into all these situations that you can predict will go badly? Are you doing what you should do to avoid them?)
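The claim that N independent complaints become strong evidence as N grows can be illustrated with a simple Bayesian update. This is a rough sketch under loudly stated assumptions: the function name `posterior_wrong`, the prior, and both likelihoods are made-up illustrative numbers, and the N judgments are assumed to be independent of one another (which the "Praise Jesus!" caveat shows is not always true).

```python
def posterior_wrong(n_complaints, prior=0.1, p_if_wrong=0.8, p_if_fine=0.2):
    """Probability that your approach is wrong after n independent complaints.

    prior: assumed initial probability that your approach is wrong.
    p_if_wrong: assumed chance a given person objects if your approach is wrong.
    p_if_fine: assumed chance a given person objects even if your approach is fine.
    """
    # Work in odds form: each complaint multiplies the odds by the likelihood ratio.
    odds = prior / (1 - prior)
    odds *= (p_if_wrong / p_if_fine) ** n_complaints
    return odds / (1 + odds)


if __name__ == "__main__":
    for n in range(6):
        print(n, round(posterior_wrong(n), 3))
```

With these illustrative numbers, a single complaint leaves plenty of room for "maybe you're right", while five unanimous complaints leave very little. If the N people share a common bias, the independence assumption fails and the update should be much weaker.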

At a certain point, as the evidence mounts, a responsible person in your position, when confronted with the evidence, should say, "Ok, I still don't agree, but I have to admit there's an X% chance I'm wrong, and if I am wrong and continue like this, then the impact of being wrong is Y; meanwhile, there are certain safeguards, up to and including 'stop it completely', which have their own expected values, and at this point safeguards A and B are reasonable and worth doing." (A truly mature person in certain situations might even say, "I know I'm innocent, but I also know that others have no way of verifying this, and from their perspective there's an X% chance I'm guilty, and I'm in favor of the general policy of responding with these countermeasures to that level of evidence of this crime, and I'm not going to fight them on this.")
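The reasoning in this paragraph is an expected-value comparison, which can be sketched as follows. Everything here is hypothetical: the function names, the option names, and the numbers for cost and harm reduction are assumptions chosen only to show the shape of the calculation, with X as `p_wrong` and Y as `harm_if_wrong`.

```python
def expected_cost(p_wrong, harm_if_wrong, safeguard_cost, harm_reduction):
    """Expected total cost of continuing with a given safeguard in place.

    safeguard_cost: what the safeguard costs you regardless of who is right.
    harm_reduction: fraction of the harm the safeguard prevents (0.0 to 1.0).
    """
    return safeguard_cost + p_wrong * harm_if_wrong * (1 - harm_reduction)


def best_option(p_wrong, harm_if_wrong, options):
    """Return the name of the option with the lowest expected cost.

    options: dict mapping option name -> (safeguard_cost, harm_reduction).
    """
    return min(
        options,
        key=lambda name: expected_cost(p_wrong, harm_if_wrong, *options[name]),
    )


# Hypothetical options; "stop completely" removes all harm but costs the most.
OPTIONS = {
    "no safeguard": (0, 0.0),
    "safeguard A": (5, 0.6),
    "stop completely": (40, 1.0),
}

if __name__ == "__main__":
    for p in (0.02, 0.3, 0.9):
        print(p, best_option(p, 100, OPTIONS))
```

Under these made-up numbers, the preferred option moves from doing nothing, to a partial safeguard, to stopping entirely as the probability of being wrong rises, which is the pattern the paragraph describes.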

A certain kind of narcissist would completely reject the feedback and say they're being unjustly persecuted, and (assuming our evidence is in fact good) we can condemn them here. Depending on the situation, some predators would say, "Hmmph, those safeguards prevent me from doing the fun stuff or make it unacceptably risky; I'll agree and then just quietly leave the community". Some others would pretend to agree and then try to continue misbehaving in whatever way they can. There's always the possibility of an intelligent psychopath behaving exactly like an innocent person.

(If you want to get advanced about it, you could try having the "confronting" be initially done by some person who looks sane but not powerful, to maximize the likelihood that the "prideful narcissist" would openly reject it while the "reasonable, accidental misbehaver" would accept it; or, if the safeguard you have in mind is highly effective but is a major concession, you might have it be done by people who are officially "in charge" (e.g. with the power to ban people from events) so as to pressure cowardly offenders to agree.)

If you don't have enough evidence to be confident that the guy who rejects the feedback and insists he's correct is in fact wrong... Well, at the very least, by telling him, (a) if he's good but misguided, he should at least be more cautious in the future, and there is a chance you've helped; (b) if he's bad and cowardly, he knows that official eyes are on him and he'll have less benefit of the doubt in the future, which may dissuade him. (This is conventionally known as a "warning".) Having the right person tell him in the right way may help with (a) and possibly (b).

There may be circumstances in which you don't want to tell him about the evidence you do have. (Maybe it would break a confidence; maybe it would teach predator-him how to hide his behavior in the future; maybe predator-he would know who snitched on him and take revenge [though my brain volunteers that this would be an excellent way to expose him, if you can protect the witness].) There are also plenty in which this isn't a problem.

Overall, this is such a large topic, and appropriate responses depend so much on the details, that I think it would help to be more specific.

[edit: fixed link]

[-]metacoolus3y10

Yes! This is an excellent approach. Rather than focusing only on whether there is malicious intent, keeping in mind the more practical goal of wanting bad behavior to *stop* and seeking to understand how it might play out over time is a much more effective way of resolving the problem. Using direct communication to try and fix the situation or ascertain a history of established negligent or malicious behavior is very powerful.

[-]Harold3y70

(As an example, various crimes legally require mens rea, lit. “guilty mind”, in order to be criminal. Humans care about this stuff enough to bake it into their legal codes.)

Even in the law of mental states, intent follows the advice in this post. U.S. law commonly breaks down the 'guilty mind' into at least four categories, which, in the absence of a confession, all basically work by observing the defendant's patterns of behaviour. There may be some more operational ideas in the legal treatment of reckless and negligent behaviour.

acting purposely - the defendant had an underlying conscious object to act
acting knowingly - the defendant is practically certain that the conduct will cause a particular result
acting recklessly - the defendant consciously disregarded a substantial and unjustified risk
acting negligently - the defendant was not aware of the risk, but should have been aware of the risk

[-]Marcello3y53

I know this post was chronologically first, but since I read them out of order my reaction was "wow, this post is sure using some of the notions from the Waluigi Effect mega-post, but for humans instead of chatbots"! In particular, they're both pointing at the notion that an agent (human or AI chatbot) can be in something like a superposition between good actor and bad actor unlike the naive two-tone picture of morality one often gets from children's books.

[-]Zack_M_Davis1y42Review for 2023 Review

At the time, I remarked to some friends that it felt weird that this was being presented as a new insight to this audience in 2023 rather than already being local conventional wisdom.[1] (Compare "Bad Intent Is a Disposition, Not a Feeling" (2017) or "Algorithmic Intent" (2020).) Better late than never!

[1] The "status" line at the top does characterize it as partially "common wisdom", but it's currently #14 in the 2023 Review 1000+ karma voting, suggesting novelty to the audience.

[-]Ben Pace1y114

Presenting the same ideas differently is pro-social and worthwhile, and can help things land with those for whom other presentations didn't.

[-]Noosphere891y40

To be fair, this is a surprisingly cultural trait: different cultures have different attitudes about how much bad intent differs from bad action, and there is some use in trying to distinguish between bad behavior and bad mental states. That said, if the US and Europe moved more towards norms in which we didn't distinguish as much between bad behavior and bad intent for the purposes of stopping the behavior, I do think it would be better. (I think Zack M Davis's norms are directionally correct from a personal epistemics view and for the general population in the US, though not as far as he would go.) See:

https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors#jCfNxzCEniu7Ak8bF

https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors#p9oYLR8wTQtYrKnnn

[-]Aleksey Bykhun3y42

After a recent article in NY Times, I realized that it's a perfect analogy. The smartest people, when motivated by money, get so high that they venture into unsafe territory. They kinda know it's unsafe, but even internally it doesn't feel like crossing the red line.

It's not even about strength of character: when incentives are aligned 99:1 against your biology, you can try to work against them, but you most probably stand no chance.

It takes enormous willpower to quit smoking precisely because the risks are invisible and so "small". Not only do you have to fight against this irresistible urge, BUT there's also nobody on "your side" except for an intellectual realization, which you're not even so sure of.

In the same vein, being a CEO of a big startup, being able to single-handedly choose direction, and getting used to people around you being less smart, less hard-working, less competitive, you start trusting your own decision-process much more. That's when incentives start to water down through the cracks in the shell. You don't even remember what feels right anymore, the only thing you know is taking bold actions brings you more power, more money, more dukka. And you do those.

[-]NickGabs3y43

Strong upvote. A corollary here is that a really important part of being a “good person” is being good at being able to tell when you’re rationalizing your behavior/otherwise deceiving yourself into thinking you’re doing good. The default is that people are quite bad at this but as you said don’t have explicitly bad intentions, which leads to a lot of people who are at some level morally decent acting in very morally bad ways.

[-]LVSN3y41

Very excited for there to be definitely no differences between stereotypical malefactors and actual malefactors; no differences between stereotypical maleficence and actual maleficence; very excited for there to be no gameable cultural impressions about what makes a person a probable malefactor

... Not to imply that any gaming that would take place would be intentional, of course.

"This isn’t to say no coordination happens. I expect a little coordination happens openly, through prosocial slogans, just to overcome free rider problems. Remember Trivers’ theory of self-deception – that if something is advantageous to us, we naturally and unconsciously make up explanations for why it’s a good prosocial policy, and then genuinely believe those explanations. If you are rich and want to oppress the poor, you can come up with some philosophy of trickle-down or whatever that makes it sound good. Then you can talk about it with other rich people openly, no secret organizations in smoke-filled rooms necessary, and set up think tanks together. If you’re in the patriarchy, you can push nice-sounding things about gender roles and family values. There is no secret layer beneath the public layer – no smoke-filled room where the rich people get together and say “Let’s push prosocial slogans about rising tides, so that secretly we can dominate everything”. It all happens naturally under the hood, and the Basic Argument isn’t violated."

https://slatestarcodex.com/2019/01/14/too-many-people-dare-call-it-conspiracy/

[-]Trevor Fordsman Weston3y33

I agree with this very intensely. I strongly regret unilaterally promoting the CFAR Handbook on various groups on Facebook; I thought that it was critical to minimize the number of AI safety and adjacent people using Facebook and that spreading the CFAR handbook was the best way to do that, and I mistakenly believed that CFAR was bad at marketing their material instead of choosing not to in order to avoid overcomplicating things. I had no way of knowing about the long list of consequences for CFAR for spreading their research in the wrong places, and CFAR had no way of warning me because they had no idea who I was and what I would do in response to their request. Hopefully, this won't make it harder for CFAR to post helpful content to Lesswrong in the future.

There are too many outside-the-box thinkers; the chaos factor is so high that it's like herding cats even when 99% of agents want to be cooperative. There need to be defense mechanisms that take confusion into account so that well-intentioned unilateralists don't get tangled up in systems meant for deliberate, consistently strategic harm-maximizers (who very clearly and unambiguously exist). The only thing I can think of is finding ways to discourage every cooperative person from acting unilaterally in the first place, but I agree with So8res that I can't think of good ways to do that.

[-]Linch3y108

I thought that it was critical to minimize the number of AI safety and adjacent people using Facebook and that spreading the CFAR handbook was the best way to do that

Wait, your TOC for spreading the CFAR handbook on Facebook was that doing so would be so annoying that it'd get people to quit Facebook? If true, this is rather surprising to me and I did not predict this.

[-]Dmitriy3y32

I read his thesis as

1. FB use reduces the effectiveness of AI safety researchers, and
2. the techniques in the CFAR handbook can help people resist attention hijacking schemes like FB, therefore
3. a FB group for EAs is a high leverage place to spread the CFAR handbook

[-]Said Achmiz3y10

the long list of consequences for CFAR for spreading their research in the wrong places

What are these consequences? Is this “long list” published anywhere?

[-]Nathan Young1y20Review for 2023 Review

I have used this dichotomy somewhere between 5 and 100 times during the last few years. I am glad it was brought to my attention.

[-]Ben Pace1y20Review for 2023 Review

It does seem worth having a term here! +4 for pointing it out and the attempt.

[-]TekhneMakre3y20

I gesture at a similar model here: https://www.lesswrong.com/posts/XPwEptSSFRCnfHqFk/zoe-curzi-s-experience-with-leverage-research?commentId=EM5TKrdsLLgBK78Qz

[-]Richard_Kennaway3y20

Here is a fictional, but otherwise practical example: the attempted rape that sets in motion the action of "Thelma and Louise". Here on YouTube. Notice what Harlan says at 0:50: "I'm not gonna hurt you".

How does he experience his intentions at that moment? At the moment after Thelma slaps him and he beats her?

Does it matter?

[-]ZY1y10

focusing less on intent and more on patterns of harm

In a general context, though, understanding intent will help to solve the issue fundamentally. There might be two general reasons behind harmful behaviors: (1) not knowing that this will cause harm, or how not to cause harm, aka being uneducated about this behavior/being ignorant; (2) knowing that this will cause harm and still deciding to do so. There might be more nuances, but these are probably the two high-level categories. Knowing what the intent is helps to create strategies to address the issue: (1) more education? (2) more punishments/legal actions?

[-]Self1y10

What people need to get is that Lying is the weaker subset of Deception. It's the type you can easily call out and retaliate against.

Which is why we evolved to have strong instinctive reactions to it.

[-]Mary Chernyshenko3y10

Yeah, we don't know if the people who sent the Boy Who Had Cried Wolf to guard the sheep were stupid or evil. But we do know they committed murder.

[-]SomeoneYouOnceKnew3y1-1

What material policy changes are being advocated for, here? I am having trouble imagining how this won't turn into a witch-hunt.

[-]Tristan Miano3y0-3

Harmful people often lack explicit malicious intent.

I was having a discussion with ChatGPT where it also claimed to believe the same thing as this. I asked it to explain why it thinks this. Its reasoning was that well-intentioned people often make mistakes, and that malign actors do not always succeed in their aims. I'll say!

I disagree completely with the idea that well-intentioned people can actually cause any harm, but even if you presume that they could, it isn't clear to me how malign actors being unable to succeed in their aims is enough to balance out the consequences such that more negativity falls on the well-intentioned. Perhaps the unsuccess of malign actors is due to correctly narrowing our focus onto them only?

Also, in my experience, if we follow the advice to focus on effects only, and we are well-intentioned about doing this, we'd end up focusing on only the truly malign actors anyway. "Deploying defenses" against honest mistake-making just doesn't intuitively result in actions that don't seem a bit cartoonishly, ironically villainous.

[-]Noosphere893y20

A version of this tends to happen with rather unintelligent or incompetent people placed in positions of power over other people, who can harm people without having any intention to harm them.

Probably the best example here is the Great Chinese Famine, and the Holodomor to a lesser extent. One of the major problems was that the leadership set severely unrealistic goals because they didn't know enough, which, combined with incompetence, caused catastrophes on the scale of millions to tens of millions of lives.

[-]Noosphere893y-2-6

As a former EA, I basically agree with this, and I definitely agree that we should start shifting to a norm that focuses on punishing bad actions, rather than trying to infer their mental state.

On SBF, I think a large part of the issue is that he was working in an industry, cryptocurrency, that basically has fraud as its bedrock. There was nothing real about crypto, so the collapse of FTX was basically inevitable.

[-]aphyer3y259

Even if you accept that all cryptocurrency is valueless, it is possible to operate a crypto-related firm that does what it says it does or one that doesn't.

For example, if two crypto exchanges accept Bitcoin deposits and say they will keep the Bitcoin in a safe vault for their customers, and then one of them keeps the Bitcoin in the vault while the other takes it to cover its founder's personal expenses/an affiliated firm's losses, I think it is fair to say that the second of these has committed fraud and the first has not, regardless of whether Bitcoin has anything 'real' about it or whether it disappears into a puff of smoke tomorrow.

[-]Kenoubi3y86

On SBF, I think a large part of the issue is that he was working in an industry, cryptocurrency, that basically has fraud as its bedrock. There was nothing real about crypto, so the collapse of FTX was basically inevitable.

I don't deny that the cryptocurrency "industry" has been a huge magnet for fraud, nor that there are structural reasons for that, but "there was nothing real about crypto" is plainly false. The desire to have currencies that can't easily be controlled, manipulated, or implicitly taxed (seigniorage, inflation) by governments or other centralized organizations and that can be transferred without physical presence is real. So is the desire for self-executing contracts. One might believe those to be harmful abilities that humanity would be better off without, but not that they're just nothing.

[-]Noosphere893y20

More specifically, the issue with crypto is that the benefits are much less than promised, and there's a whole lot of bullshit claims on crypto like it being secure or not manipulatable.

As one example of why cryptocurrencies fail as a currency: a fixed supply and no central entity mean that the value of the currency swings wildly, which is a dealbreaker for any currency.

Note, this is just one of the many, fractal problems here with crypto.

Crypto isn't all fraud. There's reality there, but it's built on unsound foundations, and it's trying to sell a fake castle to others.

[-]localdeity3y41

I definitely agree that we should start shifting to a norm that focuses on punishing bad actions, rather than trying to infer their mental state.

Do you have limitations to this in mind? Consider the political issue of abortion. One side thinks the other is murdering babies; the other side thinks the first is violating women's rightful ownership of their own bodies. Each side thinks the other is doing something monstrous. If that's all you need to justify punishment, then that seems to mean both sides should fight a civil war.

("National politics? I was talking about..." The one example the OP gives is SBF, and other language alludes to sex predators and reputation launderers, and the explicit specifiers in the first few paragraphs are "harmful people" and "bad behavior"; it's such a wide range that it seems hard to declare anything offtopic.)

[-]Noosphere893y10

You've actually mentioned a depressing possibility around morality, and it's roughly that without shared ethical assumptions, conflict is the default, and there's nothing imposing any constraints except social norms, which can break down.

My answer for people in general is: Try to see what others think, but remember that sometimes, bad outcomes will happen to stop worse outcomes, and you should always focus on your own values to decide the answers.

[+][comment deleted]3y2-2
