Note: I'll be trying not to engage too much with the object level discussion here – I think my marginal time on this topic is better spent thinking and writing longform thoughts. See this comment.
Over the past couple of months there was an extended discussion involving myself, Habryka, Ruby, Vaniver, Jim Babcock, Zvi, Ben Hoffman, Jessicata and Zack Davis. The discussion covered many topics, including "what is reasonable to call 'lying'", "what are the best ways to discuss and/or deal with deceptive patterns in public discourse", "what norms and/or principles should LessWrong aspire to" and others.
This included comments on LessWrong, email, google-docs and in-person communication. This post is intended as an easier-to-read collection of what seemed (to me) like key points, as well as including my current takeaways.
Part of the challenge here was that it seemed like Benquo and I had mostly similar models, but many critiques I made seemed to Ben to be in the wrong abstraction, and vice-versa. Sometimes I would notice particular differences like "In my model, it's important that accusations be held to a high standard", whereas Ben felt "it's important that criticism not be held to higher standards than praise." But knowing this didn't seem to help much.
This post mostly summarizes existing online conversation. I'm hoping to do a followup where I make good on my promise to think more seriously through my cruxes-on-ontology, but it's slow going.
This begins with some comment highlights from LessWrong which seemed useful to gather in one place, followed by my takeaways after the fact.
I attempt to pass Ben's ITT
In the comments of "Rationalization" and "Sitting Bolt Upright in Alarm", a few things eventually clicked and I attempted to pass Ben's Ideological Turing Test:
Let me know how this sounds as an ITT:
Thinking and building a life for yourself
- Much of civilization (and the rationalsphere as a subset of it and/or memeplex that's influenced and constrained by it) is generally pointed in the wrong direction. This has many facets, many of which reinforce each other. Society tends to:
- Schools systematically teach people to associate reason with listening-to/pleasing-teachers, or moving-words-around unconnected from reality. [Order of the Soul]
- Society systematically pushes people to live apart from each other, and to work until they need (or believe they need) palliatives, in a way that doesn't give you space to think [Sabbath Hard and Go Home]
- Relatedly, society provides structure that incentivizes you to advance in arbitrary hierarchy, or to tread water and barely stay afloat, without reflecting on what you actually want.
- By contrast, for much of history, there was a much more direct connection between what you did, how you thought, and how your own life was bettered. If you wanted a nicer home, you built a nicer home. This came with many overlapping incentive structures that reinforced something closer to living healthily and generating real value.
- (I'm guessing a significant confusion was me seeing this whole section as only moderately connected rather than central to the other sections)
We desperately need clarity
- There's a collection of pressures, in many-but-not-all situations, to keep both facts and decision-making principles obfuscated, and to warp language in a way that enables that. This is often part of an overall strategy (sometimes conscious, sometimes unconscious) to maneuver groups for personal gain.
- It's important to be able to speak plainly about forces that obfuscate. It's important to lean _fully_ into clarity and plainspeak, not just take marginal steps towards it, both because clear language is very powerful intrinsically, and because there's a sharp dropoff as soon as ambiguity leaks in (moving the conversation to higher simulacrum levels, at which point it's very hard to recover clarity)
[Least confident] The best focus is on your own development, rather than optimizing systems or other people
- Here I become a lot less confident. This is my attempt to summarize whatever's going on in our disagreement about my "When coordinating at scale, communicating has to reduce gracefully to about 5 words" thing. I had an impression that this seemed deeply wrong, confusing, or threatening to you. I still don't really understand why. But my best guesses include:
- This is putting the locus of control in the group, at a moment-in-history where the most important thing is reasserting individual agency and thinking for yourself (because many groups are doing the wrong-things listed above)
- Insofar as group coordination is a lens to be looked through, it's important that groups are working in a way that respects everyone's agency and ability to think (to avoid falling into some of the failure modes associated with the first bullet point), and simplifying your message so that others can hear/act on it is part of an overall strategy that is causing harm
- Possibly a simpler "people can and should read a lot and engage with more nuanced models, and most of the reason you might think that they can't is because school and hierarchical companies warped your thinking about that?" And then, in light of all that, something is off with my mood when I'm engaging with individual pieces of that, because I'm not properly oriented around the other pieces? Does that sound right? Are there important things left out or gotten wrong?
This sounds really, really close. Thanks for putting in the work to produce this summary!
I think my objection to the 5 Words post fits a pattern where I've had difficulty expressing a class of objection. The literal content of the post wasn't the main problem. The main problem was the emphasis of the post, in conjunction with your other beliefs and behavior.
It seemed like the hidden second half of the core claim was "and therefore we should coordinate around simpler slogans," and not the obvious alternative conclusion "and therefore we should scale up more carefully, with an uncompromising emphasis on some aspects of quality control." (See On the Construction of Beacons for the relevant argument.)
It seemed to me like there was some motivated ambiguity on this point. The emphasis seemed to consistently recommend public behavior that was about mobilization rather than discourse, and back-channel discussions among well-connected people (including me) that felt like they were more about establishing compatibility than making intellectual progress. This, even though it seems like you explicitly agree with me that our current social coordination mechanisms are massively inadequate, in a way that (to me obviously) implies that they can't possibly solve FAI.
I felt like if I pointed this kind of thing out too explicitly, I'd just get scolded for being uncharitable. I didn't expect, however, that this scolding would be accompanied by an explanation of what specific, anticipation-constraining, alternative belief you held. I've been getting better at _pointing out this pattern_ (e.g. my recent response to habryka) instead of just shutting down due to a preverbal recognition of it. It's very hard to write a comment like this one clearly and without extraneous material, especially of a point-scoring or whining nature. (If it were easy I'd see more people writing things like this.)
Summary of Private LessWrong Thread (Me/Benquo/Jessica)
One experiment we tried during the conversation was to hold a conversation on LessWrong, in a private draft (i.e. where we could respond to each other with nested threading, but only have to worry about responding to each other).
The thread was started by Ruby, with some proposals for LessWrong moderation style. At first the conversation was primarily Ruby and Zvi. At some point Ruby might make the full thread public, but for now I'm focusing on an exchange between Benquo, Jessica and me, which I found most helpful for clarifying our positions.
It might help for me to also try to make a positive statement of what I think is at stake here. [...]
What I see as under threat is the ability to say in a way that's actually heard, not only that opinion X is false, but that the process generating opinion X is untrustworthy, and perhaps actively optimizing in an objectionable direction. Frequently, attempts to say this are construed _primarily_ as moves to attack some person or institution, pushing them into the outgroup. Frequently, people suggest to me an "equivalent" wording with a softer tone, which in fact omits important substantive criticisms I mean to make, while claiming to understand what's at issue.
My core claim is: "right now, this isn't possible, without a) it being heard by many people as an attack, b) without people having to worry that other people will see it as an attack, even if they don't."
It seems like you see this as something like "there's a precious thing that might be destroyed" and I see it as "a precious thing does not exist and must be created, and the circumstances in which it can exist are fragile." It might have existed in the very early days of LessWrong. But the landscape now is very different than it was then. With billions of dollars available and at stake, what worked then can't be the same thing as what works now.
[in public. In private things are much easier. It's *also* the case that private channels enable collusion – that was an update I've made over the course of the conversation.]
And, while I believe that you earnestly believe that the quoted paragraph is important, your individual statements often look too optimized-as-an-obfuscated-attack for me to trust that they are not. I assign substantial probability to a lot of your motives being basically traditional coalition-political and you are just in denial about it, with a complicated narrative to support them. If that's not true, I realize it must be extremely infuriating to be treated that way. But the nature of the social landscape makes it a bad policy for me to take you at your word in many of the cases.
Wishing the game didn't exist doesn't make the game not exist. We could all agree to stop playing at once, but a) we'd need to credibly believe we were all actually going to stop playing at once, b) have enforcement mechanisms to make sure it continues not being played, c) have a way to ensure newcomers are also not playing.
And I think that's all possibly achievable, incrementally. I think "how to achieve that" is a super important question. But attempting to not-play the game without putting in that effort looks to me basically like putting a sign that says "cold" on a broken refrigerator and expecting your food to stay fresh.
I spent a few minutes trying to generate cruxes. Getting to "real" cruxes here feels fairly hard and will probably take me a couple hours. (I think this conversation is close to the point where I'd really prefer us to each switch to the role of "Pass each other's ITTs, and figure out what would make ourselves change our mind" rather than "figure out how to explain why we're right." This may require more model-sharing and trust-building first, dunno)
But I think the closest proximate crux is: I would trust Ben's world-model a lot more if I saw a lot more discussion of how the game theory plays out over multiple steps. I'm not that confident that my interpretation of the game theory and social landscape is right. But I can't recall any explorations of it, and I think it should be at least 50% of the discussion here.
But the landscape now is very different than it was then. With billions of dollars available and at stake, what worked then can’t be the same thing as what works now.
Jessica responds to me:
Is this a claim that people are almost certainly going to be protecting their reputations (and also beliefs related to their reputations) in anti-epistemic ways when large amounts of money are at stake, in a way they wouldn't if they were just members of a philosophy club who didn't think much money was at stake?
This claim seems true to me. We might actually have a lot of agreement. And this matches my impression of "EA/rationality shift from 'that which can be destroyed by the truth should be' norms towards 'protect feelings' norms as they have grown and want to play nicely with power players while maintaining their own power."
If we agree on this point, the remaining disagreement is likely about the game theory of breaking the bad equilibrium as a small group, as you're saying it is.
(Also, thanks for bringing up money/power considerations where they're relevant; this makes the discussion much less obfuscated and much more likely to reach cruxes)
[Note, my impression is that the precious thing already exists among a small number of people, who are trying to maintain and grow the precious thing and are running into opposition, and enough such opposition can cause the precious thing to go away, and the precious thing is currently being maintained largely through willingness to forcefully push through opposition. Note also, if the precious thing used to exist (among people with strong stated willingness to maintain it) and now doesn't, that indicates that forces against this precious thing are strong, and have to be opposed to maintain the precious thing.]
An important thing I said earlier in another thread was that I saw roughly two choices for how to do the precious thing, which is something like:
- If you want to do the precious thing in public (in particular when billions of dollars are at stake, although also when narrative and community buy-in are at stake), it requires a lot of special effort, and is costly
- You can totally do the precious thing in small private groups, and it's much easier
- And I think a big chunk of the disagreement comes from the fact that 'small private groups are also a way that powerful groups collude, act duplicitously, and do other things in that space.'
[There's a separate issue, which is that researchers might feel more productive, locally, in private. But failure to write up their ideas publicly means other people can't build on them, which is globally worse. So you also want some pressure on research groups to publish more]
So the problem-framing as I currently see it is:
- What are the least costly ways you can have plainspoken truth in public, without destroying (or resulting in someone else destroying) the shared public space? Or, what collection of public truthseeking norms outputs the most useful true things per unit of effort in a sustainable fashion?
- What are ways that we can capture the benefits of private spaces (sometimes recruiting new people into the private spaces), while having systems/norms/counterfactual-threats in place to prevent collusion and duplicity, and encourage more frequent publishing of research?
And the overall strategy I currently expect to work best (but with weak confidence, haven't thought it through) is:
- Change the default of private conversations from 'stay private forever' to 'by default, start in private, but with an assumption that the conversation will usually go public unless there's a good reason not to, with participants having veto* power if they think it's important not to go public.'
- An alternate take on "the conversation goes public" is "the participants write up a distillation of the conversation that's more optimized for people to learn what happened, which both participants endorse." (i.e. while I'm fine with all my words in this private thread being shared, I think trying to read the entire conversation might be more confusing than it needs to be. It might not be worth anyone's time to write up a distillation, but if someone felt like it I think that'd be preferable all else being equal)
- Have this formally counterbalanced by "if people seem to be abusing their veto power for collusion or duplicitous purposes, have counterfactual threats to publicly harm each other's reputation (possibly betraying the veto-process*)", which hopefully doesn't happen, but the threat of it happening keeps people honest.
*Importantly, a formal part of the veto system is that if people get angry enough, or decide it's important enough, they can just ignore your veto. If the game is rigged, the correct thing to do is kick over the gameboard. But, everyone has a shared understanding that a gameboard is better than no gameboard, so instead, people are incentivized to not rig the game (or, if the game is currently rigged, work together to de-rig it)
Because everyone agrees that these are the rules of the metagame, betraying the confidence of the private space is seen as a valid action (i.e. if people didn't agree that these were the meta-rules, I'd consider betraying someone's confidence to be a deeply bad sign about a person's trustworthiness. But if people _do_ agree to the meta-rules, then if someone betrays a veto it's a sign that you should maybe be hesitant to collaborate with that person, but not as strong a sign about their overall trustworthiness)
I'm first going to summarize what I think you think:
- $Billions are at stake.
- People/organizations are giving public narratives about what they're doing, including ones that affect the $billions.
- People/organizations also have narratives that function for maintaining a well-functioning, cohesive community.
- People criticize these narratives sometimes. These criticisms have consequences.
- Consequences include: People feel the need to defend themselves. People might lose funding for themselves or their organization. People might fall out of some "ingroup" that is having the important discussions. People might form coalitions that tear apart the community. The overall trust level in the community, including willingness to take the sensible actions that would be implied by the community narrative, goes down.
- That doesn't mean criticism of such narratives is always bad. Sometimes, it can be done well.
- Criticisms are important to make if the criticism is really clear and important (e.g. the criticism of ACE). Then, people can take appropriate action, and it's clear what to do. (See strong and clear evidence)
- Criticisms are potentially destructive when they don't settle the matter. These can end up reducing cohesion/trust, splitting the community, tarnishing reputations of people who didn't actually do something wrong, etc.
- These non-matter-settling criticisms can still be important to make. But, they should be done with sensitivity to the political dynamics involved.
- People making public criticisms willy-nilly would lead to a bunch of bad effects (already mentioned). There are standards for what makes a good criticism, where "it's true/well-argued" is not the only standard. (Other standards are: is it clear, is it empathetic, did the critic try other channels first, etc)
- It's still important to get to the truth, including truths about adversarial patterns. We should be doing this by thinking about what norms get at these truths with minimum harm caused along the way.
Here's a summary of what I think (written before I summarized what you thought):
- The fact that $billions are at stake makes reaching the truth in public discussions strictly more important than for a philosophy club. (After all, these public discussions are affecting the background facts that private discussions, including ones that distribute large amounts of money, assume)
- The fact that $billions are at stake increases the likelihood of obfuscatory action compared to in a philosophy club.
- The "level one" thing to do is to keep using philosophy club norms, like old-LessWrong. Give reasons for thinking what you think. Don't make appeals to consequences or shut people up for saying inconvenient things; argue at the object level. Don't insult people. If you're too sensitive to hear the truth, that's for the most part your problem, with some exceptions (e.g. some personal insults). Mostly don't argue about whether the other people are biased/adversarial, and instead make good object-level arguments (this could be stated somewhat misleadingly as "assume good faith"). Have public debates, possibly with moderators.
- A problem with "level one" norms is that they rarely talk about obfuscatory action. "Assume good faith", taken literally, implies obfuscation isn't happening, which is false given the circumstances (including monetary incentives). Philosophy club norms have some security flaws.
- The "level two" thing to do is to extend philosophy club norms to handle discussion of adversarial action. Courts don't assume good faith; it would be transparently ridiculous to do so.
- Courts blame and disproportionately punish people. We don't need to do this here, we need the truth to be revealed one way or another. Disproportionate punishments make people really defensive and obfuscatory, understandably. (Law fought fraud, and fraud won)
- So, "level two" should develop language for talking about obfuscatory/destructive patterns of social action that doesn't disproportionately punish people just for getting caught up in them. (Note, there are some "karmic" consequences for getting caught up in these dynamics, like having the organization be less effective and getting a reputation for being bad at resisting social pressure, but these are very different from the disproportionate punishments typical of the legal system, which punish disproportionately on the assumption that most crime isn't caught)
- I perceive a backslide from "level one" norms, towards more diplomatic norms, where certain things are considered "rude" to say and are "attacking people", even if they'd be accepted in philosophy club. I think this is about maintaining power illegitimately.
Here are more points that I thought of after summarizing your position:
- I actually agree that individuals should be using their discernment about how and when to be making criticisms, given the political situation.
- I worry that saying certain ways of making criticisms are good/bad results in people getting silenced/blamed even when they're saying true things, which is really bad.
- So I'm tempted to argue that the norms for public discussion should be approximately "that which can be destroyed by the truth should be", with some level of privacy and politeness norms, the kind you'd have in a combination of a philosophy club and a court.
- That said, there's still a complicated question of "how do you make criticisms well". I think advice on this is important. I think the correct advice usually looks more like advice to whistleblowers than advice for diplomacy.
Note, my opinion of your opinions, and my opinions, are expressed in pretty different ontologies. What are the cruxes?
Suppose future-me tells me that I'm pretty wrong, and actually I'm going about doing criticisms the wrong way, and advocating bad norms for criticism, relative to you. Here are the explanations I come up with:
- "Scissor statements" are actually a huge risk. Make sure to prove the thing pretty definitively, or there will be a bunch of community splits that make discussion and cooperation harder. Yes, this means people are getting deceived in the meantime, and you can't stop that without causing worse bad consequences. Yes, this means group epistemology is really bad (resembling mob behavior), but you should try upgrading that a different way.
- You're using language that implies court norms, but courts disproportionately punish people. This language is going to increase obfuscatory behavior way more than it's worth, and possibly result in disproportionate punishments. You should try really, really hard to develop different language. (Yes, this means some sacrifice in how clear things can be and how much momentum your reform movement can sustain)
- People saying critical things about each other in public (including not-very-blamey things like "I think there's a distortionary dynamic you're getting caught up in") looks really bad in a way that deterministically makes powerful people, including just about everyone with money, stop listening to you or giving you money. Even if you get a true discourse going, the community's reputation will be tarnished by the justice process that led to that, in a way that locks the community out of power indefinitely. That's probably not worth it, you should try another approach that lets people save face.
- Actually, you don't need to be doing public writing/criticism very much at all, people are perfectly willing to listen to you in private, you just have to use this strategy that you're not already using.
These are all pretty cruxy; none of them seem likely (though they're all plausible), and if I were convinced of any of them, I'd change my other beliefs and my overall approach.
There are a lot of subtleties here. I'm up for having in-person conversations if you think that would help (recorded / written up or not).
My final response in that thread:
This is an awesome comment on many dimensions, thanks. I both agree with your summary of my position, and I think your cruxes are pretty similar to my cruxes.
There are a few additional considerations of mine which I'll list, followed by attempting to tease out some deeper cruxes of mine about "what facts would have to be true for me to want to backpropagate the level of fear it seems like you feel into my aesthetic judgment." [This is a particular metaframe I'm currently exploring]
[Edit: turned out to be more than a few straightforward assumptions, and I haven't gotten to the aesthetic or ontology cruxes yet]
Additional considerations from my own beliefs:
I define clarity in terms of what gets understood, rather than what gets said. So, using words with non-standard connotations, without doing a lot of up-front work to redefine your terms, seems to me to be reducing clarity, and/or mixing clarity, rather than improving it.
I think it's especially worthwhile to develop non-court language, for public discourse, if your intent is not to be punitive – repurposing court language for non-punitive action is particularly confusing. The first definition for "fraud" that comes up on Google is "wrongful or criminal deception intended to result in financial or personal gain". The connotation I associate it with is "the kind of lying you pay fines or go to jail for or get identified as a criminal for".
By default, language-processing is a mixture of truthseeking and politicking. The more political a conversation feels, the harder it will be for people to remain in truthseeking mode. I see the primary goal of a rationalist/truthseeking space to be to ensure people remain in truthseeking mode. I don't think this is completely necessary but I do think it makes the space much more effective (in terms of time spent getting points across).
I think it's very important for language re: how-to-do-politics-while-truthseeking to be created separately from any live politics – otherwise, one of the first things that'll happen is the language gets co-opted and distorted by the political process. People are right/just to fear you developing political language if you appear to be actively trying to wield political weapons against people while you develop it.
Fact that is (quite plausibly) my true rejection – Highly tense conversations that I get defensive at are among the most stressful things I experience, which cripple my ability to sleep well while doing them. This is high enough cost that if I had to do it all the time, I would probably just tune them out.
This is a selfish perspective, and I should perhaps be quite suspicious of the rest of my arguments in light of it. But it's not obviously wrong to me in the first place – having stressful weeks of sleep wrecked is really bad. When I imagine a world where people are criticizing me all the time [in particular when they're misunderstanding my frame, see below about deep model differences], it's not at all obvious that the net benefit I or the community gets from people getting to express their criticism more easily outweighs the cost in productivity (which would, among other things, be spent on other truthseeking pursuits). When I imagine this multiplied across all orgs it's not very surprising or unreasonable-seeming for people to have learned to tune out criticism.
Single Most Important Belief that I endorse – I think trying to develop a language for truthseeking-politics (or politics-adjacent stuff) could potentially permanently destroy the ability for a given space to do politics sanely. It's possible to do it right, but also very easy to fuck up, and instead of properly transmitting truthseeking-into-politics, politics backpropagates into truthseeking, causing people to view truthseeking norms as a political weapon. I think this is basically what happened with the American Right Wing and their view of science (and I think things like the March for Science are harmful because they exacerbate Science as Politics).
In the same way that it's bad to tell a lie, to accomplish some locally good thing (because the damage you do to the ecosystem is far worse than whatever locally good thing you accomplished), I think it is bad to try to invent truthseeking-politics-on-the-fly without explaining well what you are doing while also making claims that people are (rightly) worried will cost them millions of dollars. Whatever local truth you're outputting is much less valuable than the risks you are playing with re: the public commons of "ability to ever discuss politics sanely."
I really wish we had developed good tools to discuss politics sanely before we got access to billions of dollars. That was an understandable mistake (I didn't think about it until just this second), but it probably cost us deeply. Given that we didn't, I think creating good norms requires much more costly signaling of good faith (on everyone's part) than it might have needed. [this paragraph is all weak confidence since I just thought of it but feels pretty true to me]
People have deep models, in which certain things seem obvious to them that are not obvious to others. I think I drastically disagree with you about what your prior should be that "Bob has a non-motivated deep model (or, not any more motivated than average) that you don't understand", rather than "Bob's opinion or his model is different/frightening because he is motivated, deceptive and/or non-truth-tracking."
My impression is that everyone with a deep, weird model that I've encountered was overly biased in favor of their deep model (including you and Ben), but this seems sufficiently explained by "when you focus all your attention on one particular facet of reality, that facet looms much larger in your thinking, and other facets loom less large", with some amount of "their personality or circumstance biased them towards their model" (but, not to a degree that seems particularly weird or alarming).
Seeing "true reality" involves learning lots of deep models into narrow domains and then letting them settle.
[For context/frame, remember that it took Eliezer 2 years of blogging every day to get everyone up to speed on how to think in his frame. That's roughly the order-of-magnitude of effort that seems like you should expect to expend to explain a counterintuitive worldview to people]
In particular, a lot of the things that seem alarming to you (like, GiveWell's use of numbers that seem wrong) are pretty well (but not completely) explained by "it's actually very counterintuitive to have the opinions you do about what reasonable numbers are." I have updated more towards your view on the matter, but a) it took me a couple years, b) it still doesn't seem very obvious to me. Drowning-Children-are-Rare is a plausible hypothesis but doesn't seem so overdetermined that anyone who thinks otherwise must be deeply motivated or deceptive.
I'm not saying this applies across the board. I can think of several people in EA or rationalist space who seem motivated in important ways. My sense of deep models specifically comes from the combination of "the deep model is presented to me when I inquire about it, and makes sense", and "they have given enough costly signals of trustworthiness that I'm willing to give them the benefit of the doubt."
I have updated over the past couple years on how bad "PR management" and diplomacy are for your ability to think, and I appreciate the cost a bit more, but it still seems less than the penalties you get for truthseeking when people feel unsafe.
I have (low confidence) models that seem fairly different from Ben's (and I assume your) model of what exactly early LessWrong was like, and what happened to it. This is complicated and I think beyond scope for this comment.
Unknown Unknowns, and model-uncertainty. I'm not actually that worried about scissor-attacks, and I'm not sure how confident I am about many of the previous models. But they are all worrisome enough that I think caution is warranted.
Many of the above bullet-points are cruxy and suggest natural crux-reframes. I'm going to go into some detail for a few:
I could imagine learning that my priors on "deep model divergence" vs "nope, they're just really deceptive" are wrong. I don't actually have all that many data points to have longterm confidence here. It's just that so far, most of the smoking guns that have been presented to me didn't seem very definitive.
The concrete observations that would shift this are "at least one of the people that I have trusted turns out to have a smoking gun that makes me think their deep model was highly motivated" [I will try to think privately about what concrete examples of this might be, to avoid a thing where I confabulate justifications in realtime.]
It might be a lot easier than I think to create a public truthseeking space that remains sane in the face of money and politics. Relatedly, I might be overly worried about the risk of destroying longterm ability to talk-about-politics-sanely.
If I saw an existing community that operated on a public forum and onboarded new people all the time, which had the norms you are advocating, and interviewing various people involved seemed to suggest it was working sanely, I'd update. I'm not sure if there are easier bits of evidence to find.
The costs that come from diplomacy might be higher than the costs of defensiveness.
Habryka has described experiences where diplomacy/PR-concerns seemed bad-for-his-soul in various ways. [not 100% sure this is quite the right characterization but seems about right]. I think so far I haven't really been "playing on hard mode" in this domain, and I think there's a decent chance that I will be over the next few years. I could imagine updating about how badly diplomacy cripples thought after having that experience, and for it to turn out to be greater than defensiveness.
I might be the only person that suffers from sleep loss or other stress-side-effects as badly as I do.
These were the easier ones. I'm trying to think through the "ontology doublecrux" thing and think about what sorts of things would change my ontology. That may be another while.
Criticism != Accusation of Wrongdoing
Later on, during an in-person conversation with Jessica, someone else (leaving them anonymous) pointed out an additional consideration, which is that criticism isn't the same as accusations.
[I'm not sure I fully understood the original version of this point, so the following is just me speaking for myself about things I believe]
There's an important social technology, which is to have norms that people roughly agree on. The costs of everyone having to figure out their own norms are enormous. So most communities have at least some basic things that you don't do (such as blatantly lying).
Several important properties here are:
- You can ostracize people who continuously violate norms.
- If someone accuses you of a norm violation, you feel obligated to defend yourself. (Which is very different from getting criticized for something that's not a norm violation)
- If Alice makes an accusation of someone violating norms, and that accusation turns out to be exaggerated or ill-founded, then Alice loses points, and people are less quick to believe her or give her a platform to speak next time.
I think one aspect of the deep disagreements going on here is something like "what exactly are the costs of everyone having to develop their own theory of goodness", and/or what are the benefits of the "there are norms, that get enforced and defended" model.
I understand Benquo and Jessica are arguing that we do not in fact have such norms, we just have the illusion of such norms, and in fact what we have are weird political games that benefit the powerful. And they see their approach as helping to dispel that illusion.
Whereas I think we do in fact have those norms – there's a degree of lying that would get you expelled from the rationalsphere and EAsphere, and this is important. And so insisting on being able to discuss, in public, whether Bob lied [a norm violation], while claiming that this is not an attack on Bob, just an earnest discussion of the truth or model-building of adversarial discourse... is degrading not only the specific norm of "don't lie" but also "our general ability to have norms."
My current state
I'm currently in the process of mulling this all over. The high level questions are something like:
- [Within my current ontology] What sorts of actions by EA leaders would shift my position from "right now we actually have a reasonably good foundation of trustworthiness" to "things are not okay, to the point where it makes more sense to kick the game board over rather than improve things." Or, alternately "things are not okay, and I need to revise my ontology in order to account for it."
- How exactly would/should I shift my ontology if things were sufficiently bad?
I expect this to be a fairly lengthy process, and require a fair amount of background processing.
There are other things I'm considering here, and writing them up turned out to take more time than I have at the moment. Will hopefully have a Pt 2 of this post.
Defining clarity in terms of what gets understood results in obfuscation winning automatically, by effectively giving veto power to motivated misunderstandings. (As Upton Sinclair put it, "It is difficult to get a man to understand something when his salary depends upon his not understanding it," or as Eliezer Yudkowsky put it more recently, "politically motivated incomprehension makes people dumber than cassette tape recorders.")
If we may be permitted to borrow some concepts from law (while being wary of unwanted transfer of punitive intuitions), we may want concepts of willful blindness, or clarity to the "reasonable person".
Imagine that this had already happened. How would you go about starting to fix it, other than by trying to describe the problem as clearly as possible (that is, "invent[ing] truthseeking-politics-on-the-fly")?
Huh. This just seems obviously the opposite to me – if Alice's salary depends on not understanding you, you don't get points for having stated the thing in a way that seemed straightforward to you but that Alice misunderstood (willfully or otherwise). You have a hard problem, and you shouldn't pretend to have solved it when you haven't.
It's a viable strategy to speak in public and hope someone _other_ than Alice understands you (and maybe a collection of people can convince Alice by adding new incentives to counterbalance her salary-dependency, or maybe you just get enough consensus that it stops mattering whether Alice understands or not). But this strategy still depends on someone having understood you.
Possibly clearer version of what Jessica is saying:
Imagine three levels of explanation: Straightforward to you, straightforward to those without motivated cognition, straightforward even to those with strong motivated cognition.
It is reasonable to say that getting from level 1 to level 2 is often a hard problem, and that it is on you to solve that problem.
It is not reasonable, if you want clarity to win, to say that level 2 is insufficient and you must reach level 3. It certainly isn't reasonable to notice that level 2 has been reached, but level 3 has not, and thus judge the argument insufficient and a failure. It would be reasonable to say that reaching level 3 would be *better* and suggest ways of doing so.
If you don't want clarity to win, and instead you want to accomplish specific goals that require convincing specific people that have motivated cognition, you're on a different quest. Obfuscation has already won, because you are being held to higher standards and doing more work, and rewarding those who have no desire to understand for their failure to understand. Maybe you want to pay that price in context, but it's important to realize what you've lost.
Can you taboo "clarity"?
I think it is (ironically, perhaps) unclear to me what this even means. In particular what it means for "clarity to win." It doesn't make any sense to me to define clarity as something other than "communicating in such a way that people can understand what you meant." What else would it mean?
[Recall, as I mentioned elsethread, that the primary thing I'm arguing against in this subthread is using words in nonstandard ways.]
(You said you didn't want more back-and-forth in the comments, but this is just an attempt to answer your taboo request, not prompt more discussion; no reply is expected.)
We say that clarity wins when contributing to accurate shared models—communicating "clearly"—is a dominant strategy: agents that tell the truth, the whole truth, and nothing but the truth do better (earn more money, leave more descendants, create more paperclips, &c.) than agents that lie, obfuscate, rationalize, play dumb, report dishonestly, filter evidence, &c.
Creating an environment where "clarity wins" (in this sense) looks like a very hard problem, but it's not hard to see that some things don't work. Jessica's example of a judged debate where points are only awarded for arguments that the opponent acknowledges, is an environment where agents who want to win the debate have an incentive to play dumb—or be dumb—never acknowledging when their opponent made a good argument (even if the opponent in fact made a good argument). In this scenario, being clear (or at least, clear to the "reasonable person", if not your debate opponent) doesn't help you win.
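The incentive problem in the judged-debate example can be made concrete with a toy payoff model. This is only a sketch under made-up assumptions: each debater makes exactly one good argument, each chooses either to acknowledge the opponent's argument or to play dumb, and a debater cares only about their score margin. The rule names, strategy names, and point values are hypothetical and are not taken from the discussion itself.

```python
# Toy model of a judged debate. All names and numbers are illustrative assumptions.
STRATEGIES = ("acknowledge", "play_dumb")


def margin(my_strategy, their_strategy, rule):
    """Return my score minus my opponent's score.

    Assumes both debaters make exactly one good argument.
    """
    if rule == "opponent_must_acknowledge":
        # A point is awarded only if the other side admits the argument landed.
        my_points = 1 if their_strategy == "acknowledge" else 0
        their_points = 1 if my_strategy == "acknowledge" else 0
    elif rule == "judge_decides":
        # The judge scores each good argument on their own reading of it.
        my_points = 1
        their_points = 1
    else:
        raise ValueError(f"unknown rule: {rule}")
    return my_points - their_points


def strictly_dominates(a, b, rule):
    """True if strategy a beats strategy b against every opponent strategy."""
    return all(
        margin(a, other, rule) > margin(b, other, rule) for other in STRATEGIES
    )


for rule in ("opponent_must_acknowledge", "judge_decides"):
    print(rule, strictly_dominates("play_dumb", "acknowledge", rule))
```

In this toy setup, playing dumb strictly dominates when points require the opponent's acknowledgment, and stops paying off once the judge scores arguments on their own reading. The model says nothing about how to make honest acknowledgment a strictly winning strategy, which is the harder problem described above.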
Appreciate it. That does help.
I think the main thing I want to avoid with the back-and-forth is feeling a sense of urgency to respond (esp. if I'm feeling frustrated about being misunderstood). Gonna try an experiment of "respond to comments here once per day".
Will probably respond tomorrow.
Curious how that experiment ended and think this type of rule is healthy in general (e.g. rate limiting how often one checks and responds) and I'm doing my best to follow a similar one.
It certainly seemed better than rapid-fire commenting.
I don't know whether it was better than not commenting at all – I spent this thread mostly feeling exasperated that after 20 hours of debate and doublecrux it seemed like the conversation hadn't really progressed. (Or at least, I was still having to re-explain things that I felt I had covered over and over again)
I do think Zack's final comment is getting at something fairly important, but which still felt like a significant topic shift to me, and which seemed beyond scope for the current discussion.
Responding in somewhat more depth: this was a helpful crystallization of what you're going for here.
I'm not 100% sure I agree as stated – "Tell the truth, whole truth and nothing but the truth" doesn't (as currently stated) have a term in the equation for time-cost.
(i.e. it's not obvious to me that a good system incentivizes always telling the whole-truth, because it's time intensive to do that. Figuring out how to communicate a good ratio of "true, useful information per unit of mutual time/effort" feels like it should be part of the puzzle to me. But I generally agree that it's good to have a system wherein people are incentivized to share useful, honest information to each other, and do not perform better by withholding information with [conscious or otherwise] intent to deceive)
((but I'm guessing your wording was just convenient shorthand rather than a disagreement with the above))
But on the main topic:
Jessica's Judge example still feels like a non sequitur that doesn't have much to do with what I was talking about. Telling the truth/whole-truth/nothing-but still only seems useful insofar as it generates clear understanding in other people. As I said, even in the Judge example, Carol has to understand Alice's claims.
I don't know what it'd mean to care about truth-telling, without having that caring be grounded out in other people understanding things. And "hypothetical reasonable person" doesn't seem that useful a referent to me.
What matters is whichever people are in the system you're trying to communicate with. If they're reasonable, great, the problem you're trying to solve is easier. If they're so motivatedly-unreasonable that they won't listen at all, the problem may be so hard that maybe you should go to some other place where more reasonable people live and try there instead. (Or, if you're Eliezer in 2009, maybe you recurse a bit and write the Sequences for 2 years so that you gain access to more reasonable people).
(Part of the reason I'm currently very interested in Double Crux is that having it be the default frame seems much more resistant to motivated reasoning. People can fake/obfuscate their way through a doublecrux, but my current experience is that it's much harder to do so convincingly than during debate)
P.S. (to sister comment), I'm going to be traveling through the 25th and probably won't check this website, in case that information helps us break out of this loop of saying "Let's stop the implicitly-emotionally-charged back-and-forth in the comments here," and then continuing to do so anyway. (I didn't get anything done at my dayjob today, which is an indicator of me also suffering from the "Highly tense conversations are super stressful and expensive" problem.)
Yes, trivially; Jessica and I both agree with this.
Indeed, it may not have been relevant to the specific thing you were trying to say. However, being that as it may, I claim that the judge example is relevant to one of the broader topics of conversation: specifically, "what norms and/or principles should Less Wrong aspire to." The Less Wrong karma and curation systems are functionally a kind of Judge, insofar as ideas that get upvoted and curated "win" (get more attention, praise, general acceptance in the rationalist community, &c.).
If Alice's tendency to lie, obfuscate, rationalize, play dumb, report dishonestly, filter evidence, &c. isn't an immutable feature of her character, but depends on what the Judge's behavior incentivizes (at least to some degree), then it really matters what kind of Judge you have.
We want Less Wrong specifically, and the rationalist community more generally, to be a place where clarity wins, guided by the beauty of our weapons. If we don't have that—if we live in a world where lies and bullshit outcompete truth, not just in the broader Society, but even in the rationalist community—then we're dead. (Because you can't solve AI alignment with lies and bullshit.)
As a moderator and high-karma user of lesswrong.com, you, Raymond Arnold, are a Judge. Your strong-upvote is worth 10 karma; you have the power to Curate a post; you have the power to tell Alice to shape up or ship out. You are the incentives. This is a huge and important responsibility, your Honor—one that has the potential to influence 10¹⁴ lives per second. It's true that truthtelling is only useful insofar as it generates understanding in other people. But that observation, in itself, doesn't tell you how to exercise your huge and important responsibility.
If Jessica says, "Proponents of short AI timelines are lying, but not necessarily consciously lying; I mostly mean covert deception hidden from conscious attention," and Alice says, "Huh? I can't understand you if you're going to use words in nonstandard ways," then you have choices to make, and your choices have causal effects.
If you downvote Jessica because you think she's drawing the category boundaries of "lying" too widely in a way that makes the word less useful, that has causal effects: fewer people will read Jessica's post; maybe Jessica will decide to change her rhetorical strategy, or maybe she'll quit the site in disgust.
If you downvote Alice for pretending to be stupid when Jessica explicitly explained what she meant by the word "lying" in this context, then that has causal effects, too: maybe Alice will try harder to understand what Jessica meant, or maybe Alice will quit the site in disgust.
I can't tell you how to wield your power, your Honor. (I mean, I can, but no one listens to me, because I don't have power.) But I want you to notice that you have it.
I agree that "retreat" and "exert an extraordinary level of interpretive labor" are two possible strategies for dealing with unreasonable people. (Personally, I'm a huge fan of the "exert arbitrarily large amounts of interpretive labor" strategy, even though Ben has (correctly) observed that it leaves me incredibly vulnerable to certain forms of trolling.)
The question is, are there any other strategies?
The reason "retreat" isn't sufficient, is because sometimes you might be competing with unreasonable people for resources (e.g., money, land, status, control of the "rationalist" and Less Wrong brand names, &c.). Is there some way to make the unreasonable people have to retreat, rather than the reasonable people?
I don't have an answer to this. But it seems like an important thing to develop vocabulary for thinking about, even if that means playing in hard mode.
Put another way: my current sense is that the reason truth-telling-is-good is basically "increased understanding", "increased ability to coordinate" and "increased ability to build things/impact reality" (where the latter two are largely caused by the first).
I'm not confident that list is exhaustive, and if you have other reasons in mind that truth-telling is good that you think I'm missing I'm interested in hearing about that.
It sounds something like you think I'm saying 'clarity is about increasing understanding, and therefore we should optimize naively for understanding in a goodharty way', which isn't what I mean to be saying.
In some sense that list is rather exhaustive because it includes "know anything" and "do anything" as goals that are helped, and that pretty much includes everything. But in that sense, the list is not useful. In the sense that the list is useful, it seems woefully incomplete. And it's tricky to know what level to respond on. Most centrally, this seems like an example of the utilitarian failure mode of reducing the impact of a policy to the measured, proven direct impact of that policy, as a default (while still getting a result that is close to equal to 'helps with everything, everywhere, that matters at all').
"Increased ability to think" would be one potential fourth category. If truth is not being told because it's not in one's interest to do so, there is strong incentive to destroy one's own ability to think. If one was looking to essentially accept the error of 'only point to the measurable/observable directly caused effects.'
Part of me is screaming "do we really need a post explaining why it is good when people say that which is, when they believe that would be relevant or useful, and bad when they fail to do so, or say that which is not?"
Suppose Carol is judging a debate between Alice and Bob. Alice says "X, because Y". Bob acknowledges the point, but argues "actually, a stronger reason for believing not-X is Z". Alice acts like she doesn't understand the point. Bob tries explaining in other words, without success.
Carol, following your advice, says: "Alice made a clear point in favor of X. Bob failed to make a clear point against X." Therefore, she judges the debate outcome to be in favor of X.
However, this is Carol abdicating her responsibility to use her own judgment of how clear Bob's point was. Maybe it is really clear to Carol, and to a hypothetical "reasonable person" (significantly less smart than Carol), that Z is a good reason to believe not-X. Perhaps Z is actually a very simple logical argument. And so, the debate outcome is misleading.
The thing is that in any judgment of clarity, one of the people involved is the person making that judgment; and, they are obligated to use their own reasoning, not only to see whether the point was understood by others. You can't define clarity by whether someone else understood the point, you have to judge it for yourself as well. (Of course, after making your own judgment about how clear the point was, you can define the statement's clarity as whether you judged it to be clear, but this is tautological)
But in this scenario, understanding still lives inside Carol's head, not Alice's.
I wasn't suggesting that someone like Carol abdicate responsibility in this sort of situation. The point is that it's still on Alice to get someone to understand her. Who needs to understand her depends on the situation. Clarity without understanding seems meaningless to me. (Perhaps see reply to Zvi: can we taboo 'clarity?')
Note that a lot of my motivation here was to address Jessicata using words in non-standard ways (i.e. lie/fraud/outlaw/scam).
In this case the issue isn't that anyone is willfully misunderstanding anyone – if you're using a word with a different definition than people are used to, it's a fairly straightforward outcome for people to not understand you.
That makes sense. I, personally, am interested in developing new terminology for talking about not-necessarily-conscious-and-yet-systematically-deceptive cognitive algorithms, where Ben and Jessica think that "lie"/"fraud"/&c. are fine and correct.
I see great need for some way to indicate "not-an-accident but also not necessarily conscious or endorsed." And ideally the term doesn't have a judgmental or accusatory connotation.
This seems pretty hard to do actually. Maybe an acronym?
Alice lied (NIANOA) to Bob about X.
Not Intentionally And Not On Accident
For 'things that aren't an accident but aren't necessarily conscious or endorsed', another option might be to use language like 'decision', 'action', 'choice', etc. but flagged in a way that makes it clear you're not assuming full consciousness. Like 'quasi-decision', 'quasi-action', 'quasi-conscious'... Applied to Zack's case, that might suggest a term like 'quasi-dissembling' or 'quasi-misleading'. 'Dissonant communication' comes to mind as another idea.
When I want to emphasize that there's optimization going on but it's not necessarily conscious, I sometimes speak impersonally of "Bob's brain is doing X", or "a Bob-part/agent/subagent is doing X".
The most important thing from my perspective is to separate out:
In my mind, the core disagreement here is that I think Benquo has often mixed the third thing in with the first thing (and sort of skipped over the second thing?), which I consider actively harmful to the epistemic health of the discourse.
How do you figure out good policies, or convince others of the need for such policies, without pointing out the problem with current policies? If that is not possible, how does one point them out without being seen as accusing individuals of wrongdoing?
I'd said this a few times – you can talk it over in private first, and/or if it seems important to talk through the example publicly, take special care to be clear that you're not accusing people of wrongdoing.
How about focusing on the evidence, and on demonstrating good epistemics?
The styles encouraged by peer-review provide examples of how to minimize unnecessary accusations against individuals and accidental appearances of accusations against individuals (but peer-review includes too many other constraints to be the ideal norm).
Compare the paper When Will AI Exceed Human Performance? Evidence from AI Experts to The AI Timelines Scam. The former is more polite, and looks more epistemically trustworthy, when pointing out that experts give biased forecasts about AI timelines (more biased than I would have inferred from The AI Timelines Scam), but may err in the direction of being too subtle.
See also Bryan Caplan's advice.
Raemon's advice here doesn't seem 100% right to me, but it seems pretty close. Accusing a specific person or organization of violating an existing norm seems like something that ought to be kept quite separate from arguments about what policies are good. But there are plenty of ways to point out patterns of bad behavior without accusing someone of violating an existing norm, and I'm unsure what rules should apply to those.
Good epistemics says: If X, I desire to believe X. If not-X, I desire to believe not-X.
This holds even when X is "Y person did Z thing" and Z is norm-violating.
If you don't try to explicitly believe "Y person did Z thing" in worlds where in fact Y person did Z thing, you aren't trying to have good epistemics. If you don't say so where it's relevant (and give a bogus explanation instead), you're demonstrating bad epistemics. (This includes cases of saying a mistake theory where a conflict theory is correct)
It's important to distinguish good epistemics (having beliefs correlated with reality) from the aesthetic that claims credit for good epistemics (e.g. the polite academic style).
Don't conflate politeness with epistemology. They're actually opposed in many cases!
Does the AI survey paper say experts are biased in any direction? (I didn't see it anywhere)
Is there an accusation of violation of existing norms (by a specific person/organization) you see "The AI Timelines Scam" as making? If so, which one(s)?
I personally wouldn't point to "When Will AI Exceed Human Performance?" as an exemplar on this dimension, because it isn't clear about the interesting implications of the facts it's reporting. Katja's take-away from the paper was:
I don't know whether Katja's co-authors agree with her about that summary, but if there's disagreement, I think the paper still could have included more discussion of the question and which findings look relevant to it.
The actual Discussion section makes the opposite argument instead, listing a bunch of reasons to think AI experts are good at foreseeing AI progress. The introduction says "To prepare for these challenges, accurate forecasting of transformative AI would be invaluable. [...] The predictions of AI experts provide crucial additional information." And the paper includes a list of four "key findings", none of which even raise the question of survey respondents' forecasting chops, and all of which are worded in ways that suggest we should in fact put some weight on the respondents' views (sometimes switching between the phrasing 'researchers believe X' and 'X is true').
The abstract mentions the main finding that undermines how believable the responses are, but does so in such a way that someone reading through quickly might come away with the opposite impression. The abstract's structure is:
If it slips past the reader's attention that G and H are massively inconsistent, it's easy to come away thinking the abstract is saying 'Here's a list of credible statements from experts about their area of expertise' as opposed to 'Here's a demonstration that what AI researchers think is not a decent guide to what's going to happen'.
By bias, I mean the framing effects described in this SlateStarCodex post.
It's unclear to me whether that post makes such an accusation.
Question: do you mean this as a strictly denotative claim (Benquo is, as a matter of objective fact, mixing the things, which is, as a matter of fact, actively harmful to the discourse, with no blame whatsoever implied), or are you accusing Benquo of wrongdoing?
I think* (*but this is not a domain where I fully trust my introspection, or can credibly claim off-the-cuff that I've been consistently principled), that my intent is to criticize Benquo for following bad strategy according to his own principles (and mine), in a way that I consider blameworthy, but not norm-violation style blameworthiness. i.e. there is no staghunt to coordinate against this, so de-facto we're not coordinating against this.
I definitely hadn't thought concretely about the question until just now (I hadn't had the "norm violations != criticism" crisply spelled out until a couple weeks ago). And so I assume that, by default, I have not necessarily been attending to this principle consistently over the past couple years of debate.
I liked most of this post a lot.
But the references to billions of dollars don't feel quite right. The kind of trust that jessicata and Benquo seem to want sometimes happens in small (e.g. 10-person) companies, and almost always gets destroyed by politics well before the business grows to 1000 people. The patterns that I've seen in business seem better explained by the limits on how many people's epistemics I can evaluate well, than they are by the amount of money involved.
LessWrong and the rationalist/EA movements seem to have grown large enough that I'd expect less trust than exists in a good 10-person group, based purely on the size.
I think there's a few things going on.
I'd definitely agree monetary incentives aren't the whole picture. (Also, I don't think it's necessary for the answer to be 'billions' to start producing distortionary effects from monetary incentives – "thousands" can be perfectly sufficient. It just so happens that "billions" is the order of magnitude of money available)
"Number of people whose epistemics I can evaluate well" seems relevant, but I've also found some kinds of distortionary effects within startups with 5 people.
One noteworthy update I made:
A central disagreement seems to be: If you see a person who looks obviously wrong about a thing, and you have a plausible story for them being politically motivated... is it more likely that:
a) their position is mostly explained via political motivation
b) their position is mostly explained via them having a very different model than you, built out of legitimate facts and theories?
It seemed like Jessica and Ben lean towards assuming A. I lean towards assuming B.
My reason is that many of the times I've seen someone be accused of A (or been accused of A myself), there's been an explanation of a different belief/worldview that actually just seemed reasonable to me. People seem to have a tendency to jump to uncharitable interpretations of things, esp. from people who are in some sense competitors.
But, asking myself "what sort of evidence would lead me to an opposite prior?", one thing that comes to mind is: if I saw people regularly shifting their positions in questionable ways that didn't seem defensible. And what then occurred to me was that if I'm looking at the median effective altruist, I think I totally see this behavior all the time. And I see this sort of behavior non-zero among the leaders of EA/x-risk/rationality orgs.
And this didn't register as a big deal to me, cuz, I dunno, rank-and-file EA and rationalist newbies are going to have bad epistemics, shrug. And meanwhile EA leadership still seemed to have generally good epistemics on net (and/or be on positive trajectories for their epistemics).
But I can definitely imagine an order-of-experiences where I first observed various people having demonstrably bad epistemics, and then raising to attention the hypothesis that this was particularly troubling, and then forming a prior based on it, and then forming a framework built around that prior, and then interpreting evidence through that framework.
This isn't quite the same as identifying a clear crux of mine – I still have the salient experiences of people clearly failing to understand each other's deep models, and there still seem like important costs of jumping to the "motivated reasoning" hypothesis. So that's still an important part of my framework. But imagining the alternate order-of-experiences felt like an important motion towards a real crux.
My model of politically motivated reasoning is that it usually feels reasonable to the person at the time. So does reasoning that is not so motivated. Noticing that you feel the view is reasonable isn't even strong evidence that you weren't doing this, let alone that others aren't doing it.
This also matches my experience - the times when I have noticed I used politically motivated reasoning, it seemed reasonable to me until this was pointed out.
I agree with this, but it doesn't feel like it quite addresses the thing that needs addressing.
[I started writing a reply here, and then felt like it was necessary to bring up the object level disagreements to really disentangle anything.
I actually lean slightly towards "it would be good to discuss the object level of which people/orgs have confusing and possibly deceptive communication practices, but in a separate post, and taking a lot of care to distinguish what's an accusation and what's thinking out loud"]
What makes you think A and B are mutually exclusive? Or even significantly anticorrelated? If there are enough very different models built out of legitimate facts and theories for everyone to have one of their own, how can you tell they aren't picking them for political reasons?
Not saying they're exclusive.
Note: (not sure if you had this in mind when you made your comment), the OP comment here wasn't meant to be an argument per se – it's meant to be trying to articulate what's going on in my mind and what sort of motions would seem necessary for it to change. It's more descriptive than normative.
My goal here is to expose the workings of my belief structure, partly so others can help untangle things if applicable, and partly to try to demonstrate what doublecrux feels like when I do it (to help provide some examples for my current doublecrux sequence).
There are a few different (orthogonal?) ways I can imagine my mind shifting here:
All of these are knobs that can be tweaked, rather than booleans to be flipped. And (hopefully obvious) this isn't actually an exhaustive list of how my mind might change, just trying to articulate some of the more salient options.
It seems plausible that I should do A, B, or C (but, I have not yet been persuaded that my current weights are wrong). It does not seem plausible currently that I should do D. E is sufficiently complicated that I'm not sure I have a sense of how plausible it is, but current arguments I've encountered haven't seemed that overwhelming.
Clarification question: Is this default to B over A meant to apply to the population at large, or for people who are in our orbits?
It seems like your model here actually views A as more likely than B in general but thinks EA/rationality at higher levels constitutes an exception, despite your observation of many cases of A in that place.
I am specifically talking about EA/rationality at higher levels (i.e. people who have been around a long time, especially people who read the sequences or ideally who have worked through some kind of epistemological issue in public)
There's never been much of a fence around EA/rationality space, so it shouldn't be surprising that you can find evidence of people having bad epistemics if you go looking for it. (Or, even if you’re just passively tracking the background rate of bad epistemics.)
From my perspective, it's definitely a huge chunk of the problem here that people are coming from different ontologies, paradigms, weighing complicated tradeoffs against each other and often making different judgment calls of "exactly which narrow target in between the rock and the hard place are you trying to hit?"
It might also be part of the problem that people are being motivated or deceptive.
But, my evidence for the former is "I've observed it directly" (at the very least, in the form of Ben/you/Jessica/Zack not understanding my paradigm despite 20 hours of discussion, and perhaps vice versa), and the evidence for the latter is AFAICT more like "base rates".
("But base rates tho" is actually a pretty good argument, which is why I think this whole discussion is real important)
When we talked 28 June, it definitely seemed to me like you believed in the existence of self-censorship due to social pressure. Are you not counting that as motivated or deceptive, or have I misunderstood you very badly?
Note on the word "deceptive": I need some word to talk about the concept of "saying something that has the causal effect of listeners making less accurate predictions about reality, when the speaker possessed the knowledge to not do so, and attempts to correct the error will be resisted." (The part about resistance to correction is important for distinguishing "deception"-in-this-sense from simple mistakes: if I erroneously claim that 57 is prime and someone points out that it's not, I'll immediately say, "Oops, you're right," rather than digging my heels in.)
I'm sympathetic to the criticism that lying isn't the right word for this; so far my best alternatives are "deceptive" and "misleading." If someone thinks those are still too inappropriately judgey-blamey, I'm eager to hear alternatives, or to use a neologism for the purposes of a particular conversation, but ultimately, I need a word for the thing.
If an Outer Party member in the world of George Orwell's 1984 says, "Oceania has always been at war with Eastasia," even though they clearly remember events from last week, when Oceania was at war with Eurasia instead, I don't want to call that deep model divergence, coming from a different ontology, or weighing complicated tradeoffs between paradigms. Or at least, there's more to the story than that. The divergence between this person's deep model and mine isn't just a random accident such that I should humbly accept that the Outside View says they're as likely to be right as me. Uncommon priors require origin disputes, but in this case, I have a pretty strong candidate for an origin dispute that has something to do with the Outer Party member being terrified of the Ministry of Love. And I think that what goes for subjects of a totalitarian state who fear being tortured and murdered, also goes in a much subtler form for upper-middle class people in the Bay Area who fear not getting invited to parties.
Obviously, this isn't license to indiscriminately say, "You're just saying that because you're afraid of not getting invited to parties!" to any idea you dislike. (After all, I, too, prefer to get invited to parties.) But it is reason to be interested in modeling this class of distortion on people's beliefs.
Judging a person as being misleading implies to me that I have a less accurate model of the world if I take what they say at face value.
Plenty of self-censorship isn't of that quality. My model might be less accurate than the counterfactual model where the other person shared all the information to which they have access, but it doesn't get worse through the communication.
There are words like 'guarded' that you can use for people who self-censor a lot.
Apologies. A few things to disambiguate and address separately:
1. In that comment I was referring primarily to discussions about the trustworthiness and/or systematic distortion-ness of various EA and rationalist orgs and/or leadership, which I had mentally bucketed as fairly separate from our conversation. BUT even in that context "Only counterargument is base rates" is not a fair summary. I was feeling somewhat frustrated at the time I wrote that but that's not a good excuse. (The behavior I think I endorse most is trying to avoid continuing the conversation in a comment thread at all, but I've obviously been failing hard at that)
2. My take on our prior conversation was more about "things that are socially costly to talk about, that are more like 'mainstream politics' than like 'rationalist politics.'" Yes, there's a large cluster of things related to mainstream politics and social justice where weighing in at all just feels like it's going to make my life worse (this is less about not getting invited to parties and more about having more of my life filled with stressful conversations for battles that I don't think are the best thing to prioritize fighting)
OK. Looking forward to future posts.
The word "self-deception" is often used for this.
The reason it's still tempting to use "deception" is because I'm focusing on the effects on listeners rather than the self-deceived speaker. If Winston says, "Oceania has always been at war with Eastasia" and I believe him, there's a sense in which we want to say that I "have been deceived" (even if it's not really Winston's fault, thus the passive voice).
Self-deception doesn't imply other people aren't harmed, merely that the speaker is deceiving themselves first before they deceive others. Saying "what you said to me was based on self-deception" doesn't then imply that I wasn't deceived, merely points at where the deception first occurred.
For instance, the Arbinger institute uses the term "self-deception" to refer to when someone treats others as objects and forgets they're people.
FWIW I think "deceptive" and "misleading" are pretty fine here (depends somewhat on context but I've thought the language everyone's been using in this thread so far was fine)
I think the active-ingredient in the "there's something resisting correction" has a flavor that isn't quite captured by deceptive (self-deceptive is closer). I think the phrase that most captures this for me is perniciously motivated, or something like that.
This is excellent. I especially enjoyed the elucidation of cruxes by you and Jessica.
FWIW, the sleep thing you mentioned feels especially cruxy from a systemic perspective, even though you only mentioned it as a personal concern.
Note: I think it's not really a good idea for me to wade into the object level discussion in this thread. Some things I plan to do but haven't yet, and would want to do before continuing the conversation, include:
I had written up an overview of my current approach/worldview on my shortform page. Reposting here for ease of reference. (I think this is sort of covered in the OP, but it's interwoven with various disagreement-untanglings that I think make it harder to parse)
This is awesome! I cannot sufficiently express my admiration for trying to make these kinds of discussions transparent and accessible.
There's a lot of surface area in this, even in the summary, so I don't think I can do justice in a comment. I'll instead just highlight a few things that resonated or confused me.
By "billions of dollars are at stake" I concretely meant "OpenPhil exists, which LW has demonstrably had at least some influence on, which has influence over billions of dollars." (And a couple other things in that reference class but lower order of magnitude, and potentially more things in the future at a higher order of magnitude)
There are other relevant things that _also_ change the landscape, but that was the reason I phrased it as "billions of dollars are at stake."
The precious thing is all of those things and also other things. It's generally "the ability to have good, real, rational discourse."
It's the sort of thing where getting more explicit about the thing might get in the way of clarity rather than improve it. See this comment by Zvi:
(note: this is more antagonistic than I feel - I agree with much of the direction of this, and appreciate the discussion. But I worry that you're ignoring a motivated blind spot in order to avoid biting some bullets).
So, there's something precious that dissolves when defined, and only seems to occur in low-stakes conversations with a small number of people. It's related to trust, ability to be wrong (and to point out wrongness). It feels like the ability to have rational discourse, but that feeling is not subject to rational discourse itself.
Is it possible that it's not truth-seeking (or more importantly, truth itself) you're worried about, but unstated friendly agreement to ignore some of the hard questions? In smaller, less important conversations, you let people get away with all sorts of simplifications, theoretical constructs, and superficial agreements, which results in a much more pleasant and confident feeling of epistemic harmony.
When it comes time to actually commit real resources, or take significant risks, however, you generally want more concrete and detailed agreement on what happens if you turn out to be incorrect in your stated, shared beliefs. Which indicates that you're less confident than you appear to be. This feels bad, and it's tempting for all participants to now accuse the other of bad faith. This happens very routinely in friends forming business partnerships, people getting married, etc.
Maybe it's not a loss in truth-seeking ability, it's a loss of the ILLUSION of truth-seeking ability. Humans vary widely in their levels of rationality, and in their capability to hold amounts of data and make predictions, and in their willingness to follow/override their illegible beliefs in favor of justifiable explicit ones. It's not the case that the rationalist community is no better than average: we're quite a bit better than average (and conversations like this may well improve it further). But average is TRULY abysmal.
I've long called it the "libertarian dilemma": agency and self-rule and rational decision-making is great for me, and for those I know well enough to respect, but the median human is pretty bad at it, and half of them are worse than that. When you're talking about influencing other people's spending decisions, it's a really tough call whether to nudge/manipulate them into making better decisions than they would if you neutrally present information in the way you (think you) prefer. Fundamentally, it may be a question of agency: do you respect people's right to make bad decisions with their money/lives?
I think this is importantly not what's going on here.
If anything, Ben's position is something like the above sentence representing what I've been pushing towards (whether accidentally or on purpose), as opposed to "actually being able to have honest, truthseeking conversations about hard questions."
And Ben's whole point is that this is bad. (And the point of my original "precious thing" paragraph was trying to communicate that I understood Ben's concern, but was coming at it from a different angle, and that I also care about having honest, truthseeking conversations about hard things.)
[I'm not sure Ben would quite endorse this description though, and would be interested in him clarifying if it seemed off]
A major reason that private conversations are important, IMO, is that they enable people to talk through fuzzy things that are hard to articulate, but where you can ask probing questions that make sense to you-and-only-you in order to check whether you're actually talking about the same, hard-to-articulate-thing. You can't jump to making them explicit because you're running off a collection of intuitions, with lots of experiences baked into your intuition. But in private conversation it's easier (for me at least) to get a sense of whether you're talking about the same pre-explicit thing.
(The problem with having the conversation in public is precisely that other people will be asking "wait, what precious thing, exactly?" which derails the high context conversation. There's a sort of two-way-street that I think needs building, where people-who-have-high-context-conversations make more effort to write them up, but everyone else kinda accepts that it might not always be achievable for them to follow along that easily)
I get that, but if the high-context extensive private conversation doesn't (or can't) identify the precious thing, it seems somewhat likely that either you're both politely accepting that the other may be thinking about something else entirely, and/or it may not actually be a thing.
I very much like your idea that you should have the conversation with the default expectation of publishing at a later time. If you haven't been able to agree on what the thing is by then, I think the other people asking "wait, what precious thing exactly" are probably genuinely confused.
Note that I realize and have not resolved the tension between my worry that indescribable things aren't things, and my belief that much (and perhaps most) of human decision-making is based on illegible-but-valid beliefs. I wonder if at least some of this conversation is pointing to a tendency to leak illegible beliefs into intellectual discussions in ways that could be called "bias" or "deception" if you think the measurable world is the entirety of truth, but which could also be reasonably framed as "correction" or "debiasing" a limited partial view toward the holistic/invisible reality. I'm not sure I can make that argument, but I would respect it and take it seriously if someone did.
As someone who was involved in the conversations, and who cares about and focuses on such things frequently, this continues to feel important to me, and seems like one of the best examples of an actual attempt to do the thing being done, which is itself (at least partly) an example of the thing everyone is trying to figure out how to do.
What I can't tell is whether anyone who wasn't involved is able to extract the value. So in a sense, I "trust the vote" on this so long as people read it first, or at least give it a chance, because if that doesn't convince them it's worthwhile, then it didn't work. Whereas if it does convince them, it's great and we should include it.
This was the first major, somewhat adversarial doublecrux that I've participated in.
(Perhaps this is a wrong framing. I participated in many other significant, somewhat adversarial doublecruxes before. But, I dunno, this felt significantly harder than all the previous ones, to the point where it feels like a difference in kind)
It was a valuable learning experience for me. My two key questions for "Does this actually make sense as part of the 2019 Review Book" are:
On the object level, my tl;dr takes the form of "which blogposts should someone write as a followup?", which I think are:
Alice in fact didn't understand exactly what they were trying to say.
(I realize Benquo/Jessica probably still disagree with my beliefs/emphasis on the first part. But this was a concrete update I made while reviewing the post, and a mistake I think I was making a lot. Even if I later change my mind about how much harmony matters for group truthseeking, it'd still be necessary for the post to directly address the benefits in order to be understood by past-me)
I think there are more points worth lifting out of here, but I'm not sure how oddly specific they were to the particular people in this conversation, rather than generally useful.
On "how did this go as a doublecrux", I notice:
That might be fine. I don't think this was Benquo/Jessica's goal (I think their goal was more like 'figure out if the LessWrong Team is aligned with them enough to be worth investing in LessWrong', and I think they succeeded at that)
On "can other people learn from this as a doublecrux?"
I... don't know. I think maybe, but that's mostly up to other people.
Note on Framing:
I notice that a large chunk of the text of this post is direct quotes from Benquo and Jessicata, but it's wrapped in a post where I control the frame. If this were considered for inclusion-in-the-book, I'd be interested in having them write reviews of their year-later-takeaways, written in their own frames.
Some notes regarding object level ideas in this post and the discussion:
(each quoted section is basically a new topic)
Benquo: (emphasis mine)
While I still have many complaints about the overall strategy Benquo was following at the time, I think (hope?) I'm more understanding now about the failure mode pointed at here. I do think I've contributed to that failure mode, i.e. "try to be more diplomatic to preserve group harmony, in a way that comes at expense of clarity."
I still think there are good truthseeking reasons to preserve group harmony. But I think the concrete updates I've made are that we need to (at least) be very clear about when we're doing that, and notice when attempts to smooth things over are destroying information.
In particular, it is pretty orwellian/gaslighty to have someone tell you "You're being too mean. Here's a different thing you could have said with the same truth value that wouldn't have been as mean, see?" and watch in horror as they then describe a sentence that leaves out important information you meant to convey.
In my other review, I mentioned "hmm, I think I still have promises to keep regarding 'what aesthetic updates should I make?'". I think one aesthetic update I am happy to make is that I should have some kind of disgust/horror when someone (including me) claims to be preserving local truth value, or implying that truth value is preserved, when in fact it wasn't.
(This is a specific subset of the overall worldview/aesthetic I think Benquo was trying to convey, and I'm guessing there is still major disagreement in other nearby areas)
I am still mulling this over. I think it might be pointing at something I haven't yet fully grokked.
I would agree with the phrase "we should scale up more carefully, with an uncompromising emphasis on some aspects of quality control". (I think I would have agreed with it at the time, which is part of why the doublecrux was tricky. I eventually realized that Benquo meant a stronger version of this sentence than I meant)
My current (revealed) belief is something like "We don't really have the luxury of stopping all mobilization while we figure out the ideal coordination mechanisms. Meanwhile I think current mobilization efforts are net positive. I also think the process of actually mobilizing is useful for forcing your ivory tower coordination process to be more connected with the reality of how large scale coordination actually works."
(My understanding is that Benquo-at-the-time thought the current way large scale coordination works is fundamentally doomed and we don't have much choice but to start over. That does feel pretty cruxy – if I believed that I'd be doing different things.)
"Can't possibly solve FAI" still sounds like an obviously false marketing claim to me. I wrote a blogpost arguing you should be suspicious when you find yourself saying this.
(By contrast, I do agree with the first half of the sentence, that our current coordination mechanisms are massively inadequate, and am grateful for various gears about what's going on there that I gained during this conversation)
This feels aesthetically cruxy. I think it's a few steps removed from whatever the real disagreement is about.
I think a key piece here is the distinction between "criticism" and "accusations of norm violation." I mention this at the bottom of the post, but I think it warrants a separate top level post that delves into more details.
One thing I noticed at the time and still notice now is that it's not actually obvious to me (from Jessica's written words in the preceding section) that our claims are in different ontologies. I derive that they must be in different ontologies (given observations about how challenging this whole conversation was). But, it is worth noting that Jessica's claims/beliefs seem to make sense in my ontology.
Zack, in the comments:
I was distracted by another piece of this comment, but I agree that having a good answer for this is pretty important.
After writing this post, there was significant disagreement in the comments about this line of mine:
I'm still not entirely sure what happened here, but the failure mode that Jessica/Zvi/Zack were pointing at was "You auto-lose if you incentivize people not to understand." That seems true to me, but mostly unrelated to what I was trying to say here, and some of my own response was perhaps overly exasperated with them seeming to change the subject on me.
Zvi eventually said:
I think it's possible that at that point I could have said "Okay. I'm talking about level 2, and the point is you make it much harder to get to level-2 if you're making up new words or using them with nonstandard connotations." But by the time we got to that point of the conversation I was pretty exhausted and still confused about how everything fit together. Today, I'm not 100% sure whether my hypothetical reply was straightforwardly true.
I feel like I want to tie this all up together somehow, but I think I mostly did that in the tl;dr at the top. Thanks for reading I guess. Still interested in delving into individual threads if people are interested.
I'm probably going to write a second review that is more accessible. But, first: I made a couple vague promises here:
Did I do those things?
Re: the first thing... I thought about it for... probably 30 minutes. I think I also applied some artificial layer of cynicism/distrust of powerful people that I trusted, as a hedge.
I invested a bit into trying to re-architect myself such that if I lost trust in anyone powerful/important, I'd have contingency plans.
I'm not sure I did any of that skillfully or usefully. I think it might have caused some problems in making me less trusting in a way that made some discussions harder than they needed to be, but I'm not sure.
I'm not sure whether I still endorse the frame of this particular crux.
Re: the second thing... I haven't done this really, but I did spend a lot of time thinking about how and why to adjust my aesthetics, and later wrote Propagating Facts into Aesthetics. I have a vague post brewing called "Should I Feel More Disgust?", which I think about periodically, but not deeply. I do expect to get around to writing that, and for the writing process to engage with my vague promise here.
I'll have more to say later about this overall conversation, but seemed good to take stock of my commitments.
I was sadly not part of the conversations involved, but this writeup is pretty helpful and I think important.
I changed my mind on a lot of things around the time these conversations happened. I don't know how much this writeup catches the generators of those updates, but I do think it captures more than any other post I know of, and I do think the things I learned from Jessica, Ben and Zack are quite valuable and important.
The end of this sentence appears to be missing.
More generally, I appreciate this post, and I think it's a good distillation - as someone who can't read what it's a distillation of.
I also think that evaluating distillation quality well is easier with access to the conversation/data being distilled.
Absent any examples of conversations becoming public, it looks like distillation is the way things are going. While I don't have any reason to suspect there are one or more conspiracies, given this:
was brought up, I am curious how robust distillations are (intended to be) against such things, as well as how one goes about incentivizing "publishing". For example, I have a model where pre-registered results are better* because they limit certain things like publication bias. I don't have such a model for "conversations", which, while valuable, are a different research paradigm. (I don't have as much of a model for, in general, how to figure out the best thing to do, absent experiments.)
*"Better" in terms of result strength, and not necessarily the best thing (in a utilitarian sense).
btw, full sentence here was supposed to be something like:
The key thing I'd want (and do encourage) from Benquo and Jessicata and others is to flag where the distillation seems to be missing important things or mischaracterizing things. (A key property of a good conversation-distillation is that all parties agree that it represents them well)
That said, in this case, I'm mostly just directly using everyone's words as they originally said them. Distortions might come from my selection process – it so happened that me/Benquo/Jessica wrote comments that seemed like fairly comprehensive takes on our worldviews so hopefully that's not an issue here.
But I could imagine it being an issue if/when I try to summarize the 8-hour-in-person conversation, which didn't leave as much written record. (My plan is to write it up in google doc form and give everyone who participated in the conversation opportunity to comment on it before posting publicly)
"Collusion" was something that Benquo had specifically mentioned as a concern.
(early on, I had sent him an email that was sort of weird, where I was doing a combination of "speaking privately" but also not really speaking any more frankly than I would have in public. I think it made sense at the time for me to do this because I didn't have a clear sense of how much trust there was between us. But I think it made sense for that to be a red-flag for Benquo)
I agree that if you're worried about Benquo/me colluding, there's not a great way to assuage your concerns fully. But I'm hoping the general practice of doing public distillations that aim to be as clear/honest as possible is at least a step in the right direction.
(My first stab at an additional step is to have common practices of signaling meta-trust, such as flagging places where some kind of collusion was at least plausibly suspicious. This is already fairly common in the form of declaring conflicts of interest. Although I have some alternate concerns about how that allocates attention that I'll try to write up later)