All of Zack_M_Davis's Comments + Replies

We don't need to rely on Huemer's gloss; the distaste for map-territory distinctions and for reasoning under uncertainty (as being too subjective) can also be seen in the source material. Consider this line from Atlas Shrugged:

"Dagny", he said, looking at the city as it moved past their taxi window, "think of the first man who thought of making a steel girder. He did not say, 'It seems to me', and he did not take orders from those who say, 'In my opinion.'"

(Psychologically, Rand is totally in the right in that people very often do use such language to evade res... (read more)

I would have expected this post to mention Rand's excellent intuitive grasp of extortion-resistant decision theories!

There was a different look in Mr. Thompson's eyes when he drew back, as if cornered, yet looked straight at Galt and said slowly, "Without me, you couldn't get out of this room, right now."

Galt smiled. "True."

"You wouldn't be able to produce anything. You could be left here to starve."

"True."

"Well, don't you see?" The loudness of homey joviality came back into Mr. Thompson's voice as if the hint given and received were now to be safely eva

... (read more)

Thanks. The thing that threw me off is that the ingredients label for the coffee-flavored Postum variant includes "natural coffee flavor". I can't quickly find reliable information about what "natural coffee flavor" means: a blog post from another beverage maker reports that natural coffee flavor "may be extracted from a variety of plants like chicory, garlic, and yes, sometimes coffee beans" but that the author "can't guarantee that the flavor company I buy natural coffee flavor from didn't extract one of the flavor compounds from coffee beans". I'm surpr... (read more)

The mechanisms are complementary: the drug attracts people into acquiring a taste for something that's not naturally tasty—but once acquired, people still have positive associations with the taste. (I've been drinking iced Postum every morning, despite objective reports that Postum doesn't actually taste good.)

The statement not saying much is essential for getting an impressively comprehensive list of signatories: the more you say, the more likely it is that someone whom you want to sign will disagree.

Relatedly, when we made DontDoxScottAlexander.com, we tried not to wade into a bigger fight about the NYT and other news sites, nor to make it an endorsement of Scott and everything he's ever written/done. It just focused on the issue of not deanonymizing bloggers when revealing their identity is a threat to their careers or personal safety and there isn't a strong ethical reason to do so. I know more high-profile people signed it because the wording was conservative in this manner.

I miss coffee. I used to have iced coffee in the morning a lot, but I've been scared of caffeine ever since my insomnia scare of January 2021. (I've been to psych prison for sleep deprivation madness twice, and I never want that to happen again.) Yes, I know about decaf, but they don't get all of the caffeine out, and if you haven't been using, your tolerance is all gone and you're super-sensitive to even small doses; I feel safer just abstaining altogether.

I was catching up with Seinfeld on Netflix out of '90s nostalgia, and in one episode they mentioned ... (read more)

2Adam Zerner11d
I like this. I think this sort of chatter is both fun and useful because babbling [https://www.lesswrong.com/tag/babble-and-prune] is underrated. What do you miss about coffee? The caffeine? Taste? I've always been confused about the popularity of coffee. I drink it a few times a week simply because I like to get out of my apartment and coffee shops are the place to go. But the taste is nothing special to me, nor is the feeling of being caffeinated. And I get the sense that the same is true of others; I don't think others particularly love the taste or feeling. So what explains the popularity? Is it possible that it's a sort of cached behavior that people just do automatically without really thinking about it? Probably not - my money is on some type of addictiveness - but I wonder.
4eukaryote11d
There are a bunch of coffee-tasting substitutes made from roasted grain or other stuff! Coffee beans or anything caffeine-producing don't enter the equation at all (as opposed to decaf coffee, which is derived from coffee beans); the roasted plant taste is just similar. Chicory and dandelion roots are pretty well-known plants for this. Inka is another grain brand that's good and easy to make; you prepare it like instant coffee. I've seen others at large natural/health/hippie food type stores.
4Noosphere8912d
That's what I get for only looking at the local situation, and not looking at all the comments. But thanks for answering my question!

We should distinguish between possible worlds that have low realityfluid due to implausible physics (like FTL travel, which only occurs in simulations, because causal universes don't look like that), and those that have low realityfluid due to implausibly fine-tuned sociology/psychology/&c. (which mostly occur in simulations rather than the basement, because the fine-tuning is most parsimoniously explained in terms of what Powers in the basement are interested in simulating).

This thought occasioned by how I'm really enjoying Hello, Tomorrow! on Apple T... (read more)

2Raemon11d
(strong upvoted mostly to get more object-level discussion higher-in-the-queue)
2Noosphere8912d
I'm confused at what relevance this comment has to the thread here.

I complained at the time about you substituting a word definition without acknowledging it, which I think you're doing again here.

Bloom specifically used the phrase "Platonic ideal Art of Discourse"! When someone talks about Platonic ideals of discourse, I think it's a pretty reasonable reading on my part to infer that they're talking about simple principles of ideal reasoning with wide interpersonal appeal, like the laws of probability theory, or "Clarifying questions aren't attacks", or "A debate in which one side gets unlimited time, but the other si... (read more)

According to my subjective æsthetic whims, it's cute and funny to imagine the protagonist as not remembering both authors' names, in accordance with only being a casual formal epistemology fan. (The mentions of casual fandom, family archives, not reading all of chapter 3, &c. make this a short story that happens to be told in the second person, rather than "you" referring to the reader.)

4localdeity22d
I expected there to be some wordplay on "casual"/"causal" somewhere, but I'm not sure if I saw any.  This is obviously a central component of such a post's value proposition.

seems like it is missing something in terms of making predictions about any world

I mean, you're right, but that's not what I was going for with that sentence. Suppose we were talking about a tiny philosophical "world" of opaque variables, rather than the real physical universe in all its richness and complexity. If you're just drawing samples from the original joint distribution, both networks will tell you exactly what you should predict to see. But if we suppose that there are "further facts" about some underlying mechanisms that generate that distrib... (read more)

I mean, it's true as a special case of minimum description length epistemology favoring simpler models. Chapter 18 of the Koller and Friedman book has a section about the Bayesian score for model comparison, which has a positive term involving the mutual information between variables and their parents (rewarding "fit"), and a negative term for the number of parameters (penalizing complexity).
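To make that fit-versus-complexity tradeoff concrete, here's a minimal sketch (mine, not from the book; it uses the BIC approximation to the Bayesian score, numpy, and a made-up two-variable dataset with illustrative names) comparing the two possible edge orientations over binary variables X and Y. Because the two structures are Markov-equivalent, the score comes out the same for both, which is part of why score-based comparison alone doesn't settle the causal question:

```python
import numpy as np

# Made-up data: Y is a noisy copy of X (names and parameters are illustrative only).
rng = np.random.default_rng(0)
N = 10_000
x = rng.integers(0, 2, N)
y = np.where(rng.random(N) < 0.9, x, 1 - x)

def _ll(counts):
    """Maximum-likelihood log-probability of observations summarized by `counts`."""
    probs = counts / counts.sum()
    nz = counts > 0
    return float(np.sum(counts[nz] * np.log(probs[nz])))

def log_likelihood(child, parent=None):
    """Log-likelihood of `child` under an ML-fit table, optionally conditioned on a binary `parent`."""
    if parent is None:
        return _ll(np.bincount(child, minlength=2))
    return sum(_ll(np.bincount(child[parent == v], minlength=2)) for v in (0, 1))

def bic(ll, n_params):
    """Fit term (log-likelihood) minus complexity penalty (0.5 * k * log N)."""
    return ll - 0.5 * n_params * np.log(N)

# X -> Y: one parameter for P(X=1), two for P(Y=1 | X).  Y -> X is symmetric.
score_xy = bic(log_likelihood(x) + log_likelihood(y, parent=x), 1 + 2)
score_yx = bic(log_likelihood(y) + log_likelihood(x, parent=y), 1 + 2)
print(score_xy, score_yx)  # identical up to floating-point error
```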

What's less clear to me (Wentworth likely knows more) is how closely that kind of formal model comparison corresponds to my intuitive sense of causality. The first net... (read more)

2johnswentworth17d
Yup, that's right.

My model is a hybrid system [...] I think this gets you the properties that you're saying would be good [about Rule of Law]

This alleged "hybrid system" doesn't get you the benefits of rule of law, because the distinguishing feature of the rule of law is that the law is not an optimizer. As Yudkowsky explains in "Free to Optimize", the function of the legal system is "to provide a predictable environment in which people can optimize their own futures." In a free country (as contrasted to an authoritarian dictatorship), a good citizen is someone who pays ... (read more)

5Ruby25d
This question of "optimization" is pretty interesting. I'm not sure I consider "not an optimizer" to be the distinguishing feature, but nonetheless, I agree that in this sense the LW "law/moderation" is an optimizer and will interfere with optimization it disagrees with much more than the law does. It might be a matter of degree though. I like this comment and have more to say, but I spilled soy sauce on my keyboard recently and my spacebar is sticky, making it hard to type. I'll say a little more tomorrow when I have an external keyboard again.
4Raemon25d
I agree with this particular statement, but there are two nearby statements that also seem true and important:

* Probability theory absolutely informs what sorts of communication styles are going to identify useful truths most efficiently. For example, you should be more likely to make utterances like "this updates my probability that X will happen" (rather than "X will happen" or "X will not happen" in a more boolean true/false paradigm, for example).
* Human psychology and cognitive science (as well as the general study of minds-in-general) absolutely inform the specific question of "what sort of politeness norms are useful for conversations optimized for truth-tracking". There might be multiple types of conversations that optimize for different truth-tracking strategies. Debate vs collaborative brainstorming vs doublecrux might accomplish slightly different things and benefit from different norms. Crocker's rules might create locally more truth-tracking in some situations but also make an environment less likely to include people subconsciously maneuvering such that they won't have to deal with painful stimuli. There is some fact-of-the-matter about what sort of human cultures find out the most interesting and important things most quickly.

I argued a bunch with your post at the time and generally don't think it was engaging with the question I'm considering here. I complained at the time about you substituting a word definition without acknowledging it, which I think you're doing again here.
4habryka25d
FWIW, my own take here is indeed that we should try to get some of the benefits of the rule of law (and indeed that one of the central components here is putting limits on the power of the moderators), but that an online forum should aspire to a much lower standard of justice than a country, and should be closer to the standard that we hold companies to (where at least in the U.S. you have things like at-will employment and a broad understanding that it's often the right choice to fire someone even if they didn't do anything legibly bad). I don't feel super confident on this though. 
8habryka25d
I feel like I must be missing something? The post seems to be about something else, and I don't really know how it relates to this. Or maybe you are misunderstanding the metaphor here?  To quote you directly from your post:  This feels to me like it's actually making a very similar point. The OP isn't arguing that there should be a single "rationalist discourse". It's pretty explicitly saying that "different discourse algorithms (the collective analogue of 'cognitive algorithm') leverage the laws of rationality to convert information into optimization in somewhat different ways, depending on the application and the population of interlocutors at hand". The Art of Discourse should indeed be general and we should be careful to distinguish between what is the locally correct application of those rules (i.e. what we do around here given our local constraints) and what the general rules are and how they would apply to different environments. This is what I understood the part about "the moderators try to learn the art of discourse, and then separately the moderators will set rules and guidelines and write explanations based on their best understanding of the art, and how it applies to this specific forum" to be about.

I think I mostly meant "answer" in the sense of "reply" (to my complaint about rate-limiting Achmiz being an outrage, rather than to a narrower question); sorry for the ambiguity.

I have a lot of extremely strong disagreements with this, but they can wait three months.

2habryka1mo
Cool, makes sense. Also happy to chat in-person sometime if you want. 

I think the last three months are a pretty definitive demonstration that talking about "norms" is toxic and we should almost never do it. I'm not interested, at all, in "norms." (The two posts I wrote about them were "defensive" in nature, arguing that one proposed norm was bad as stated, and expressing skepticism about the project of norms lists.)

I'm interested in probability theory, decision theory, psychology, math, and AI. Let's talk about those things, not "norms." If anyone dislikes a comment about probability theory, decision theory, psychology, math,... (read more)

7Celarix1mo
I'm afraid I don't have the time for a full writeup, but the Stack Exchange community went through a similar problem: should the site have a place to discuss the site? Jeff Atwood, cofounder, said [no](https://blog.codinghorror.com/meta-is-murder/) initially, but the community wanted a site-to-discuss-the-site so badly, they considered even a lowly phpBB instance. Atwood eventually [realized he was wrong](https://blog.codinghorror.com/listen-to-your-community-but-dont-let-them-tell-you-what-to-do/) and endorsed the concept of Meta StackExchange.

I also think it makes sense to have a pretty strong bias against talking about what the "norms" of a space are, instead of asking about what thing is true, or what thing is optimal under various game-theoretic considerations. 

That said, there is definitely a real thing that the "norms" of a space are talking about. Different spaces share different assumptions. There is value in coordinating on shared meaning of words and shared meaning of gestures of social punishment and reward. It seems quite important to help people orient around how people in a sp... (read more)

Thanks, to clarify: I don't intend to make a "how dare the moderators moderate Less Wrong" objection. Rather, the objection is, "How dare the moderators permanently restrict the account of Said Achmiz, specifically, who has been here since 2010 and has 13,500 karma." (That's why the grandparent specifies "long-time, well-regarded", "many highly-upvoted contributions", "We were here first", &c.) I'm saying that Said Achmiz, specifically, is someone you very, very obviously want to have free speech as a first-class citizen on your platform, even though y... (read more)

We were here first. This is our garden, too—or it was. Why is the mod team persecuting us? By what right—by what code—by what standard?

I endorse much of Oliver's replies, and I'm mostly burnt out from this convo at the moment so can't do the followthrough here I'd ideally like. But, it seemed important to publicly state some thoughts here before the moment passed:

Yes, the bar for banning or permanently limiting the speech of a longterm member in Said's reference class is very high, and I'd treat it very differently from moderating a troll, crank, or confus... (read more)

Hmm, I am still not fully sure about the question (your original comment said "I think Oli Habryka has the integrity to give me a straight, no-bullshit answer here", which feels like it implies a question that should have a short and clear answer, which I am definitely not providing here), but this does clarify things a bit. 

There are a bunch of different dimensions to unpack here, though I think I want to first say that I am quite grateful for a ton of stuff that Said has done over the years, and have (for example) recently recommended a grant to him ... (read more)

7Ruby1mo
Not to respond to everything you've said, but I question the argument (as I understand it) that because someone is {been around a long time, well-regarded, many highly-upvoted contributions, lots of karma}, this means they are necessarily someone who at the end of the day you want around / are net positive for the site. Good contributions are relevant. But so are costs. Arguing against the costs seems valid, saying benefits outweigh costs seems valid, but assuming this is what you're saying, I don't think just saying someone has benefits means that obviously you want them as an unrestricted citizen. (I think in fact how it's actually gone is that all of those positive factors you list have gone into moderators' decisions so far in not outright banning Said over the years, and why Ray preferred to rate limit Said rather than ban him. If Said was all negatives, no positives, he'd have been banned long ago.) Correct me though if there's a deeper argument here that I'm not seeing.

I continue to be disgusted with this arbitrary moderator harassment of a long-time, well-regarded user, apparently on the pretext that some people don't like his writing style.

Achmiz is not a spammer or a troll, and has made many highly-upvoted contributions. If someone doesn't like Achmiz's comments, they're free to downvote (just as I am free to upvote). If someone doesn't want to receive comments from Achmiz, they're free to use already-existing site functionality to block him from commenting on their own posts. If someone doesn't like his three-year-o... (read more)

I think Oli Habryka has the integrity to give me a straight, no-bullshit answer here.

Sure, but... I think I don't know what question you are asking. I will say some broad things here, but probably best for you to try to operationalize your question more. 

Some quick thoughts: 

  • LessWrong totally has prerequisites. I don't think you necessarily need to be an atheist to participate in LessWrong, but if you straightforwardly believe in the Christian god, and haven't really engaged with the relevant arguments on the site, and you comment on posts that as
... (read more)
8philh1mo
Arnold says he is thinking about maybe proposing that, in future, after he has done the work to justify it and paying attention to how people react to it.

I think it is strawmanning Zack's post still has some useful frames and it's reasonable for it to be fairly upvoted. [...] I think the amount of strawmanning here is just not bad enough

Why do you think it's strawmanning, though? What, specifically, do you think I got wrong? This seems like a question you should be able to answer!

As I've explained, I think that strawmanning accusations should be accompanied by an explanation of how the text that the critic published materially misrepresents the text that the original author published. In a later comment,... (read more)

4[DEACTIVATED] Duncan Sabien1mo
Just noting that "What specifically did it get wrong?" is a perfectly reasonable question to ask, and is one I would have (in most cases) been willing to answer, patiently and at length. That I was unwilling in that specific case is an artifact of the history of Zack being quick to aggressively misunderstand that specific essay, in ways that I considered excessively rude (and which Zack has also publicly retracted). Given that public retraction, I'm considering going back and in fact answering the "what specifically" question, as I normally would have at the time. If I end up not doing so, it will be more because of opportunity costs than anything else. (I do have an answer; it's just a question of whether it's worth taking the time to write it out months later.)
6Raemon1mo
I meant the primary point of my previous comment to be "Duncan's accusation in that thread is below the threshold of 'deserves moderator response'" (i.e. Duncan wishes the LessWrong moderators would intervene on things like that on his behalf [edit: reliably and promptly], and I don't plan to do that, because I don't think it's that big a deal). (I edited the previous comment to say "kinda" strawmanning, to clarify the emphasis more.) My point here was just explaining to Vladimir why I don't find it alarming that the LW team doesn't prioritize strawmanning the way Duncan wants. (I'm still somewhat confused about what Vlad meant with his question though and am honestly not sure what this conversation thread is about.)

I'm not sure what other user you're referring to besides Achmiz—it looks like there's supposed to be another word between "about" and "and" in your first sentence, and between "about" and "could" in the last sentence of your second paragraph, but it's not rendering correctly in my browser? Weird.

Anyway, I think the pattern you describe could be generated by a philosophical difference about where the burden of interpretive labor rests. A commenter who thinks that authors have a duty to be clear (and therefore asks clarifying questions, or makes attempted cr... (read more)

Thanks for engaging, I found this comment very… traction-ey? Like we’re getting closer to cruxes. And you’re right that I want to disagree with your ontology.

I think “duty to be clear” skips over the hard part, which is that “being clear” is a transitive verb. It doesn’t make sense to say whether a post is clear or not clear, only who it is clear or unclear to.

To use a trivial example:  Well taught physics 201 is clear if you’ve had the prerequisite physics classes or are a physics savant, but not to laymen. Poorly taught physics 201 is clear to a ... (read more)

I don't think we can make that many moderation calls on users this established that there [sic] this controversial without causing some pretty bad things to happen.

Indeed. I would encourage you to ask yourself whether the number referred to by "that many" is greater than zero.

I did list "actually just encourage people to use the ban tool more" as an option. [...] If you actually want to advocate for that over a Said-specific rate-limit, I'm open to that (my model of you thinks that's worse).

Well, I'm glad you're telling actual-me this rather than using your model of me. I count the fact your model of me is so egregiously poor (despite our having a number of interactions over the years) as a case study in favor of Said's interaction style (of just asking people things, instead of falsely imagining that you can model them).

Yes... (read more)

4Raemon1mo
I've been busy, so hadn't replied to this yet, but specifically wanted to apologize for the hostile paraphrase (I notice I've done that at least twice now in this thread; I'm trying to do better, but it seems important for me to notice and pay attention to). I think I worded the "corrigible about actually integrating the spirit-of-our-models into his commenting style" line pretty badly; Oliver and Vaniver also both thought it was pretty alarming. The thing I was trying to say I eventually reworded in my subsequent mod announcement as: i.e. this isn't about Said changing his own thought process, but, like, there is a spirit-of-the-law relevant in the mod decision here, and whether I need to worry about specification-gaming. I expect you to still object to that for various reasons, and I think it's reasonable to be pretty suspicious of me for phrasing it the way I did the first time. (I think it does convey something sus about my thought process, but, fwiw, I agree it is sus and am reflecting on it)

We already let authors write their own moderation guidelines! It's a blank text box!

Because it's a blank text box, it's not convenient for commenters to read it in detail every time, so I expect almost nobody reads it, and these guidelines are not practical to follow.

With two standard options, color-coded or something, it becomes actually practical, so the distinction between blank text box and two standard options is crucial. You might still caveat the standard options with additional blank text boxes, but being easy to classify without actually reading is the important part.

a high level goal of "users who want to have the sorts of conversations that actually depend on a different culture/vibe than Said-and-some-others-explicitly-want are able to do so".

We already have a user-level personal ban feature! (Said doesn't like it, but he can't do anything about it.) Why isn't the solution here just, "Users who don't want to receive comments from Said ban him from their own posts"? How is that not sufficient? Why would you spend more dev time than you need to, in order to achieve your stated goal? This seems like a question you s... (read more)

8philh2mo
Stipulating that votes on this comment are more than negligibly informative on this question... it seems bizarre to count karma rather than agreement votes (currently 51 agreement from 37 votes). But also anyone who downvoted (or disagreed) here is someone who you're counting as not being taken into account, which seems exactly backwards.
2Vaniver2mo
I thought it was a reference to, among other things, this exchange [https://www.lesswrong.com/posts/9DhneE5BRGaCS2Cja/moderation-notes-re-recent-said-duncan-threads?commentId=wpnpjtgQH5gb2H9SF] where Said says one of Duncan's Medium posts was good, and Duncan responds that his decision to not post it on LW was because of Said. If you're observing that Said could just comment on Medium instead, or post it as a linkpost on LW and comment there, I think you're correct. [There are, of course, other things that are not posted publicly, where I think it then becomes true.]
2Raemon2mo
I do want to acknowledge that based on various comments and vote patterns, I agree it seems like a pretty controversial call, and I model it as something like "spending down and/or making a bet with a limited resource" (maybe two specific resources of "trust in the mods" and "some groups of people's willingness to put up with the site being optimized a way they think is wrong").  Despite that, I think it is the right call to limit Said significantly in some way, but I don't think we can make that many moderation calls on users this established that there this controversial without causing some pretty bad things to happen.
5Raemon2mo
Some other random notes (probably not maximally cruxy for you, but…):

1. If Said seemed corrigible about actually integrating the spirit-of-our-models into his commenting style (such as proactively avoiding threads that benefit from a more open/curiosity/interpretative mode, without needing to wait for an author or mod to ban him from that post), then I'd be much more happy to just leave that as a high-level request from the mod team rather than an explicit code-based limitation. But we've had tons of conversations with Said asking him to adjust his behavior, and he seems pretty committed to sticking to his current behavior. At best he seems grudgingly willing to avoid some threads if there are clear-cut rules we can spell out, but I don't trust him to actually tell the difference in many edge cases. We've spent a hundred+ person hours over the years thinking about how to limit Said's damage, and have a lot of other priorities on our plate. I consider it a priority to resolve this in a way that won't continue to eat up more of our time. 
2. I did list "actually just encourage people to use the ban tool more" as an option. (DirectedEvolution didn't even know it was an option until it was pointed out to him recently.) If you actually want to advocate for that over a Said-specific rate-limit, I'm open to that (my model of you thinks that's worse). (Note, I and I think several other people on the mod team would have banned him from my comment sections if I didn't feel an obligation as a mod/site-admin to have a more open comment section.)
3. I will probably build something that lets people Opt Into More Said. I think it's fairly likely the mod team will generally do some heavier-handed moderation in the nearish future, and I think a reasonable countermeasure to build, to alleviate some downsides of this, is to also give authors a "let this user comment unfettered on my posts, even though the mod teams have generally restricted them in some way." (I don't expect th
4Raemon2mo
My prediction is that those users are primarily upvoting it for what it's saying about Duncan rather than about Said.

Did we read the same verdict? The verdict says that the end of the ban is conditional on the users in question "credibly commit[ting] to changing their behavior in a fairly significant way", "accept[ing] some kind of tech solution that limits their engagement in some reliable way that doesn't depend on their continued behavior", or "be[ing] banned from commenting on other people's posts".

The first is a restriction on variety of speech. (I don't see what other kind of behavioral change the mods would insist on—or even could insist on, given the textual natu... (read more)

The tech solution I'm currently expecting is rate-limiting. Factoring in the costs of development time and finickiness, I'm leaning towards either "3 comments per post" or "3 comments per post per day". (My ideal world, for Said, is something like "3 comments per post to start, but, if nothing controversial happens and he's not ruining the vibe, he gets to comment more without limit." But that's fairly difficult to operationalize and a lot of dev-time for a custom-feature limiting one or two particular-users).

I do have a high level goal of "users who want ... (read more)

Don't apologize; please either take your time, or feel free to just not reply at all; I am also very time-poor at the moment.

I condemn the restrictions on Said Achmiz's speech in the strongest possible terms. I will likely have more to say soon, but I think the outcome will be better if I take some time to choose my words carefully.

6the gears to ascension2mo
his speech is not being restricted in variety, it's being ratelimited. the difference there is enormous.

Huh, I happened to glance at the moderation page, and the ban was still there; I guess I must have forgotten to click "Submit" when I tried to remove it the other month? It should be fixed now, ChristianKl.

Thanks for writing this!! There's a number of places where I don't think you've correctly understood my position, but I really appreciate the engagement with the text I published: if you didn't get what I "really meant", I'm happy to do more work to try to clarify.

TEACH, so that B ends up believing X if X is right and Y if Y is right.
CONVINCE, so that B ends up believing X.
EXPOUND, so that the audience ends up believing X.

I'm unhappy with the absence of an audience-focused analogue of TEACH. In the following, I'll use TEACH to refer to making someone... (read more)

4gjm2mo
(Content-free reply just to note that I have noticed this and do intend to reply to it properly, when unlike now I have a bit of time to give it the attention it deserves. Apologies for slowness.)

(Considering the general problem of how forum moderation should work, rather than my specific guilt or innocence in the dispute at hand) I think positing non-truth-tracking motivations (which can be more general than "malice or antipathy") makes sense, and that there is a real problem here: namely, that what I called "the culture of unilateral criticism and many-to-many discourse" in the great-grandparent grants a structural advantage to people who have more time to burn arguing on the internet, analogously to how adversarial court systems grant a structur... (read more)

I think this is the consensus view around LW [that AI can't help with alignment research] but haven't seen anything I found persuasive as a defense

I thought it was an argument from inaccessible information: we know how to specify rewards for "Win a Go game", "Predict the next token", or "A human approved this output"; we don't know how to specify rewards for "Actually good alignment research".

I'm imagining that the counterargument might be that earlier weak alignment techniques (and the generation/verification gap) might be enough to bootstrap later, more automated alignment techniques?

Yeah, I don't find "we can't verify good alignment research" nearly as persuasive as other people around here:

  • Verification does seem way easier, even for alignment research. This is probably the most interesting and perplexing disagreement.
  • Even if verification isn't easier than generation, you can still just do what a human would do faster. That seems like a big deal, and quite a lot of what early AI systems will be doing. Focusing only on generation vs verification seems like it's radically understating the case.
  • AI systems can also help with verification,
... (read more)
3Ninety-Three2mo
Can't these be the same thing? If we have humans who can identify actually good alignment research, we can sit them down in the RLHF booth and have the AI try to figure out how to make them happy. Now obviously a sufficiently clever AI will infer the existence of the RLHF booth and start hacking the human in order to escape its box, which would be bad for alignment research. But it's looking increasingly plausible that e.g. GPT-6 will be smart enough to provide actually good mathematical research without being smart enough to take over the world (that doesn't happen until GPT-8). So why not alignment research? To break the comparison I think you need to posit either that alignment research is way harder than math research (as Eli understands Eliezer does) such that anything smart enough to do it is also smart enough to hack a human, or I suppose it could be the case that we don't have humans who can identify actually good alignment research.

My current guess is that we will continue to see small 10-person teams push the cutting-edge forward in AI

Possible counterevidence (10 months later)?—the GPT-4 contributors list lists almost 300 names.[1]


  1. Methodology: I copied text from the contributors page (down to just before it says "We also acknowledge and thank every OpenAI team member"), used some quick Emacs keyboard macros to munge out the section headers and non-name text (like "[topic] lead"), deduplicated and counted in Python (and subtracted one for a munging error I spotted after the fa

... (read more)
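For what it's worth, the deduplicate-and-count step described in the footnote would look roughly like the following (a hypothetical re-creation, not the original script; "contributors.txt" is an assumed file containing the already-munged one-name-per-line text):

```python
# Hypothetical sketch of the count described above: read munged names,
# deduplicate, count, and subtract one for the munging error mentioned above.
with open("contributors.txt") as f:
    names = {line.strip() for line in f if line.strip()}
print(len(names) - 1)
```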
2habryka2mo
Yep, that is definitely counterevidence! Though my model did definitely predict that we would also continue seeing huge teams make contributions; of course each marginal major contribution is still evidence. I have more broadly updated against this hypothesis over the past year or so, though I still think there will be lots of small groups of people quite close to the cutting edge (like less than 12 months behind).  Currently the multiple on stuff like better coding tools and setting up development to be AI-guided just barely entered the stage where it feels plausible that a well-set-up team could just completely destroy large incumbents. We'll see how it develops in the next year or so.

I don't understand the motivation for defining "okay" as 20% max value. The cosmic endowment, and the space of things that could be done with it, is very large compared to anything we can imagine. If we're going to be talking about a subjective "okay" standard, what makes 20% okay, but 0.00002% not-okay?

I would expect 0.00002% (e.g., in scenarios where AI "'pension[s] us off,' giv[ing] us [a percentage] in exchange for being parents and tak[ing] the rest of the galaxy for verself", as mentioned in "Creating Friendly AI" (2001)) to subjectively feel great. ... (read more)

4Eliezer Yudkowsky2mo
Arbitrary and personal.  Given how bad things presently look, over 20% is about the level where I'm like "Yeah okay I will grab for that" and much under 20% is where I'm like "Not okay keep looking."
4niplav2mo
I think this depends on whether one takes an egoistic or even person-affecting perspective ("how will current humans feel about this when this happens?") or a welfare-maximising consequentialist perspective ("how does this look on the view from nowhere"): If one assumes welfare-maximised utility to be linear or near-linear in the number of galaxies controlled, the 0.00002% outcome is far far worse than the 20% outcome, even though I personally would still be happy with the former.
7Viliam2mo
Scale sensitivity. From our perspective today, 20% max value and 0.00002% max value both emotionally mean "infinity", so they are like the same thing. When we get to the 0.00002% max value, the difference between "all that we can ever have" and "we could have had a million times more" will feel different. (Intuition: How would you feel if you found out that your life could have been literally a million times better, but someone decided for you that both options are good enough so it makes no sense to fret about the difference?)

I agree that it often makes sense to write "This seems X to me" rather than "This is X" to indicate uncertainty or that the people I'm talking to are likely to disagree.

you even think that me saying "treat these statements differently" is me generically trying to forbid you from saying one of them.

Thanks for clarifying that you're not generically trying to forbid me from saying one of them. I appreciate it.

When you shot from the hip with your "this is insane" comment at me, you were [...] culpably negligent

Yes, I again agree that that was a bad com... (read more)

That's not what I meant. I affirm Vaniver's interpretation ("Zack's worry is that [...] establishing the rule with user-chosen values [...] will mean there's nothing stopping someone from deciding that criticism has to be above 8 and below 6").

(In my culture, it's important that I say "That's not what I meant" rather than "That's a strawman", because the former is agnostic about who is "at fault". In my culture, there's a much stronger duty on writers to write clearly than there is on readers to maintain uncertainty about the author's intent; if I'm unhapp... (read more)

2[DEACTIVATED] Duncan Sabien3mo
Separately: I'm having a real hard time finding a coherently principled position that says "that's a strawman" is off-limits because it's too accusatory and reads too much into the mind of the author, but is fine with "this is insane."
3[DEACTIVATED] Duncan Sabien3mo
This is ignoring the fact that you're highly skilled at deluding and confusing your audience into thinking that what the original author wrote was X, when they actually wrote a much less stupid or much less bad Y. (e.g. repeatedly asserting that Y is tantamount to X and underplaying or outright ignoring the ways in which Y is not X; if you vehemently shout "Carthage delenda est" enough times people do indeed start becoming more and more afraid of Carthage regardless of whether or not this is justified.) You basically extort effort from people, with your long-winded bad takes, leaving the author with a choice between: a) allowing your demagoguery to take over everyone's perceptions of their point, now that you've dragged it toward a nearby (usually terrible) attractor, such that even though it said Y everybody's going to subsequently view it through the filter of your X-interpretation, or b) effortfully rebutting every little bit of your flood of usually-motivated-by-antipathy words. Eventually, this becomes exhausting enough that the correct move is to kick Mallory out of the garden, where they do not belong and are making everything worse far disproportionate to their contribution. Mallory can go write their rebuttals in any of the other ten thousand places on the internet that aren't specifically trying to collaborate on clear thinking, clear communication, and truth-seeking. The garden of LessWrong is not particularly well-kept, though.

There's absolutely nothing that Zack is currently accomplishing that couldn't have been accomplished if he'd first written a comment to Rob saying "did you mean X?" [...] Acting like a refusal to employ that bare minimum of social grace is a virtue is bullshit

It's not that I think refusing to employ the bare minimum of social grace is a virtue. It's that I wasn't aware—in fact, am still not aware—that confirming interpretations with the original author before publishing a critical essay constitutes the bare minimum of social grace. The idea that it's so... (read more)

how he confidently dismisses ANNs

I don't think this is a fair reading of Yudkowsky. He was dismissing people who were impressed by the analogy between ANNs and the brain. I'm pretty sure it wasn't supposed to be a positive claim that ANNs wouldn't work. Rather, it's that one couldn't justifiably believe that they'd work just from the brain analogy, and that if they did work, that would be bad news for what he then called Friendliness (because he was hoping to discover and wield a "clean" theory of intelligence, as contrasted to evolution or gradient des... (read more)

One of Yudkowsky's claims in the post you link is:

It's hard to build a flying machine if the only thing you understand about flight is that somehow birds magically fly.  What you need is a concept of aerodynamic lift, so that you can see how something can fly even if it isn't exactly like a bird.

This is a claim that lack of the correct mechanistic theory is a formidable barrier for capabilities, not just alignment, and it underestimates the amount of empirical understanding available on which to base an empirical approach.

It's true that ... (read more)

I asked GPT-4 to write a "highly critical comment" in response to this post. It replied:

Zack M. Davis' blog post, "Maybe Lying Can't Exist?!", is a prime example of overcomplicating a simple concept with unnecessarily convoluted language and philosophical ramblings. Davis spends an inordinate amount of time discussing sender-receiver games, formal models, and the evolution of communication, only to conclude that deception is "ontologically parasitic." This statement is not only pretentious but also blatantly obvious. Of course, lying depends on the exist

... (read more)

I asked GPT-4 to write a "highly critical comment" in response to this post. It replied:

I find this blog post both convoluted and problematic in its attempt to explore the concept of deception as cooperation. The author, Zack M. Davis, delves into the intricacies of information theory and signaling games, yet the essence of the argument is lost in a sea of technical jargon and excessive parenthetical digressions.

The premise that deception can be understood as a cooperative effort between sender and receiver is intriguing, but the author fails to clearly

... (read more)

The problem isn't the first speaker wanting to persuade third parties; the problem is the second person "moving the goalposts" there.

2the gears to ascension3mo
cool. then we're on the same page.

I mean, I agree that I have soapbox-like tendencies (I often have an agenda, and my contributions to our discourse often reflect my agenda), but I thought I've been meeting the commonsense relevance standard—being an Alighieri scholar who only brings it up when there happens to be a legitimate Alighieri angle on the topic, and not just randomly derailing other people's discussions.

I could be persuaded that I've been getting this wrong, but, again, I'm going to need more specific examples (of how some particular post I made misses the relevance standard) be... (read more)

Note that in the original footnote in my post, "on the same side" is a hyperlink going to a comment by Val

Thanks for pointing this out. (I read Val's comment while writing my post, but unfortunately neglected to add the hyperlink when pasting the text of the footnote into my draft.) I have now edited the link into my post.

the goal isn't to trick people into thinking your disagreements are small, it's to make typical disagreements feel less like battles between warring armies

I think the fact that disagreements often feel like battles between warring ... (read more)

5Said Achmiz3mo
Very strongly seconding this. (I have noticed this pattern on Less Wrong in the past, in fact, and more than once. It is no idle worry, but a very real thing that already happens.)

I'm definitely doing #2. I can see your case that the paragraph starting with "But there's a reason for that" is doing #4. But ... I'm not convinced that this kind of "frame manipulation" is particularly bad?

If someone is unhappy with the post's attempt to "grab the frame" (by acting as if my conception of rationalist is the correct one), I'm happy to explain why I did that in the comments. Do I have to disclaim it in the post? That just seems like it would be worse writing.

2Raemon3mo
I think in isolation it wouldn't be particularly bad, no. I think it'd rise to the level of 'definitely better to avoid' (given [probably?] shared assumptions about truthseeking and honesty), but, it's within the set of mistakes I think are fairly normal to make. I feel like it is part of a broader pattern that (I think probably) adds up to something noticeably bad, but it'd take me awhile of active effort to find all the things that felt off to me and figure out if I endorse criticizing it as a whole.  (So, like, for now I'm not trying to make a strong argument that there's a particular thing that's wrong, but, like, I think you have enough self-knowledge to notice 'yeah something is off in a sticky way here' and figure it out yourself. ((But, as previously stated, I don't have a strong belief that this makes sense to be your priority atm)))

I think it's significant that the "blatant lying" example was an in-person conversation, rather than a published blog post. I think I'm much more prone to exaggerate in real-time conversations (especially emotionally-heated conversations) than I am in published writing that I have time to edit.

2Raemon3mo
Yeah I do agree with that.

Thanks for chiming in; this is encouraging to hear. I'm imagining the pretty emotional debate you're thinking of is the one on "My Dating Plan ala Geoffrey Miller" in July 2020? Interestingly, I think my behavior there was much ruder than anything Duncan's objected to from me, so I think your reaction is evidence that there's a lot of interpersonal variation in how much "softening" different people think is desirable or necessary.

4Rafael Harth3mo
It was that general debate about content moderation. Pretty sure it wasn't all in the comments of that post (though that may have been the start); I don't remember the details. It's also possible that my recollection includes back and forth you had with [other people who defended my general position].

fairly frequently does a "Write a reply to a person's post as if it's a rebuttal to the post, which mostly goes off and talks about an unrelated problem/frame that Zack cares about

Would it help if we distinguished between a "reply" (in which a commentator explains the thoughts that they had in reaction to a post, often critical or otherwise negative thoughts) and a "rebuttal" (in which the commentator directly contradicts the original post, such that the original post and the rebuttal can't "both be right")? I often write replies that are not rebuttals, but I think this is fine.

Everyone sometimes issues replies that are not rebuttals, but there is an expectation that replies will meet some threshold of relevance. Injecting "your comment reminds me of the medieval poet Dante Alighieri" into a random conversation would generally be considered off-topic, even if the speaker genuinely was reminded of him. Other participants in the conversation might suspect this speaker of being obsessed with Alighieri, and they might worry that he was trying to subvert the conversation by changing it to a topic no one but him was interested in. They... (read more)

5philh3mo
We might distinguish between:

* Reaction: I read your post and these are the thoughts it generated in me
* Reply: ...and these thoughts seem relevant to what the post was talking about
* Rebuttal: ...and they contradict what you said.

I've sometimes received comments where I'd have found it helpful to know which of these was intended. (Of course a single comment can be all of these in different places. Also, a reaction should still not misrepresent the original post.)

I do think Zack should acknowledge his judgment here has not been good and the result is not living up to the standards that flow fairly naturally from the sequences

Sorry, I'm going to need more specific examples of me allegedly "lying by exaggeration/overconfidence" before I acknowledge such a thing. I'm eager to admit my mistakes, when I've been persuaded that I've made a mistake. If we're talking specifically about my 4 December 2021 comment that started with "This is insane", I agree that it was a very bad comment that I regret very much. If we're t... (read more)

2Raemon3mo
The previous example I had onhand was in a private conversation where you described someone as "blatantly lying" [https://www.lesswrong.com/posts/QB9eXzzQWBhq9YuB8/rationalizing-and-sitting-bolt-upright-in-alarm] (you're anonymized in the linked post), and we argued a bit and (I recall) you eventually agreeing that 'blatantly lying' was not an accurate characterization of 'not-particularly-blatantly-rationalizing' (even if there was something really important about that rationalizing that people should notice).  I think I recall you using pretty similar phrasing a couple weeks later, which seemed like there was something sticky about your process that generated the objection in the first place. I don't remember this second part very clearly though. (I agree this is probably still not enough examples for you to update strongly at the moment if you're going entirely off my stated examples, and they don't trigger an 'oh yeah' feeling that prompts you to notice more examples on your own)

Thanks for telling me (strong-upvoted). That makes sense as a reason for you to be furious with me. As the grandparent says, I owe you a bigger apology than my previous apology, which appears below.

I hereby apologize for my blog comment of 4 December 2021, on an earlier revision of "Basics of Rationalist Discourse". In addition to the reasons that it was a bad comment in context that I listed in my previous apology, it was also a bad comment for failing to acknowledge that the text of the post contained a paragraph addressing the comment's main objection, ... (read more)

1[DEACTIVATED] Duncan Sabien3mo
I don't know what to say in response.  Empirically, this apology did zero to reduce the extremely strong deterrent of "God dammit, if I try to post something on LessWrong, one way or another Zack and Said are going to find a way to make that experience miserable and net negative," which, in combination with the energy that this thread burned up, has indeed resulted in me not posting, where counterfactually I would've posted three essays. (I'm only here now because you're bumping the threads.) (Like, there are three specific, known essays that I have not posted, because of my expectations coming off of this thread and the chilling effect of "I'll have to deal with Zack and Said's responses.") (Also the reason my Basics post ended up being so long-winded was because, after my experience with the partial draft going up by mistake, I was trying quite hard to leave a future Zack no ways to make me regret publishing/no exposed surfaces upon which I could be attacked. I ended up putting in about 20 extra hours because of my past experience with you, which clearly did not end up paying off; I underestimated just how motivated you would be to adversarially interpret and twist things around.) I tried blocking, and that wasn't enough to get you to leave me alone.   Sounds like you win.

In your view, is there an important difference between frame control, and the author having a particular frame that they use in a particular essay?

I'm proud of this blog post. I think it's a good blog post that clearly explains my ideas in a way that's engaging to read. If someone wants to talk about my motivations for writing this post and why I chose the analogies I did, I'm happy to have that discussion in the comment section, like we're doing now.

But it seems to me that a blog post that talked about my objections to Bensinger's Goodwill element, withou... (read more)

6Raemon3mo
Yep! Distinctions in Frame Control

I'm still working through this, which is part of why the post isn't written up yet. I'm also not sure if I'm actually going to use the phrase 'frame control' because it might just be too easy to weaponize in a way that makes it more unhelpful than helpful. (i.e. the concept I have in mind here is something where it makes sense to have the norm of 'notice when you do it, and be careful with it', not 'don't do it ever')

But, here are my current thoughts on how I currently carve up the space here:

1. having a frame, at all [i.e. a set of ways to conceptualize a problem or solution-space or what questions to ask [https://www.lesswrong.com/posts/GkxxfdCukyGuyKXQQ/shared-frames-are-capital-investments-in-coordination]]
2. having a strongly held/presented frame, such as by speaking confidently/authoritatively (which many people who don't hold their own frames very strongly sometimes find disorienting)
3. having an insistently held frame (where, when someone tries to say/imply 'hey, my frame is X', you're like 'no, the frame is Y', and if they're like 'no, it's X'…)
4. frame manipulation (where you change someone else's frame in a subtle way without them noticing, i.e. presenting a set of assumptions in a way that isn't natural to question, or equivocating on definitions of words in ways that change what sort of questions to think about without people noticing you've done so)

#2, #3 and #4 can be mixed and matched. The places where people tend to use the word 'frame control' most often refer to #3 and #4, frame-manipulation and frame-insistence.

I'm a bit confused about how to think about 'strong frames' – I think there's nothing inherently wrong with them, but if Alice is 'weaker willed' than Bob, she may end up adopting his frame in ways that subtly hurt her. This isn't that different from, like, some people being physically bigger and more likely to accidentally hurt a smaller person. I wouldn't want so

Thanks for your thoughts. (Strong-upvoted.)

the essay where I finally gave up [...] This was not an aberrant event. [...] one more straw on the camel's back

Yes, that December 2021 incident was over the line. I'm sorry. In retrospect, I wish I hadn't done that—but if I had taken a few more moments to think, I would have been able to see it without retrospect. That was really stupid of me, and it made things worse for both of us.

You're also correct to notice that the bad behavior that I don't endorse on reflection can be seen as a more extreme version of ... (read more)

(I think I would have noticed if that paragraph had been in the December 2021 version, but if you say it was, I'll take your word for it—which would imply that my December 2021 behavior was even worse than I've already admitted; I owe you a much bigger apology in that case.)

It was. That's why I was (and remain) so furious with you (Edit: and also am by default highly mistrustful of your summaries of others' positions).

4[DEACTIVATED] Duncan Sabien3mo
Literally only you and Said have these twin problems (among long-lasting prolific LW participants). This is you saying "but but but if you claim ZERO is too little and a BILLION is too much, then how is there any room for legitimate criticism to exist?" It's somewhere between zero and a billion, like every other person on LessWrong manages to do just fine all the time. Late edit: we have a term for this thing; it's called "fallacy of the grey."

I'll agree that the "physicist motors" analogy in particular rests on the "one who studies" definition, although I think a lot of the points I make in this essay don't particularly depend on the analogy and could easily be written up separately.

I guess you could view the "foreign policy" motivating this post as being driven by two motives: first, I'd rather not waste precious time (in the year 2023, when a lot of us have more important things to do) fighting over the "rationalist" brand name; if someone else who also cares about thinking well, thinks that ... (read more)

so when I see the brand name being used to market a particular set of discourse norms without a clear explanation of how these norms are derived from the law, that bothers me enough to quickly write an essay or two about it

Seems great to me! I share your intuition that Goodwill seems a bit odd to include. I think it's right to push back on proposed norms like these and talk about how justified they are, and I hope my list can be the start of a conversation like that rather than the end.

I do have an intuition that Goodwill, or something similar to Goodwill,... (read more)

2Raemon3mo
To be clear, I endorse you doing that, but I would like you to do it without sleight-of-hand frame control. (I do agree you could probably have written the second half of the post without relying on the first half's structure, but, that's not what you did.) I have on my todo list to write up a post that's like "hey guys here is an explanation of Frame Control/Manipulation that is more rigorous and more neutrally worded than Aella's post about it, and here's why I think we should have a habit of noticing it." And then, maybe afterwards, a post going: "Hey, I think 'notice your own frame control, and be a bit careful about it' should graduate to a thing you are obligated to learn, as a good LW citizen. What do people think of that?", and get some sense of how The People think about it. And, depending on how that goes, maybe it becomes an actual LW norm. I haven't done that and it doesn't seem fair to rush it or assume how that'll play out, so, currently this is more of a suggestion that I think you should probably agree to on your own terms rather than something I'm enforcing as a moderator, but, flagging that that's a longer-term agenda of mine.

Then it would appear that we're in a conflict over a shared resource: I want to post "Zack-type" things on Less Wrong—including long-form criticism of other posts on Less Wrong—and (assuming I'm reading your comment correctly; feel free to correct me if not) it seems like you want me to not do that.

It looks like we can't both get what we want at the same time. That's a very unfortunate situation for us to be in. If you have any suggestions for Pareto improvements, I'm listening. I'm not sure what else I can say.
