Last night, I posted a question seeking advice on publishing an AI research project: a program that generates causal DAGs in an LLM-readable format, along with test questions designed to evaluate the LLM's ability to do causal inference. As a hobbyist, I am not too worried about whether it's novel or interesting research, although it would be good to know. My main concern is that it might be infohazardous research, and my goal was to get some advice on whether others think it might be.
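For concreteness, here's a toy sketch of the kind of thing I mean - a small generator that builds a random causal DAG, serializes it as plain text for an LLM prompt, and pairs it with a simple interventional question plus a ground-truth answer. This is an illustrative stand-in, not the actual project code; the function names and output format here are made up.

```python
import random

def generate_dag(n_nodes, edge_prob=0.4, seed=0):
    """Build a random DAG over named variables; edges only go from lower to
    higher index, so acyclicity is guaranteed."""
    rng = random.Random(seed)
    nodes = [f"X{i}" for i in range(n_nodes)]
    edges = [(nodes[i], nodes[j])
             for i in range(n_nodes)
             for j in range(i + 1, n_nodes)
             if rng.random() < edge_prob]
    return nodes, edges

def to_prompt(nodes, edges):
    """Serialize the DAG as plain sentences an LLM can read."""
    lines = [f"{a} directly causes {b}." for a, b in edges]
    return "Variables: " + ", ".join(nodes) + "\n" + "\n".join(lines)

def make_question(nodes, edges, seed=1):
    """Pair the DAG with an interventional question and its ground-truth answer."""
    rng = random.Random(seed)
    a, b = rng.sample(nodes, 2)
    # Intervening on a can affect b iff there is a directed path a -> ... -> b.
    reachable, frontier = set(), {a}
    while frontier:
        frontier = {y for (x, y) in edges if x in frontier} - reachable
        reachable |= frontier
    question = f"If we intervene to set {a}, can that intervention change {b}? Answer yes or no."
    return question, ("yes" if b in reachable else "no")

if __name__ == "__main__":
    nodes, edges = generate_dag(5)
    print(to_prompt(nodes, edges))
    q, gold = make_question(nodes, edges)
    print(q, "| expected:", gold)
```

The real project's format and question types may differ; the point is just that both the graph and the ground truth are generated programmatically, so the LLM's answers can be scored automatically.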

Unfortunately, it appears that the response was to downvote it without commenting or PMing me to explain why. If I posted on other topics and got downvoted, I normally wouldn't worry about it - it might mean others found my post wrong, annoying, offensive, or contrary to ingroup common sense, and I would then have to decide what to do next in light of those possibilities.

But in this specific case, the possibility that my research project is seen as infohazardous and is being downvoted for that reason is mixed in with all those other possibilities. If it's being downvoted on infohazard grounds, I'm deprived of an opportunity to learn why others think as they do. If it's being downvoted on other grounds, then I'm left having to make a judgment call on whether to publish a topic of interest to me despite a few downvotes, or whether to second-guess and silence myself.

My personal belief is that, unless you have good reason to think a topic is infohazardous, you should go ahead and publish. There are far too many examples of politically motivated or simply reactive people silencing things they disagree with on trumped-up charges of infohazard (even when they don't use that specific term) for one to silence oneself without good reason. Even if LessWrong would downvote the final product as well, there are plenty of other outlets. So by posting this question here on LessWrong, I am giving this community a privileged opportunity to weigh in on my publishing behavior. When I receive downvotes and no comments or PMs in exchange for doing that, it makes me feel like in the future I should just publish what I want in an outlet I expect to be receptive to it.

So I am going to give some advice, and announce my publishing policy on infohazards going forward.

  1. If AI safety researchers feel they have some skill in evaluating potential research for infohazardous content and how to manage it, they should publicly offer to evaluate research proposals and make it easy to submit and get a constructive, explicit evaluation via a private message.
  2. If nobody is willing to perform this service, then a second-best alternative is for people to publish and debate self-evaluation guidelines.
  3. If nobody is willing to publish such guidelines, then when somebody like myself asks for advice, that advice should be delivered explicitly in a comment or PM, and the karma reaction should be neutral-to-positive unless there's a clear reason to downvote it - in which case that reason should be communicated explicitly to the poster. Otherwise, this creates a disincentive to ask for advice about potential infohazards.
  4. In general, if you are a believer in a principle of conformity to prevent the unilateralist's curse and in the need to take infohazards seriously, and if you believe you are capable of judging infohazardous content, then you have an ethical obligation to find a transparent and informative way to help the authors of the content you are judging improve their own models about infohazards, and to minimize the level of silencing that you impose in your quest to reduce infohazards. Those who cannot or will not accept that obligation have no business stepping into the role of evaluating or enforcing policy against infohazards.

First, I am once again requesting explicit feedback, in a comment or PM, about the research project I linked above. If I don't receive any infohazard-related feedback, I will find an outlet to publish it when I'm done. If I do receive some, I will consider it thoughtfully and decide whether it should modify or cancel my decision to publish. If this post receives net downvotes with insufficient comments explaining the reasoning, or if that reasoning and the downvotes seem excessive, harsh, and nonconstructive, then I'll stop asking LessWrong for feedback about infohazards in the future and make my own independent decisions about whether and where to publish.

Note that although I'm quite defensive about potential harsh or negative reactions to this post, and although this post contains an ultimatum, I'm very receptive to a constructive and friendly debate on any of these points. I don't feel any urgency about publishing my project, and I'd much rather have my final decision be the result of a debate carried through to consensus, or at least crux-identification, than to unilaterally take action.

Dagon · 10mo · 42

The question was posted a few hours ago on a weekend, and is currently sitting at 10 karma with 5 votes.  I suspect you have to give it some time to see if anyone with the requisite expertise takes the time to actually comment (or send a DM).

Basic advice for this site: don't worry too much about votes - they're useful feedback for a VERY coarse-grained "please do more like this" or "please do less like this", but there's almost no information content about exactly what part of the message is generating the reaction.

It's not really an option not to worry about votes, as they determine how many people see the post.

Dagon · 10mo · 20

There isn't a very strong relationship between post karma (vote total) and the number of people who are shown the title, even less so the number who click on it, let alone those who really engage with it.  Certainly there is some causality, especially the difference between negative, slightly positive (<10), medium (11-30), and quite popular (31+).  But it's not particularly linear, and it's probably overwhelmed by how accessible the post is, how useful and interesting people find the topic, and whether the mods frontpage it.

Note that I said "don't worry too much about", not "it's completely meaningless".  It IS an important feedback channel, especially if taken as fairly coarse indicators of how a few readers react (most posts get fewer than 20 votes during their first week).  Learn a bit from it, but unless it's literally negative, don't worry too much about it.  

For topics where you're seeking feedback and interaction, pay a lot more attention to the comments than to the votes.  These tend to be sparse as well, for most topics, but they're much more direct, as they're the reason you're posting in the first place (feedback to make your beliefs and models less wrong).

I agree with you about the roughness and changeability of karma. My main issue with it - particularly with downvotes, and on this specific topic - is that it is too effective at silencing and too frustrating, for too little informational gain. Even that wouldn't be too big a deal, because karma does offer benefits and is an attractively simple way of drawing on crowd wisdom.

Where the difficulty lies, I think, is when requesting advice about infohazards is met with negative vibes - frowns and sternness in real life, or downvotes online - without useful explicit feedback about the question at hand. That does a disservice both to the person asking for advice and to the people who think infohazards are worth taking seriously. I think advocating for a change of vibes is better than putting up with inappropriate vibes, which is partly why I chose not to just put up with a few random downvotes and instead spoke up about it.

Dagon · 10mo · 40

I don't think I downvoted the advice post, but I do recall that I skimmed it and decided I didn't have much to say about it.  I'm probably guilty of making that face when someone brings up the word "infohazard" in real life - I don't think it's a very useful generalization for most things.  I don't know how representative I am, and it seemed a good-faith discussion, so I left it alone.

IMO, "infohazard" is the kind of term that aggregates a number of distinct things in such a way as to make the speaker sound erudite, rather than to illuminate any aspect of the topic.  It's also almost always about fear of others' freedom or abilities, not about the information itself.

Max H · 10mo · 40

Object-level feedback on the linked project / idea: it looks neat, and might make for an interesting eval. I'm not sure that it would demonstrate anything really fundamental (to me, at least) about LLM capabilities the way you claim, but I'd be interested in reading more in any case.

Aside: whether or not it advances capabilities, I think "infohazardous" is slightly the wrong term. Publishing such work might be commons-burning or exfohazardous, in the same way that e.g.  publishing part of a recipe for making a bomb is. But I think "infohazard" should be reserved for knowledge that is directly harmful to a specific person, e.g. a spoiler, a true lesson that leads them to a valley of bad rationality, or something emotionally / psychologically damaging.

On whether your idea is net-positive to publish or not: I agree with Nate's take here about publishing interpretability research, and I think this kind of project falls into the same category. Ideally, you would be able to circulate it among a large but closed group of researchers and peers who understand the risks of where such research might eventually lead. 

Absent the existence of such a closed community though, I'm not sure what to do. Publishing on LW seems plausibly net-positive, compared to the alternative of not doing the work at all, or not having it be read by anyone. I think your own proposed policy is reasonable. I'd also suggest adding a disclaimer to any work you think might plausibly give capabilities researchers ideas, to make it clear where you stand. Something like "Disclaimer: I'm publishing this because I think it is net-positive to do so and have no better alternatives. Please don't use it to advance capabilities, which I expect to contribute to the destruction of everything I know and love." (Of course, use your own words and only include such a disclaimer if you actually believe it.) I think disclaimers and public statements of that kind attached to popular / general-interest research would help build common knowledge and make it easier to get people on board with closure in the future.

Thank you for your thoughts; I think you are supplying valuable nuance. In private conversation I do see a general path by which this offers a strategy for capabilities enhancement, but I also think it's sufficiently low-hanging fruit that I'd be surprised if a complete hobbyist like myself discovered a way to contribute much of anything to AI capabilities research. Then again, I guess interfacing between GPT-4-quality LLMs and traditional software is a new enough tool to explore that maybe there is enough low-hanging fruit for even a hobbyist to pluck. I agree with you that it would be ideal if there were a closed but constructive community to interface with on these issues, and I'm such a complete hobbyist that I wouldn't know about such a group even if it existed, which is why I asked. I'll give it some more thought.

Max H · 10mo · 60

Ah, I wasn't really intending to make a strong claim about whether your specific idea is likely to be useful in pushing the capabilities frontier or not, just commenting on this general class of research (which, again, I think is plausibly net positive to do and publish on LW).

I do think you're possibly selling yourself short / being overmodest, though. Inexperienced researchers often start out with the same bad ideas (in alignment or other fields), and I've seen others claim that any hobbyist or junior researcher shouldn't worry about exfohazards because they're not experienced enough to have original ideas yet. This is maybe true / good advice for some overconfident newbies, but isn't true in general for everyone.

Also, this is not just true of alignment, but applies to academic research more generally: if a first year grad student has some grand new Theory of Everything that they think constitutes a paradigm shift or contradicts all previous work, they're probably full of it and need to read some more whitepapers or textbooks. If they just have some relatively narrow technical idea that combines two things from different domains (e.g. LLMs and DAGs) in a valid way, it might constitute a genuinely novel, perhaps even groundbreaking insight. Or they might have just overlooked some prior work, or the idea might not pan out, or it might just not be very groundbreaking after all, etc. But in general, first year grad students (or hobbyists) are capable of having novel insights, especially in relatively young and fast-moving fields like AI and alignment.

I have to agree that commentless downvoting is not a good way to combat infohazards. I'd probably take it a step further and argue that it's not a good way to combat anything, which is why it's not a good way to combat infohazards (and if you don't think infohazards are as bad as they're made out to be, then trying to combat them at all is probably a bad thing).

Its commentless nature means it violates "norm one" (and violates it much more as a super-downvote).  

It means something different from upvoting the stuff that isn't the problem ("push stuff that's not that, up"), while also being an alternative to doing that.

I don't think a complete explanation of why it's not a very good idea exists yet, though, and one is still needed.

However, I think there's another thing to consider: imagine if upvotes and downvotes were all accurately placed. Would they bother you as much? They might not bother you at all if they seemed accurate to you, and therefore, if they do bother you, that suggests the real problem is that they aren't accurate in the first place.

My feeling is that the commentless nature of downvotes is likely a contributing mechanism to the process that leads them to be placed inaccurately, but it's possible that something else is causing that.