Imagine a place where speaking certain thoughts out loud is widely considered to be harmful. Not because they are lies, spam, threats, or insults; most people in most places can agree those are harmful. No, the thing that makes this place unusual is that those who dwell there believe the following: that even if an idea is true, and unlikely to annoy, coerce, or offend the listener, some ideas should still be suppressed, lest they spread across the world and cause great damage. Ideas of this type are deemed "Risky", and people carefully avoid communicating them.

The strange thing is that there is no authoritarian government banning the speaking of Risky ideas. Rather, the people there just seem to have decided that this is what they want for themselves. Most people reason about the world in such a way that it's simply obvious that some ideas are Risky, and that if we want to have nice things, we ought to avoid saying anything that could be Risky. Not everyone agrees with this; there are a few outliers who don't care much about Riskiness. But it's common to see such people rebuked by their peers for saying Risky things, or even for suggesting that they might in the future say Risky things. Occasionally some oblivious researcher will propose a project, only to be warned that it's ill-advised to conduct a project of that nature because it could turn up Risky results. When it comes to Risky ideas, self-censorship is the rule, even if it's more of a social norm than a legal rule.

Of course, because of this self-censorship, people in this place find it much harder to reason as a group. You can still think freely inside your own head, but if you need to know something that's inside someone else's head, you're likely to have a difficult time. Whenever people speak, they have to carefully avoid coming too close to Risky ideas. Sometimes, by making an intellectual detour, they manage to convey a roughly similar notion. More often, they opt to simply discard the offending branch of thought. Even when reasoning about topics that are on the surface unrelated to Riskiness, self-censorship still gets in the way, slowing down the discussion and introducing errors. How could it not? It's all one causally-connected world. Yet whatever the cost to good epistemology, the people there seem willing to pay it.

You may consider such a place strange, but I assure you that all this seems perfectly logical to its people; morally necessary, even. It doesn't occur to many of them that there's any other way things could be. Perhaps to some of you, this place is starting to sound a little familiar?

I am talking, of course, about LessWrong, and the question of "things that advance AI capabilities".

Now, to be clear, I'm not saying that the concept of Riskiness is bullshit in its entirety. While I may disagree about where we should draw the line, I certainly would not publish the blueprints for a microwave nuke, and neither would I publish the design of a Strong AGI. What I am suggesting is that maybe we should consider our inability to talk openly about "capabilities stuff" to be a major handicap, and apply a lot more effort to removing it. We hope to solve AI alignment. If we can't even talk about large chunks of the very problem we're trying to solve, that sounds like we're in trouble.

What can we do?

I encourage everyone to try to think of their own ideas, but here are some that I came up with:

  • The very first thing is to keep in mind that capabilities insights are not an infohazard. They may be an exfohazard, something that you want to keep other people from learning, but any individual can expect to be better off the more capabilities knowledge they have. This also applies to groups if there is enough trust and similarity of goals within the group. In particular, note that it's possible to do a research project without publishing the results. It's harmless to sit down, write some code, run the code, and then not tell anyone about it. (Assuming you haven't accidentally written a seed AGI.) Even if the only benefits are that a few of your trusted alignment researcher friends get to learn about it and it helps improve your own understanding, that still might totally be worth it!

  • Make groups where we can share capabilities research and trust that it won't be leaked. Eliezer has made MIRI into such a group, if I recall correctly, and probably other alignment orgs also do a bit of secret-keeping. I bet plenty of other people could also benefit by starting up such groups; e.g. a group for timelines forecasters might be helpful.

  • The problem with just making groups is that most capabilities ideas aren't coming from MIRI (or replace "MIRI" with any other specific group, like the timelines forecasters. Call it Group X). To improve on this, we can at least set up one-way communication from the outside to Group X by publishing an email address where people can send their capabilities ideas. The most obvious problem I can think of with this plan is that it will be a magnet for cranks. My solution (should it turn out to be necessary) is to require a working demo with any submission. This is safe for the reason noted above: writing and running code without publishing it is harmless. A panel reviews submissions and passes them along if significant.

  • This one is a bit of a feature request for LessWrong: It would be nice if there were an option to limit visibility on particular posts and comments to only logged-in LW users. Maybe even make it possible to specify a Karma threshold that readers must have. People seem to be self-censoring even for capabilities insights that are already known, because they want to avoid giving them a signal boost. For example, not mentioning some paper that they thought made a big advance, or not mentioning their opinion on what high-level ingredients are needed for AGI. I think adding a little bit of a cloak of obscurity would make it much easier for people to be comfortable discussing such things.

1 comment:

The title of this post is misleading. This post is not about self-censorship; it is specifically about whether or not "things that advance AI capabilities" should be discussed on LessWrong.