In which I worry that the Less Wrong project might go horribly right. This post belongs to my Altruist Support sequence.

Every project needs a risk assessment.

There's a feeling, bubbling just under the surface here at Less Wrong, that we're only playing at rationality - that this is rationality kindergarten. The problem has been expressed in various ways.

And people are starting to look at fixing it. I'm not worried that their attempts - and mine - will fail. At least we'd have fun and learn something.

I'm worried that they will succeed.

What would such a Super Less Wrong community do? Its members would self-improve to the point where they had a good chance of succeeding at most things they put their mind to. They would recruit new rationalists and then optimize that recruitment process, until the community got big. They would develop methods for rapidly generating, classifying and evaluating ideas, so that the only ideas that got tried would be the best that anyone had come up with so far. The group would structure itself so that people's basic social drives - such as their desire for status - worked in the interests of the group rather than against it.

It would be pretty formidable.

What would the products of such a community be? There would probably be a self-help book that works. There would be a practical guide to setting up effective communities. There would be an intuitive, practical guide to human behavior. There would be books, seminars and classes on how to really achieve your goals - and only the materials that actually got results would be kept. There would be a bunch of stuff on the Dark Arts too, no doubt. Possibly some AI research.

That's a whole lot of material that we wouldn't want to get into the hands of the wrong people.

Dangers include:

  • Half-rationalists: people who pick up on enough memes to be really dangerous, but not on enough to realise that what they're doing might be foolish. For example, building an AI without adding the friendliness features.
  • Rationalists with bad goals: Someone could rationally set about trying to destroy humanity, just for the lulz.
  • Dangerous information discovered: e.g. the rationalist community develops a Theory of Everything that reveals a recipe for a physics disaster (e.g. a cheap way to turn the Earth into a black hole). A non-rationalist decides to exploit this.

If this is a problem we should take seriously, what are some possible strategies for dealing with it?

  1. Just go ahead and ignore the issue.
  2. The Bayesian Conspiracy: only those who can be trusted are allowed access to the secret knowledge.
  3. The Good Word: mix in rationalist ideas with do-good and stay-safe ideas, to the extent that they can't be easily separated. The idea being that anyone who understands rationality will also understand that it must be used for good.
  4. Rationality cap: we develop enough rationality to achieve our goals (e.g. friendly AI) but deliberately stop short of developing the ideas too far.
  5. Play at rationality: create a community which appears rational enough to distract people who are that way inclined, but which does not dramatically increase their personal effectiveness.
  6. Risk management: accept that each new idea has a potential payoff (in terms of helping us avoid existential threats) and a potential cost (in terms of helping "bad rationalists"). Implement the ideas which come out positive.

In the post title, I have suggested an analogy with AI takeoff. That's not entirely fair; there is probably an upper bound to how effective a community of humans can be, at least until brain implants come along. We're probably talking two orders of magnitude rather than ten. But given that humanity already has technology with slight existential threat implications (nuclear weapons, rudimentary AI research), I would be worried about a movement that aims to make all of humanity more effective at everything they do.


Half-rationalists: people who pick up on enough memes to be really dangerous, but not on enough to realise that what they're doing might be foolish. For example, building an AI without adding the friendliness features.

Not everything is about AI and existential risk. We already have a section in the sequences about how knowing about cognitive biases can hurt you. It seems unlikely that anyone is going to get the knowledge base to build an AGI from simply being exposed to a few memes here. If there's one thing we've seen from AI research in the last fifty years, it's that strong AI is really, really hard.

Rationalists with bad goals: Someone could rationally set about trying to destroy humanity, just for the lulz.

This seems extremely unlikely. Some humans like making things bad for other people. Those people don't generally want to destroy the world, because the world is where their toys are. Moreover, destroying humanity takes a lot of effort: short of building a bad AGI, getting hold of a large nuclear arsenal, engineering a deadly virus, or making very nasty nanotech, humans don't have many options, and all of those are very tough. And people doing scientific research are generally (although certainly not always) people who aren't getting much recognition and are doing it because they want to learn and help humanity. The people likely to even want to cause large-scale destruction don't have much overlap with the people who have the capability. The only possible exceptions might be some religious and nationalist fanatics in some countries, but that's not a problem of rationality, and even they can't trigger existential risk events.

Dangerous information discovered: e.g. the rationalist community develops a Theory of Everything that reveals a recipe for a physics disaster (e.g. a cheap way to turn the Earth into a black hole). A non-rationalist decides to exploit this.

This isn't a rationalist-community worry; it's a general worry. As technology improves, people individually have more destructive power. That's a problem completely disconnected from rationalists. Even if improved rationality did lead to massive tech leaps, it is rarely general theories that immediately yield nasty weapons, but rather sophisticated and fairly complicated applications of them, along with a lot of engineering. In 1939 the basic theory behind atomic weapons already existed, but turning it into working bombs still took years of secret, large-scale engineering.

It seems unlikely that anyone is going to get the knowledge base to build an AGI from simply being exposed to a few memes here.

Agreed; I was remarking on the danger of being exposed to a few memes on the Uber Less Wrong that we seek to become - memes which we may have designed to be very accessible and enticing to lay readers.

Those people don't generally want to destroy the world, because the world is where their toys are

With a population of seven billion, you can't assume much about everyone's motivations without committing at least a weak version of the typical mind fallacy.

As technology improves, people individually have more destructive power

OK; so we should regard as risky any technology or social structure which we expect to significantly advance the rate of technological progress.

With a population of seven billion, you can't assume much about everyone's motivations without committing at least a weak version of the typical mind fallacy.

Diverse as we all are, humans occupy but a tiny pinpoint in the mind-design space of all possible intelligent agents. That our values are mostly aligned is one reason there are so many opportunities for the positive-sum exchanges that have fueled the economic growth of the past several hundred years. I rarely encounter anyone who really, truly desires a future that I find horrible (though plenty claim to want such futures). On the other hand, plenty of people go about achieving their goals in all sorts of awfully inefficient ways.

Improving the world's rationality to the point that people can actually achieve their goals seems like the least of our worries.

And people are starting to look at fixing it. I'm not worried that their attempts - and mine - will fail. At least we'd have fun and learn something. I'm worried that they will succeed.

Your concerns are misplaced. I think you're a consequentialist, so perhaps instead of thinking about it as "failing to start effective rationalist communities" you should think of it as "shattering all existing rationalist communities, leaving a handful of uncoordinated splinters whose influence on the world is minimal." The transposition is bad for morale, but good for getting over status quo bias.

Trading off probability of success against risk reduction is reasonable to consider, but after only a little consideration I strongly believe that all of your proposals except for "1. Ignore" go way too far.

A simpler version of my question can be: does a healthy, effective rationalist community make unfriendly AI more or less likely? I'd like to see some evidence that the question has at least been seriously considered.

Not everything is about AI and direct existential risks.

For instance, the first thing that comes to my mind in this space is effective rationalist memes becoming coupled with evil, or just insufficiently-thoughtful and ethics-challenged, ideologies or projects, creating dangerous and powerful social movements or conspiracies. Nothing to do with AI and not a direct x-risk, but something that could make the world more violent and chaotic in a way that would likely increase x-risk.

I upvoted the OP and think this topic deserves attention, but share JoshuaZ's criticisms of the initial examples.

Suggest putting this in your post. One-sentence summaries are always good.

I trust the judgment of my rational successors more than my own judgment; insofar as the decision to work on FAI or not is based on correct reasoning, I would rather defer it to a community of effective rationalists. So I don't believe that the proportion of work going into safe technological development is likely to decrease, unless it should.

A good default seems to be that increasing the rate of technical progress is neutral towards different outcomes. Increasing the effectiveness of researchers presumably speeds up intellectual progress relative to progress in manufacturing processes (which are tied to the longer timescales of manufacturing), which is a significant positive effect. I don't see any other effect to counteract this one.

We may also worry that rationality today provides a differential advantage to those developing technology safely, and that this advantage will be eroded as rationality becomes more common. Unfortunately, I don't think a significant advantage yet exists, so developing rationality further will at least do very little harm. This does not rule out the possibility that an alternative course, which avoids spreading rationality too widely (or spreading whatever magical efficiency-enhancing fruits rationality may provide), could do even more good. I strongly suspect this isn't the case, but it is a reasonable thing to think about, and the argument is certainly more subtle.

A good default seems to be that increasing the rate of technical progress is neutral towards different outcomes.

I think the second-order terms are important here. Increasing technological progress benefits hard ideas (AI, nanotech) more than comparatively easy ones (atomic bombs, biotech?). Both categories are scary, but I think the second is scarier, especially since we can use AI to counteract existential risk much more than we can use the others. Humans will die 'by default' - we already have the technology to kill ourselves, but not that which could prevent such a thing.

I would be worried about a movement that aims to make all of humanity more effective at everything they do.

It seems like the dangerous thing isn't so much widespread rationality so much as widespread superpowerful tools. Like, to the point that you can easily build a nuke in your garage, or staple together an AI from other available programs.

I'm not too worried about people getting better at achieving their goals, because most people at least understand that they don't want to murder their friends and family. Like, a human tasked with eliminating malaria wouldn't think that killing all life on Earth would be the best way to accomplish that.

I'd prefer for most people to be better at achieving their goals, even if those goals aren't x-risk reduction. Like, it would be nice if churches trying to raise the quality of life of poor people did better at doing so.

In some areas (like status), everyone getting better at accomplishing their goals would probably be zero-sum, but it's already that way. As it stands, I would like rationalists to increase their status in order to better get things done, but once everyone's a rationalist I don't have strong preferences about who has more status than whom. So it's not that big of a problem.

Some people are often mentioned as wanting destruction (suicide bombers, for instance). Gwern posted a nice article about how ineffective they are, and it seems like becoming more rational (and thus happier, more well-adjusted) would probably draw them out of terrorism.

Gwern posted a nice article about how ineffective they are, and it seems like becoming more rational (and thus happier, more well-adjusted) would probably draw them out of terrorism.

I would particularly point out the section on how Black September was disbanded: http://www.gwern.net/Terrorism%20is%20not%20about%20Terror#about-the-chicks-man

It's an amazing and striking anecdote.

Some people (suicide bombers, for instance) seem to actually want to cause destruction to humanity.

Every movement that employs suicide bombing is highly political and absolutely not nihilist. All of them involve some preexisting milieu of hate and destruction, which provides the moral imperative behind the act. It's about making the ultimate sacrifice of which a person is capable, not just about causing as much damage as possible.

The point stands, but I edited the wording.

I have some optimism that a sort of "organizational takeoff" might occur, though if it does I expect it will happen in an organization which can convert its constituents' rationality into money in order to grow. Such an organization is likely to look quite unlike LW, and it may interact with the public in a very different way than you envision (for example, it may recruit widely but selectively, and may not publish anything at all).

But given that humanity already has technology with slight existential threat implications (nuclear weapons, rudimentary AI research), I would be worried about a movement that aims to make all of humanity more effective at everything they do.

We already have Science...

Science brought us the dangerous technologies that I mentioned. What might a Vastly Improved Science bring us?

What might a Vastly Improved Science bring us?

There's an implicit issue here that the improvement would be vast. That seems very far from obvious.

The idea being that anyone who understands rationality will also understand that it must be used for good.

I thought you were going to say that the do-good ideas would repulse rationalists with bad goals. That seems reasonably likely.