[ Question ]

What are good defense mechanisms against dangerous bullet biting?

by Mati_Roy, 1 min read, 21st Apr 2020, 15 comments


Some beliefs seem to naively imply radical and dangerous actions. But there are often rational reasons not to act on those beliefs. Knowing those reasons is really important for those who don't have a natural defense mechanism.

Most people have a natural defense mechanism, which is to not take ideas seriously. If you just follow what others do, it's less likely that an error in your explicit reasoning will lead you to do something radical and dangerous. The more likely you are to make such errors, the more advantageous (evolutionarily and individually) it is for you to have a conformist instinct.

The answer to this question is mostly meant for people with whom I want to share ideas that are dangerous if taken at face value / at the object level (I want to make sure they have those defense mechanisms first; I encourage you to do the same, and to do your due diligence when discussing dangerous ideas; this post is not sufficient). I want to encourage smart people to take ideas more seriously, but I don't want them to fully repress their conformist instincts, especially if they haven't built explicit defense mechanisms. This post should also be useful for people who already lack those defense mechanisms, and for people who want to better understand the function of conformity (although conformity is not the only defense mechanism).

Note that the defense mechanisms are not meant as fully general counterarguments. They are not insurmountable (at least, not always); they just indicate when it's prudent to want more evidence.

As a small tangent, exploration also often has a positive externality:

Like, it’s rational for any individual to be pursuing much more heavily exploitation-based strategy as long as someone somewhere else is creating the information. And part of what I find kind of charming and counterintuitive about this is that you realize people who are very exploratory by nature are performing a public service. (source: Computer science algorithms tackle fundamental and universal problems. Can they help us live better, or is that a false hope?)

I will post my answer below.


1 Answer

Model uncertainty

Even if your model says there's a high probability of X, that doesn't mean X is very likely. You also need to take into account the probability that the model itself is right. See: When the uncertainty about the model is higher than the uncertainty in the model. For example, you could ask yourself: what's the probability that I could read something that would change my mind about the validity of this model?
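To make the arithmetic concrete, here is a minimal sketch of how model uncertainty can dilute a confident prediction, using the law of total probability. The function name and all three numbers are hypothetical, chosen only for illustration; in particular, the "fallback" probability of X if the model is wrong is an assumption you would have to estimate yourself.

```python
def probability_of_x(p_model_right, p_x_if_model_right, p_x_if_model_wrong):
    """Law of total probability over 'model is right' vs 'model is wrong'."""
    return (p_model_right * p_x_if_model_right
            + (1.0 - p_model_right) * p_x_if_model_wrong)


# Hypothetical numbers, for illustration only.
inside_view = 0.99    # what the model itself says about X
p_model_right = 0.60  # your credence that the model is valid at all
fallback = 0.05       # assumed probability of X if the model is wrong

overall = probability_of_x(p_model_right, inside_view, fallback)
print(f"model says {inside_view:.2f}, overall credence is {overall:.3f}")
```

With these made-up inputs, a model that is 99% sure of X leaves you only a little over 60% sure once your doubt about the model is included.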

Beliefs vs impressions

Even if you have the impression that X is true, it might still be prudent to believe that maybe ~X if:

  • a lot of people you (otherwise) trust epistemically disagree
  • a lot of our thinking on this still seems confused
  • it seems like we're still making progress on the topic
  • it seems likely that there are a lot of unknown unknowns
  • this type of question has a poor track record of being tackled accurately
  • you have been wrong about similar beliefs in the past
  • etc.

See: Beliefs vs impressions

Option value

Option value is generally useful; it's a convergent instrumental goal. Even if you are confident about some model of the world or moral framework, it might still remain a priority to keep your options open just in case you're wrong. See Hard-to-reverse decisions destroy option value.

Group rationality

Promoting a norm of taking actions even when they are based on a model of the world that few people share seems bad. See Unilateralist’s curse.
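As a rough illustration of why such a norm seems bad, here is a minimal sketch of the arithmetic usually given for the unilateralist's curse. It assumes a simplified setup that is not spelled out in this post: several agents each judge independently whether to take a harmful action, and each has the same small chance of mistakenly concluding it is a good idea. The function name and the numbers are hypothetical.

```python
def p_someone_acts(n_agents, p_each_errs):
    """Chance that at least one of n independent agents acts on a mistaken judgment."""
    return 1.0 - (1.0 - p_each_errs) ** n_agents


# Hypothetical: each agent has a 5% chance of wrongly deciding to act.
for n in (1, 5, 20):
    print(f"{n:>2} agents: {p_someone_acts(n, 0.05):.2f}")
```

Under these assumptions, the chance that someone acts grows quickly with the number of people able to act alone, even though each individual is rarely wrong.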

14 comments

Bullet biting seems like a small subset of what you're gesturing at. Ideas may imply action without making it clear how those actions could go wrong (even if the act is successful).

Oh yeah, that's true. I guess I thought of it in terms of bullet biting because bitten bullets are the most conducive to the most dangerous actions.

Differential knowledge improvement / Differential learning

The order in which an agent (AI, human, etc.) learns things might be really important.

For a superintelligence, learning some information in the wrong order could pose an existential risk. For example, if they learn about the Pascal's mugging argument before its resolution, they might get their future light cone mugged.

For a human, if they learn arguments for dangerous behavior before learning about 'defense mechanisms', this could have a high cost, including imminent death. See examples.

I think I could come up with many more examples. Let me know if interested.

Some beliefs seem to naively imply radical and dangerous actions.

Can you give some examples? Some belief sets (that is, the sum conditional prediction of a potential action, or the sum of empirical and deontological beliefs that relate to the action), within most decision theories, do imply actions. But "radical" and "dangerous" are just part of the belief sets, not external labels on the actions.

But there often are rational reasons to not act on those beliefs.

Are those reasons not simply beliefs that go into the decision? Can you give me an example of a non-belief rational reason to act or not-act?

My point is that if you don't have some of the general / meta beliefs described in this post, you will generally make much worse decisions, in a way that you will often know intuitively but not through your explicit reasoning (which is dangerous if you don't take your intuitive warning signal seriously).

Let's assume you're someone that doesn't know the answer to the question I asked (or the information in the specific answer I gave).

Here are examples of what could go wrong.

Example 1

Suppose you believe that a discontinuity in consciousness means you die, and that when consciousness is reestablished in the brain, another mind is instantiated that is a copy of you. Then you might decide not to go back to sleep until you actually, biologically die from sleep deprivation.

While this could be the actual optimal choice, even taking into account this post, it seems likely to me that taking into account the information in this post could change one's mind from 'not sleeping at all' to 'keeping normal sleeping habits'.

Some approaches to moral uncertainty might actually recommend sleeping even if you're rather confident it will kill you, because: (% you care about discontinuity * how long you can go without sleeping) << (% you don't care about discontinuities * how long you can live if you sleep).
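The inequality above can be sketched with numbers. This is only an illustration under assumptions of my own: the credence and both durations are hypothetical, and "credence times days lived" is one simple way to weigh the two views, not the only approach to moral uncertainty.

```python
def weighted_days(credence_in_view, days_lived_under_choice):
    """Days gained by a choice, weighted by credence in the view that favors it."""
    return credence_in_view * days_lived_under_choice


# Hypothetical numbers, for illustration only.
p_discontinuity_matters = 0.9  # rather confident that discontinuity is death
days_without_sleep = 10        # assumed time one could go without sleeping
days_with_sleep = 40 * 365     # assumed remaining lifespan with normal sleep

value_of_not_sleeping = weighted_days(p_discontinuity_matters, days_without_sleep)
value_of_sleeping = weighted_days(1 - p_discontinuity_matters, days_with_sleep)

print(f"not sleeping: {value_of_not_sleeping:.0f} weighted days")
print(f"sleeping:     {value_of_sleeping:.0f} weighted days")
```

Even at 90% confidence that discontinuity matters, the small remaining credence multiplied by a normal lifespan dominates under these assumptions, which is the sense of the "<<" in the comparison.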

But if you don't know about how to integrate uncertainty at the model level in your reasoning, then you might just act based on your belief that sleep kills, and so stop sleeping. This error mode could severely affect a lot of people around me based on the 'object-level' beliefs I see shared around.

I've written more about this here, but I have made the post private for now as I'm revisiting whether it contains an info-hazard.

Example 2

If you don't see any error with Pascal's mugging, and so you decide to act on its logical implications, then a mugger might rob you of everything, and render you a complete slave.

Actually, I'm not sure if I have a defense mechanism to propose for this one, besides knowing the resolution of the problem before, or at the same time as, being introduced to the problem. But one could argue that "your intuitions that this is wrong" would be a good defense mechanism against explicit reasoning going astray.

Can you give some examples?

Like a belief that you've discovered a fantastic investment opportunity, perhaps?

So, false beliefs are the risk here? I'd think the defense mechanism is Bayes' Rule.

The vast majority of people who read about Pascal's Mugging won't actually be convinced to give money to someone promising them ludicrous fulfilment of their utility function. The vast majority of people who read about Roko's Basilisk do not immediately go out and throw themselves into a research institute dedicated to building the basilisk. However, they also do not stop believing in the principles underpinning these "radical" scenarios/courses of action (the maximization of utility, for one). Many of them will go on to affirm the very same thought processes that would lead you to give all your money to a mugger or build an evil AI, for instance by donating money to charities they think will be most effective.

This suggests that most people have some innate way of distinguishing between "good" and "bad" implementations of certain ideas or principles that isn't just "throw the idea away completely". It might* be helpful if we could dig out this innate method and apply it more consciously.

*I say might because there's a real chance that the method turns out to be just "accept implementations that are societally approved of, like giving money to charity, and dismiss implementations that are not societally approved of, like building rogue AIs". If this is the case, then it's not very useful. But it's probably worth investigating some amount at least.

Someone wrote on the Facebook thread (sharing with permission):

Why did I build this model to begin with? Did I want to formalise some ethical intuitions, did I want to justify some phenomenon? Is my bullet-biting here running counter to my initial motivations for building the model, or is it just something one party sees as counter-intuitive while not being particularly counter-intuitive for me?