I thought I'd write up some basic concepts about infohazards that I often find relevant but don't have a handy reference for when I want to point to them. (Most recently, for instance, I wanted this concept in order to explain why discussing weak pivotal acts can be infohazardous. [1]) I also haven't spent much time on this post; I hope that's okay. It seemed better to write it quickly than not at all, because LW's and EA's current procedures for dealing with infohazards still don't feel very good to me, and I really wish there were more work on that.

 

Context: Only conflict-theoretic infohazards

For this post, I am considering infohazards in the conflict-theoretic sense. As per the linked post by Jessica Taylor, ideal rational agents are never harmed by gaining more information. [2] Humans of course are not such agents and can be harmed by things they learn. This is not what I will discuss in this post. What I will discuss is adversarial games between multiple agents, where an agent A benefits if another agent B does not gain some information. The information here could be strategic moves themselves, info that aids in developing such moves, or info that in any way improves B's world model, eventually increasing the likelihood that B plays better moves.

 

Examples of adversarial settings

  1. You want to forecast and discuss ways to defend against more efficient forms of totalitarian governments, and would-be totalitarians want to implement such efficient governments.
  2. You want to discuss weak pivotal acts that well-meaning AGI labs can take, and foreign militaries have reasons to defend against such acts. [1]
  3. You want to reduce x-risk, and some group of people actively wants our extinction.
  4. You want your country to benefit from the economic growth made possible by some invention, without an opposing country also obtaining the same economic growth by copying the invention.
  5. You want your country to benefit from the military capabilities made possible by some invention, without an opposing country also obtaining the same capabilities by copying it.

 

Why helping is hard

In real life you yourself are often not one of the two agents directly involved in the conflict. Instead you're a helper trying to aid one of the agents because they're on the good side (as per whatever you consider good). The trivial way to do this is private communication with the good agent. Here are some reasons why you may not be able to do that.

  1. The agent you're trying to help does not exist yet, for instance a well-meaning AGI lab from the future.
  2. You do not yet have the attention of the agent you're trying to help, and you need to separately fight for their attention after you have found good ideas you wish to share with them.
  3. You aren't able to generate good ideas to give this agent all by yourself, and wish to work on the ideas with a public community first. A public community includes agents misaligned with you, and agents likely to make mistakes and share things with the wrong people.

I am most interested in 3 here, although 1 and 2 can also simultaneously apply to the same situation.

 

"Secrecy at scale" is unsolved

I find it helpful to consider the "secrecy at scale" problem an open and unsolved problem in human coordination. We typically know how to keep secrets in small groups but do not know how to keep them in large groups. Secrecy at scale is one example of a 1-of-n defection problem.
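To get a feel for why this breaks down quickly, here is a minimal toy model (my own illustration with made-up numbers, not a formal result): suppose each of n group members independently leaks a secret with some small probability p over a given period. Then the probability the secret survives is (1 − p)^n, which collapses as the group grows.

```python
# Toy model of 1-of-n defection (illustrative assumptions only):
# each of n group members independently leaks with probability p_leak
# over some period, so the secret survives with probability (1 - p_leak)^n.

def p_secret_survives(n_members: int, p_leak: float) -> float:
    """Probability that none of n independent members leak."""
    return (1.0 - p_leak) ** n_members

if __name__ == "__main__":
    p_leak = 0.02  # assumed: a 2% chance that any one member leaks
    for n in (5, 20, 100, 500):
        print(f"n = {n:3d}   P(secret survives) = {p_secret_survives(n, p_leak):.3f}")
    # n =   5   P(secret survives) = 0.904
    # n =  20   P(secret survives) = 0.668
    # n = 100   P(secret survives) = 0.133
    # n = 500   P(secret survives) = 0.000
```

Independence is of course a strong assumption; real groups have vetting, norms, and correlated incentives. But the exponential shape of the problem is the point.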

"Secrecy at scale" being unsolved (and possibly unsolveable) has implications. Most impressive acts in the real world require the coordination of a large number of people. However since secrecy is not possible, this means groups outside of this group are also able to benefit from the same information. This ensure for instance that all of civilisation roughly grows at the same rate, and it is much harder for any group (such as a nation) to obtain a lasting decisive advantage against other groups.

(I will leave open the question of whether some groups obtaining a decisive advantage over others is good or bad. Coordination of some against others can clearly be either good or bad in an impartial moral sense, depending a lot on the situation; I won't discuss this here.)

 

As long as secrecy at scale remains unsolved, I will assume that in case 3, where ideas are discussed in a public community, there is always a risk of leaking information to adversaries. This is the situation I wish to discuss here.

 

Infohazards and inferential distances

 

Identifying what can and cannot be discussed in such a public setting can be hard.

Discussing nothing in public is one ideal extreme. Discussing anything at all means knowingly weakening your agent's strategic prospects (usually by some tiny amount) in the hope that the community will build on your ideas such that, on net, your agent's strategic prospects increase.
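One crude way to write this tradeoff down (my own framing, not anything formal from the post): let B be the expected improvement in your agent's prospects from the community building on the shared idea, and C be the expected improvement in the adversary's prospects from the leak.

```latex
% Rough expected-value framing of the sharing decision (my own framing):
%   B = gain to your agent from the community building on the idea
%   C = gain to the adversary from the information leaking to them
\[
  \text{share} \iff \mathbb{E}[B] > \mathbb{E}[C]
\]
```

In practice neither term is easy to estimate, which is much of why the rest of this post deals in rough categories rather than numbers.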

With this tradeoff in mind we can look at some things that may be fine to share.

 

FINE: Easy ideas

Ideas at a short inferential distance are usually fine to discuss. This includes:

 - moves you can make that your adversary very likely knows you can make

 - moves your adversary can make where your adversary knows you likely know they can make them

[Technically, what matters here is that the ideas are easy enough to be common knowledge. So when I say "moves you can make that your adversary very likely knows you can make", what I mean is "moves you can make that your adversary very likely knows you can make, AND that your adversary very likely knows that you know they are very likely to know you can make them, AND that your adversary knows ..." ad infinitum.]
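For readers who prefer it written out, the idealised (non-probabilistic) version of this regress is just the standard epistemic-logic definition of common knowledge. This notation is standard, not something introduced in the post; Y and A are my labels for you and the adversary.

```latex
% Common knowledge of proposition p between you (Y) and the adversary (A),
% in standard epistemic-logic notation (idealised, non-probabilistic):
\[
  E\,p := K_Y\,p \land K_A\,p, \qquad
  E^{n+1}p := E\bigl(E^{n}p\bigr), \qquad
  C\,p := \bigwedge_{n \ge 1} E^{n}p
\]
% "Everyone knows p", iterated to every finite depth n, gives common knowledge C p.
```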

 

FINE: Hard but resilient ideas on your side

Ideas that are resilient to the other agent knowing them may be fine to discuss. For instance, moves that you can make where, even if the opponent knows of the move, there is nothing they can do to improve their situation with that knowledge.

Note that you need to be very confident here that your idea is indeed completely resilient.

Usually, resilient ideas are moves that are decisively winning for you and end the game. But in general the class of resilient ideas is larger than the class of decisively winning moves.
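As a concrete toy illustration (my own construction, with made-up moves and payoffs), one sufficient condition for a move being resilient to disclosure is that every response the adversary has yields them the same payoff against it, so knowing the move gains them nothing:

```python
# Toy check for "resilient to disclosure" (illustrative, made-up payoffs):
# a move is resilient (in this narrow sense) if every adversary response
# yields the adversary the same payoff, so knowing the move changes nothing.

# adv_payoff[your_move][adversary_response] = payoff to the adversary
adv_payoff = {
    "decisive_winning_move": {"defence_a": -10, "defence_b": -10},
    "sneaky_plan":           {"defence_a":  +5, "defence_b":  -5},
}

def is_resilient(your_move: str) -> bool:
    """True if no adversary response does better than any other against this move."""
    payoffs = adv_payoff[your_move].values()
    return max(payoffs) == min(payoffs)

if __name__ == "__main__":
    for move in adv_payoff:
        print(f"{move}: resilient = {is_resilient(move)}")
    # decisive_winning_move: resilient = True
    # sneaky_plan: resilient = False
```

The broader class the post points at, moves where disclosure doesn't let the adversary respond any better than they would have under ignorance, is larger than this narrow check, but harder to verify.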

Note also that this only works if you're the only one who can make this move. It does not, for instance, make sense to discuss moves the adversary can take that are decisively winning for them.

 

MAY BE FINE: Impossibility proofs on either side

If you know that some task is too hard for you to solve, or that some move you were hoping to have does not exist, it may be fine to discuss this.

Similarly, if you know for certain that the opponent does not have a move you were until now uncertain they could have, it may be fine to discuss this.

Note that impossibility proofs still carry a non-zero amount of information. They save resources your adversary might otherwise spend searching for moves that don't exist. And they reduce the probability your adversary assigns to certain futures, which could in expectation benefit them. But these harms are often small compared to the benefit of sharing an impossibility proof with your own side.

 

NOT FINE: Hard ideas on either side

It does not make sense to publicly discuss a move your side has that is hard for your adversary to anticipate, but that they could defend against if they did anticipate it.

Similarly, it does not make sense to publicly discuss a move your adversary has that is hard for them to figure out, but that they could benefit from playing if they became aware of it.

 

There is therefore a scale where discussing easy things is fine, discussing hard things is not fine, but discussing impossible things can be fine again. We could give it a name like a U-scale or S-scale or something.

 

Watch out for

 

Framing hazards

Even easy (or medium-difficulty) ideas lie in a very high-dimensional search space of possible ideas. The lines between easy, hard, and impossible, for you and for your adversary, get drawn through this space in both intuitive and unintuitive ways.

 

There are a very large number of easy ideas, more than we can think of at any given point. We typically search this space by orienting our search in specific directions, toward specific goals. It is often possible that searching differently (say, with different subgoals or perspectives in mind) lets you find easy ideas others haven't thought of. Framing a problem in a different or clearer way could likewise make it easier for an adversary to find new ideas.

 

Attentional hazards

 

Even within public communities, some ideas get less attention directed towards them than others. It may be possible to deliberately ensure that certain portions of the ideas a community discusses get less attention.

Whether this is possible depends on community size and readership of material published by the community. It also depends on how many adversaries exist in the community.

At the limit, you have dedicated helpers of adversaries scouring every word, post, and chat message in your community for anything that helps the adversary. In such a world, diverting attention does not help. But there is a sweet spot between "public enough to have adversaries who can leak stuff" and "public and important enough that adversaries scour through everything", and most communities, including LessWrong, lie inside it.

 

  1. ^

    Short answer: Pivotal acts typically require violating the sovereignty of other nations that house AGI labs, and those countries spend a lot of resources optimising to retain their sovereignty. If a weak pivotal act is made public today, it is possible those countries will take the defensive steps necessary to ensure the act won't work.

  2. ^

    They may, for instance, be harmed by getting information they wrongly trust, but rational agents will also try to ensure they trust info the right amount, and make the right-sized Bayesian update on each new piece of info they get.
