Leon Lang

I'm a PhD student at the University of Amsterdam. I have research experience in multivariate information theory and equivariant deep learning and recently became very interested in AI alignment. https://langleon.github.io/



Has this already been posted? I could not find the post. 

For what it's worth, this comment seems clearly right to me, even if one thinks the post actually shows misalignment. I'm confused by the downvotes on it (5 net downvotes and 12 net disagree votes as of writing this).

Now to answer our big question from the previous section: I can find some  satisfying the conditions exactly when all of the ’s are independent given the “perfectly redundant” information. In that case, I just set  to be exactly the quantities conserved under the resampling process, i.e. the perfectly redundant information itself.


In the original post on redundant information, I didn't find a definition for the "quantities conserved under the resampling process". You name this F(X) in that post.

Just to be sure: is your claim that if F(X) exists that contains exactly the conserved quantities and nothing else, then you can define  like this? Or is the claim even stronger and you think such  can always be constructed?

Edit: Flagging that I now think this comment is confused. One can simply define  as the conditional, which is a composition of the random variable  and the function 

When I converse with junior folks about what qualities they’re missing, they often focus on things like “not being smart enough” or “not being a genius” or “not having a PhD.” It’s interesting to notice differences between what junior folks think they’re missing & what mentors think they’re missing.


There may also be social reasons to give different answers depending on whether you are a mentor or mentee. I.e., answering "the better mentees were those who were smarter" seems like an uncomfortable thing to say, even if it's true. 

(I do not want to say that this social explanation is the only reason that answers between mentors and mentees differed. But I do think that one should take it into account in one's models)

Then  is a projection matrix, projecting into the span.

To clarify: for this, you probably need the basis  to be orthonormal? 
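A small numerical check can make the question concrete. This is only a sketch under an assumption: it supposes the matrix in question is built as B Bᵀ, with the basis vectors stacked as the columns of a matrix B (the name B and the helper functions are hypothetical, not taken from the post). Under that assumption, B Bᵀ is idempotent when the columns are orthonormal, but not for a skewed basis of the same span.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def is_idempotent(p, tol=1e-9):
    """A projection matrix P must satisfy P @ P == P."""
    p2 = matmul(p, p)
    n = len(p)
    return all(abs(p2[i][j] - p[i][j]) <= tol
               for i in range(n) for j in range(n))


# Hypothetical bases: columns of B are the basis vectors, both spanning
# the x-y plane inside R^3.
B_orthonormal = [[1.0, 0.0],
                 [0.0, 1.0],
                 [0.0, 0.0]]
B_skewed = [[1.0, 1.0],
            [0.0, 1.0],
            [0.0, 0.0]]

P_good = matmul(B_orthonormal, transpose(B_orthonormal))
P_bad = matmul(B_skewed, transpose(B_skewed))

print(is_idempotent(P_good))  # orthonormal columns
print(is_idempotent(P_bad))   # same span, non-orthonormal columns
```

For a general basis with linearly independent columns, the standard orthogonal projection onto the span is B (Bᵀ B)⁻¹ Bᵀ, which reduces to B Bᵀ exactly when Bᵀ B is the identity.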


  • Disagreements often focus on outputs even though underlying models produced those.
    • Double Crux idea: focus on the models!
    • Double Crux tries to reveal the different underlying beliefs coming from different perspectives on reality
  • Good Faith Principle:
    • Assume that the other side is moral and intelligent.
    • Even if some actors are bad, you minimize the chance of error if you start with the prior that each new person is acting in good faith
  • Identifying Cruxes
    • For every belief A, there are usually beliefs B, C, D such that their believed truth supports belief A
      • These are “cruxes” if their being false would shake the belief in A.
      • Ideally, B, C, and D are functional models of how the world works and can be empirically investigated
    • If you know your crux(es), investigating them can change your belief in A
  • In Search of more productive disagreement
    • Often, people obscure their cruxes by listing many supporting reasons, most of which aren’t their true crux.
      • This makes it hard for the “opponent” to know where to focus
    • If both parties search for truth instead of wanting to win, you can speed up the process a lot by telling each other your cruxes
  • Playing Double Crux
    • Lower the bar: instead of reaching a shared belief, find a shared testable claim that, if investigated, would resolve the disagreement.
    • Double Crux: A belief that is a crux for you and your conversation partner, i.e.:
      • You believe A, the partner believes not A.
      • You believe testable claim B, the partner believes not B.
      • B is a crux of your belief in A and not B is a crux of your partner’s belief in not A.
      • Investigating conclusively whether B is true may resolve the disagreement (if the cruxes were comprehensive enough)
  • The Double Crux Algorithm
    • Find a disagreement with another person (This might also be about different confidences in beliefs)
    • Operationalize the disagreement (Avoid semantic confusions, be specific)
    • Seek double cruxes (Seek cruxes independently and then compare)
    • Resonate (Do the cruxes really feel crucial? Think of what would change if you believed your crux to be false)
    • Repeat (Are there underlying easier-to-test cruxes for the double cruxes themselves?)


In this post, John starts with a very basic intuition: abstractions are things you can get from many places in the world, and they are therefore very redundant. Thus, to find abstractions, you should first define redundant information. Concretely, for a system of n random variables X1, …, Xn, he defines the redundant information as the information about the original sample that remains after repeatedly resampling one variable at a time while keeping all the others fixed. Since no information will remain if n is finite, there is also the somewhat vague assumption that the number of variables goes to infinity in that resampling process.
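To make the resampling process concrete, here is a minimal toy sketch. It is not John's construction, and all names in it are hypothetical. Each variable is a pair (shared bit, private bit), where the shared bit is copied exactly across all variables and the private bits are independent coin flips. Resampling one variable from its conditional given the others then keeps the shared bit and redraws the private bit. The exact copying is an idealization: it is the reason something survives here even at finite n, whereas with noisy copies the information would presumably wash out, as described above.

```python
import random


def sample_joint(n, rng):
    """Draw n variables; each is (shared_bit, private_bit)."""
    shared = rng.randint(0, 1)
    return [(shared, rng.randint(0, 1)) for _ in range(n)]


def resample_one(xs, i, rng):
    """Resample variable i from its conditional given all the others.

    In this toy model the shared bit is determined by any other variable,
    and the private bit is independent of everything else.
    """
    shared = xs[(i + 1) % len(xs)][0]  # read it off a neighbor (needs n >= 2)
    xs[i] = (shared, rng.randint(0, 1))


def run(n=5, sweeps=100, seed=0):
    """Sample once, then resample every variable in turn, many times."""
    rng = random.Random(seed)
    original = sample_joint(n, rng)
    xs = list(original)
    for _ in range(sweeps):
        for i in range(n):
            resample_one(xs, i, rng)
    return original, xs


original, final = run()
# The shared bit is conserved; the private bits agree only by chance.
print([x[0] for x in original], [x[0] for x in final])
```

In this sketch the conserved quantity is exactly the shared bit, so it plays the role of the "perfectly redundant" information, while the private bits are forgotten after the first sweep.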

The first main theorem says that this resampling process will not break the graphical structure of the original variables, i.e., if X1, …, Xn form a Markov random field or Bayesian network with respect to a graph, then the resampled variables will as well, even when conditioning on the abstraction of them. John’s interpretation is that you will still be able to make inferences about the world in a local way even if you condition on your high-level understanding (i.e., the information preserved by the resampling process).

The second main theorem applies this to show that any abstraction F(X1, …, Xn) that contains all the information remaining from the resampling process will also contain all the abstract summaries from the telephone theorem for all the ways that X1, …, Xn (with n going to infinity) could be decomposed into infinitely many nested Markov blankets. This makes F a supposedly quite powerful abstraction.

Further Thoughts

It’s fairly unclear how exactly the resampling process should be defined. If n is finite and fixed, then John writes that no information will remain. If, however, n is infinite from the start, then we should (probably?) expect the mutual information between the original random variable and the end result to also often be infinite, which also means that we should not expect a small abstract summary F.

Leaving that aside, it is in general not clear to me how F is obtained. The second theorem just assumes F and deduces that it contains the information from the abstract summaries of all telephone theorems. The hope is that F is low-dimensional and thus manageable. But no attempt is made to show the existence of a low-dimensional F in any realistic setting. 

Another remark: I don’t quite understand what it means to resample one of the variables “in the physical world”. My understanding is as follows, and if anyone can correct it, that would be helpful: We have some “prior understanding” (= prior probability) about how the world works, and by measuring aspects in the world — e.g., patches full of atoms in a gear — we gain “data” from that prior probability distribution. When forgetting the data of one of the patches, we can look at the others and then use our physical understanding to predict the values for the lost patch. We then sample from that prediction.

Is that it? If so, then this resampling process seems very observer-dependent since there is probably no actual randomness in the universe. But if it is observer-dependent, then the resulting abstractions would also be observer-dependent, which seems to undermine the hope to obtain natural abstractions.

I also have a similar concern about the pencils example: if you have a prior on variables X1, …, Xn and you know that all of them will end up being “objects of the same type”, and a joint sample of them gives you n pencils, then it makes sense to me that resampling them one by one until infinity will still give you a bunch of pencil-like objects, leading you to conclude that the underlying preserved information is a graphite core inside wood. However, where do the variables X1, …, Xn come from in the first place? Each Xi is already a high-level object, and it is unclear to me what the analysis would look like if one reparameterized that space. (Thanks to Erik Jenner for mentioning that thought to me; there is a chance that I misrepresent his thinking, though.)


  • Goal: Find motivation through truth-seeking rather than coercion or self-deception
    • Ideally: the urges are aligned with the high-level goals
    • Turn “wanting to want” into “want”
  • If a person has simultaneously conflicting beliefs and desires, then one of those is wrong.
    • [Comment from myself: I find this, as stated, not evidently true since desires often do not have a “ground truth” due to the orthogonality thesis. However, even if there is a conflict between subsystems, the productive way forward is usually to find a common path in a values handshake. This is how I interpret conflicting desires to be “wrong”]
  • Understanding “shoulds”
    • If you call some urges “lazy”, then you spend energy on a conflict
    • If you ignore your urges, then part of you is not “focused” on the activity, making it less worthwhile
    • Acknowledge your conflicting desires: “I have a belief that it’s good to run and I have a belief that it’s good to watch Netflix”
      • The different parts aren’t right or wrong; they have tunnel vision, not seeing the value of the other desire
    • Shoulds: When there is a default action, there is often a sense that you “should” have done something else. If you had done this “something else”, then the default action would have become the “should” and the situation would be reversed.
    • View shoulds as “data” that is useful for making better conclusions
  • The IDC Algorithm (with an example in the article)
    • Recommendation: Do not tweak the structure of IDC before having tried it a few times
    • Step 0: Find an internal disagreement
      • Identify a “should” that’s counter to a default action
    • Step 1: Draw two dots on a piece of paper and name them with the subagents representing the different positions
      • Choose appropriate names/handles that don’t favor one side over the other
    • Step 2: Decide who speaks first (it might be the side with more “urgency”)
      • Say one thing embodied from that perspective
      • Maybe use Focusing to check that the words resonate
    • Step 3: Get the other side to acknowledge truth.
      • Let it find something true in the statement or derived from it
    • Step 4: The second side also adds “one thing”
      • Be open in general about the means of communication of the sides; they may also scribble something, express a feeling, or …
    • Step 5: Repeat
    • Notes:
      • It’s okay for some sides to “blow off steam” once in a while and not follow the rules; if so, correct that after the fact from a “moderation standpoint”
      • You may write down “moderator interjections” with another color
      • Eventually, you might realize the disagreement to be about something else.
        • This can give clarity on the “internal generators” of conflict
        • If so, start a new piece of paper with two new debaters
        • Ideally, the different parts understand each other better, leading them to stop getting into conflict since they respect each other's values


  • Focusing is a technique for bringing subconscious system 1 information into conscious awareness
  • Felt sense: a feeling in the body that is not yet verbalized but may subconsciously influence behavior, and which carries meaning.
  • The dominant factor in patient outcomes: whether the patient remains uncertain instead of holding on to firm narratives
    • A goal of therapy is increased awareness and clarity. Thus, it is not useful to spend much time in the already known
    • The successful patient thinks and listens to information
      • If the verbal part utters something, the patient will check with the felt senses to correct the utterance
      • Listening can feel like “having something on the tip of your tongue”
  • From felt senses to handles
    • A felt sense is like a picture
      • There’s lots of tacit, non-explicit information in it
    • A handle is like a sketch of the picture that is true to it.
      • Handles “resonate” with the felt sense
      • The first attempt at a handle will often not resonate — then you need to iterate
        • In the end, you might get a “click”, “release of pressure”, or “sense of deep rightness”
      • The felt sense can change or disappear once “System 2 got the message”
  • Advice and caveats
    • The felt sense may also not be true — your system 1 may be biased.
    • Tips:
      • Choosing a topic: if you don’t have a felt sense to focus on, produce the utterance “Everything in my life is perfect right now” and see how system 1 responds. This will usually create a topic to focus on
      • Get physically comfortable
      • Don’t “focus” in the sense of effortful attention, but “focus” in the sense of “increase clarity”
      • Hold space: don’t go super fast or “push”; silence in one’s mind is normal
      • Stay with one felt sense at a time
      • Always return to the felt sense, even if the coherent verbalized story feels “exciting”
      • Don’t limit yourself to sensations in your body — there are other felt senses
      • Try saying things out loud (both utterances and questions “to the felt sense”)
      • Try to not “fall into” overwhelming felt senses; they can sometimes make the feeling a “subject” instead of an “object” to hold and talk with
        • Going “meta” and asking what the body has to say about a felt sense can help with not getting sucked in
        • Verbalizing “I feel rage” and then “something in me is feeling rage” etc. can progressively create distance from felt senses
  • The Focusing Algorithm
    • Select something to bring into focus
    • Create space (get physically comfortable and drop in for a minute; put attention on the body; ask sensations to wait if there are multiple; go meta if you’re overwhelmed)
    • Look for a handle of the felt sense (Iterate between verbalizing and listening until the felt sense agrees; Ask questions to the felt sense; Take time to wait for responses)