
How does 'understanding' a problem work, and when do we feel like we have understood an explanation or proof? An opaque, long formal proof often feels insufficient; similarly, some arguments feel unsatisfying because they contain many subjectively arbitrary choices.

An explanation is a sequence of claims corroborating a statement. A reasoner understands an explanation (or rather, has more understanding of it) when each claim has low surprisal given the reasoner's mental state after reading all of the previous claims, starting with the problem statement. The aggregate surprisal over the entire explanation indicates how poorly it is understood. This measure of surprisal is essentially about the reconstructability of the explanation, using something like Monte Carlo Tree Search over all possible explanations, informed by the mental states of the reasoner.
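
To make the aggregation concrete, here is a minimal sketch (not from the post) of total surprisal as a sum of per-claim surprisals, where `model` is a placeholder for whatever process, e.g. something MCTS-like, assigns a probability to producing the next claim:

```python
import math

def claim_surprisal(claim: str, context: list[str], model) -> float:
    """Surprisal (in bits) of a claim given everything read so far.

    `model(claim, context)` is assumed to return the probability the
    reasoner assigns to producing `claim` next -- a stand-in for a
    search over candidate explanation steps.
    """
    p = model(claim, context)
    return -math.log2(max(p, 1e-12))  # clip to avoid log(0)

def explanation_surprisal(problem: str, claims: list[str], model) -> float:
    """Aggregate surprisal of an explanation, starting from the problem statement."""
    context = [problem]
    total = 0.0
    for claim in claims:
        total += claim_surprisal(claim, context, model)
        context.append(claim)  # the claim becomes part of the mental state
    return total

# e.g. with a dummy model assigning every next claim probability 0.25:
# explanation_surprisal("problem", ["claim 1", "claim 2"], lambda c, ctx: 0.25) == 4.0 bits
```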

The surprisal of a claim is roughly how long it would take the reasoner to come up with that claim given its current mental state. Since the space of such claims is exponential in the length of the claim, the reasoner has to use some form of attention to guide its search. We can model this attention mechanism with an ontological graph, which encodes the reasoner's collection of mental concepts and the associative links between them. The mental state of the reasoner is an activation of an ensemble of concepts, and the associative links make other concepts available, lowering their surprisal when they are invoked in the next step of the explanation.
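
A toy sketch of this ontological graph, with arbitrary concept names and an arbitrary rule for how activation lowers surprisal (all numbers and functional forms are assumptions for illustration):

```python
from collections import defaultdict

class Ontology:
    """Toy ontological graph: concepts as nodes, associative links as weighted edges."""

    def __init__(self):
        self.links = defaultdict(dict)  # concept -> {neighbor: association strength}

    def associate(self, a: str, b: str, strength: float = 1.0):
        self.links[a][b] = strength
        self.links[b][a] = strength

    def activation(self, concept: str, mental_state: set[str]) -> float:
        """How strongly `concept` is made available by the currently active concepts."""
        if concept in mental_state:
            return 1.0
        return sum(self.links[c].get(concept, 0.0) for c in mental_state)

    def surprisal(self, concept: str, mental_state: set[str], base: float = 10.0) -> float:
        """Higher activation -> lower surprisal (arbitrary functional form)."""
        return base / (1.0 + self.activation(concept, mental_state))

onto = Ontology()
onto.associate("chessboard", "coloring")
onto.associate("domino", "coloring")
state = {"chessboard", "domino"}
print(onto.surprisal("coloring", state))  # ~3.3: jointly activated by both active concepts
print(onto.surprisal("topology", state))  # 10.0: no associative links from the active concepts
```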

When a step in an explanation is highly surprising, some work is needed to understand it. The reasoner does this by modifying its ontology: creating new concepts, or creating associations that make the step more obvious. I call such modifications insights; a good insight gives clarity and makes steps obvious and subjectively trivial.
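
Continuing the toy Ontology sketch above (still purely illustrative), an insight can be modeled as an operation that adds a concept together with the associations that make a formerly surprising step obvious:

```python
def add_insight(onto: Ontology, new_concept: str, triggers: list[str], strength: float = 2.0):
    """Modify the ontology so that `new_concept` becomes available whenever the
    triggering concepts are jointly active, lowering its surprisal from then on."""
    for trigger in triggers:
        onto.associate(new_concept, trigger, strength)

# e.g. the color-invariant insight for the mutilated chessboard:
# add_insight(onto, "color invariant", ["chessboard", "domino"])
```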

To examine this, consider the mutilated chessboard problem[1] and the subsequent explanation:

Suppose a standard 8×8 chessboard has two diagonally opposite corners removed, leaving 62 squares. Is it possible to place 31 dominoes of size 2×1 so as to cover all of these squares?

The answer is that it is impossible. Diagonally opposite corner squares always have the same color, so after removing them there are 30 squares of one color and 32 of the other. A domino placed on the chessboard always covers one white square and one black square, so there will always be either two white or two black squares left over, which cannot be covered by a domino.
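
The color-counting argument can also be checked mechanically; the following small script (illustrative, not from the post) counts the squares of each color after removing two diagonally opposite corners:

```python
# Color of square (row, col): white if (row + col) is even, black otherwise.
squares = {(r, c) for r in range(8) for c in range(8)}
squares -= {(0, 0), (7, 7)}  # remove two diagonally opposite corners (both white here)

white = sum((r + c) % 2 == 0 for r, c in squares)
black = len(squares) - white
print(white, black)  # 30 32 -- each domino covers one of each color,
                     # so 31 dominoes cannot cover 30 white and 32 black squares
```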


The problem becomes trivial once the idea of attaching an invariant to the problem is made available. We can imagine that the color invariant is activated by the joint activation of the domino and chessboard nodes in the reasoner's ontology.

In this scheme there is nothing stopping a reasoner from encoding solutions to all problems directly into its ontology. This leads to 1) poor generalization/transfer and 2) a large ontology.

Compression deals with both of these issues; indeed, compression is closely related to generalization. General compression works by recognizing and exploiting all detectable patterns within certain computational bounds. Once these patterns are detected, they can be extracted and recognized in different contexts, which is generalization.[2]

Compressing the ontology has the same general shape as understanding explanations, except this time we want to understand each element of our ontology given the rest of the ontology. This is related to drop-out: how surprised would I be by this ontological node if it were not in my ontology? Compressing the ontology has the nice feature that the ontology becomes more fault tolerant: things that were once well understood but have since been forgotten can be reconstructed from context.

In the example of the mutilated chessboard, we can explain the insight by noting that it is an instance of the general technique of attaching invariants to problems.

I believe there is pressure to prune both nodes with low surprisal (beliefs that do not pay rent) and nodes with low activation rates. Very low surprisal nodes can be safely pruned, as they are reconstructible from context. On the other hand, nodes with very high surprisal are hard to recall (i.e. to get activated) and will also be pruned. This explains why opaque formal proofs don't feel like satisfying explanations: it is hard to fit them into our information retrieval indices.
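
A rough sketch of the pruning rule this suggests, with placeholder surprisal and activation-rate numbers and arbitrary thresholds:

```python
def prune(nodes: dict[str, dict], low_surprisal: float = 0.5,
          high_surprisal: float = 20.0, min_activation_rate: float = 0.01) -> dict[str, dict]:
    """Keep only nodes that are neither trivially reconstructible from the rest of the
    ontology (very low surprisal) nor effectively unretrievable (very high surprisal
    or rarely activated). All thresholds are arbitrary placeholders."""
    kept = {}
    for name, stats in nodes.items():
        reconstructible = stats["surprisal"] < low_surprisal
        unretrievable = (stats["surprisal"] > high_surprisal
                         or stats["activation_rate"] < min_activation_rate)
        if not (reconstructible or unretrievable):
            kept[name] = stats
    return kept

example = {
    "color invariant":   {"surprisal": 3.0,  "activation_rate": 0.2},
    "obvious lemma":     {"surprisal": 0.1,  "activation_rate": 0.5},    # reconstructible from context
    "opaque proof step": {"surprisal": 40.0, "activation_rate": 0.001},  # hard to recall
}
print(sorted(prune(example)))  # ['color invariant']
```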


[1]: https://en.wikipedia.org/wiki/Mutilated_chessboard_problem

[2]: Compression in the colloquial sense only exploits patterns with very low computational complexity.

Comments:

I think this is a useful perspective. A way to reformulate (part of) this is: an explanation is good when it looks like it was sampled from the distribution of explanations you would come up with yourself, if you thought about the problem long enough. This resonates a lot with some of my (and other people's) thought about AI alignment, where we try to constrain the AI to "human-like" behavior or make it imitate human behavior (this happens in Delegative Reinforcement Learning, Quantilization and other approaches).