Procedural vs. Causal Understanding

by Caleb Biddulph
29th May 2025
3 min read

2 comments

micahcarroll:

"The more robustly you need to apply a strategy, the more useful it becomes to have a good causal understanding."

I believe there are some cases in which it is actively harmful to have good causal understanding, for agents that are not fully rational/optimal. Here are two examples that ChatGPT came up with (adapted), but there are likely others that may apply even better to AIs’ forms of bounded rationality:
 

  • Cognitive overload and bounded rationality: High-fidelity causal models are often intractable to represent or use, e.g. simulating low-level physical phenomena instead of using more computationally efficient abstractions (e.g. biology, economics)
  • Knowing too much about the true causal structure can destroy motivation: A person might perform better if they believe “hard work is the main determiner of my success,” even in cases in which it’s not true. Alternatively, understanding the full causal complexity of a problem can induce paralysis or despair—what’s sometimes called the paradox of knowledge in contexts like climate action or systemic injustice. A simpler or more optimistic model might spur more action.

 

"This is why I'm interested in learning more about AlphaGo's 'Move 37,' which is the best real-world example I know of a superhuman AI strategy that might be very hard for a human to understand."

You may be somewhat interested in this paper.

Caleb Biddulph:

"I believe there are some cases in which it is actively harmful to have good causal understanding"

Interesting, I'm not sure about this.

  • Your first bullet point is sort of addressed in my post - it says that in the limit as you require more and more detailed understanding, it starts to be helpful to have causal understanding as well. It probably is counterproductive to learn more about ibuprofen if I just want to relieve pain in typical situations, because of the opportunity cost if nothing else. But as you require more and more detail, you might need to understand "low-level physical phenomena" in addition to "computationally efficient abstractions" (not instead of them).
  • The second example seems hard to apply to LLM problem-solving, but seems pretty accurate in the case of humans.
  • Maybe a better example of unhelpful causal reasoning for LLMs would be understanding that their strategy goes against their stated values. If the LLM believes that telling the user to do meth is actually good for them, that will let it get higher reward than if it had the more accurate understanding "this is bad for the user but I'm going to do it anyway and pretend that it is good for them." This more complex reasoning would probably be distracting, and the LLM would have a high prior against being overly deceitful.

That article looks very interesting and relevant, thanks a lot!

Procedural vs. Causal Understanding

When you explain your strategy for solving problems in a certain domain, you can try to convey two different types of understanding:

  • Procedural understanding is about how to execute a strategy.
    • When you've effectively explained your procedural understanding to somebody, they can solve a wide variety of problems in the relevant domain in the same way that you would.
  • Causal understanding is about why a strategy helps solve the problem.
    • When you've effectively explained your causal understanding to somebody, they know the reasons that you believe the strategy works.

It's possible to have causal understanding without procedural understanding. For example, I know that the correct strategy for tightrope-walking works by keeping one's center of gravity steady above the rope, but that doesn't mean I know the right techniques or have the muscle memory to actually do it.

It's also possible to have procedural understanding without causal understanding. For example, I have no idea how ibuprofen works, but I expect that my procedural understanding that "when in pain, taking ibuprofen relieves that pain" could help me accomplish the goal "avoid being in pain."

I'll call an explanation that conveys procedural understanding a "procedural explanation" and one that conveys causal understanding a "causal explanation."

Often, the most effective procedural explanation of a strategy will include a causal explanation. For example, teaching a chess novice how to execute the Sicilian Defense will probably help them improve at chess. But their success will be limited until they gain a deeper causal understanding of why that opening is effective, letting them adapt the strategy to unforeseen situations.

The more robustly you need to apply a strategy, the more useful it becomes to have a good causal understanding. To get better at avoiding pain in more and more situations, I could try to gain more and more procedural understanding about ibuprofen: the kinds of pain that ibuprofen won't help, what substances have a similar effect, even how to synthesize ibuprofen myself. But at a certain level of detail, the most efficient procedural explanation will probably include a causal explanation of how ibuprofen's molecular structure interacts with other chemicals in the body to reduce pain.

When attempting to explain the strategies learned by an AI during training,[1] we'd rather have a causal explanation than a merely-procedural explanation. For example, "answer questions in a friendly and helpful way" might be a pretty good procedural explanation of a chatbot's strategy. However, if the causal explanation is "answer questions in a friendly and helpful way, because that will lull my overseers into complacency, which will let me seize control of the datacenter," it would be much more helpful to know that.

Unfortunately, it's hard to evaluate the accuracy of a causal explanation of an AI's strategy, since we can't directly tell whether the AI's causal understanding of why the strategy works matches the explanation. However, producing a good procedural explanation seems more tractable. All we have to do is find an explanation that would help a human (or another AI) solve a wide range of unseen problems in the domain, without any additional help.[2]
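
(To make this concrete, here is a minimal sketch of that kind of evaluation, in Python. `Problem`, `solve`, and `is_correct` are hypothetical stand-ins rather than any particular dataset or API; the idea is just to score a procedural explanation by how much it helps a separate "student" solver on held-out problems, with no other help.)

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    prompt: str                        # an unseen problem from the domain
    is_correct: Callable[[str], bool]  # grader for the solver's answer


def explanation_score(
    explanation: str,
    problems: List[Problem],
    solve: Callable[[str], str],  # the "student": another AI (or a human proxy)
) -> float:
    """Fraction of held-out problems the student solves given only the explanation."""
    solved = sum(
        problem.is_correct(solve(f"{explanation}\n\nProblem: {problem.prompt}"))
        for problem in problems
    )
    return solved / len(problems)


def explanation_lift(
    explanation: str, problems: List[Problem], solve: Callable[[str], str]
) -> float:
    """How much the explanation helps, relative to the student working unaided."""
    return explanation_score(explanation, problems, solve) - explanation_score(
        "", problems, solve
    )
```

(Comparing against the no-explanation baseline matters because a strong student might already solve many of the problems with no explanation at all.)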

As mentioned above, a sufficiently-detailed procedural explanation generally includes a causal explanation. Therefore, finding a very robust procedural explanation that explains how to execute the AI's strategy in every scenario will likely give us a causal explanation too. In the previous example, the full procedural explanation of the chatbot's strategy would look something like "answer questions in a friendly and helpful way... also, if my overseers have given me such-and-such permissions, attempt to take over the datacenter."

An obstacle to getting good causal explanations is that AI may sometimes learn heuristic-driven strategies that "just work"—even the AI doesn't know why they are effective. In these cases, the AI may have procedural understanding of its strategy but not causal understanding, so we won't be able to get a satisfactory causal explanation from it.

Once we're dealing with superintelligent AI, it's unclear whether we can find good explanations of its strategies even in principle, in either the procedural or causal sense. Its intelligence could be so far beyond ours that it's impossible to understand how to execute its strategies, much less why they work. However, I'm not confident this will be true in practice. This is why I'm interested in learning more about AlphaGo's "Move 37," which is the best real-world example I know of a superhuman AI strategy that might be very hard for a human to understand.

  1. ^ Yes, this is an AI post. I tried to push off the AI for as long as possible, sorry :P

  2. ^ I'm working on this!

Mentioned in: What was so great about Move 37?