Ok, sure, machine learning is great for making predictions, but you can't use it to replace scientific theory. Not only will it fail to reach generalizable conclusions, but the result is going to lack elegance and explainability. We won't be able to understand it or build upon it!

What makes an algorithm or theory or agent explainable?

It's certainly not the ability to "look inside"; we're rather happy assuming that black boxes, such as brains, are capable of explaining their conclusions and theories, and we scoff at the idea that perfectly transparent neural networks are "explainable" in any meaningful sense. So it's not visibility that makes something explainable.

Why was that prediction made as a function of our inputs and their interactions? Under what conditions would the outcome differ? These are questions of explainability, questions we may assume are critical for the pursuit of science. Answers to them hold most of the upside of explainability.
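For concreteness, here is a minimal sketch of what asking that second kind of question of a model can look like in practice. Everything here is a made-up toy: a small scikit-learn classifier, four unnamed features, and arbitrary perturbation sizes.

```python
# A toy probe of "under what conditions would the outcome differ?".
# The data, model, and perturbation sizes are all invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                  # four made-up input features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic outcome
model = GradientBoostingClassifier().fit(X, y)

def counterfactual_probe(model, x, deltas=(-1.0, -0.5, 0.5, 1.0)):
    """Nudge each input of one example and report which nudges flip the prediction."""
    base = model.predict(x.reshape(1, -1))[0]
    flips = []
    for i in range(x.shape[0]):
        for d in deltas:
            x_alt = x.copy()
            x_alt[i] += d
            if model.predict(x_alt.reshape(1, -1))[0] != base:
                flips.append((i, d))
    return base, flips

# Prints the base prediction and the nudges that flip it for one example.
print(counterfactual_probe(model, X[0]))
```

Even this toy answer is just a list of flips, not a story; the point is only that the question is well defined whether or not the answer is digestible.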

There's a running assumption that people are better than trained algorithms by this second definition of explainability. I believe that's mainly a matter of semantics.


Let's think of a physicist who takes in hundreds of petabytes of experimental data, reduces them using equations, and claims with high probability that there exists this thing called a "Higgs boson", and that it has various properties with implications for how other particles acquire mass and a bunch of other things.

The resulting boson can probably be defined within a few pages of text via things such as its mass, the things it decays into, its charge, its spin, the various interactions it can have with other particles, and so on.
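To get a rough sense of how compact that surface-level description is, here's an informal summary in a few lines (values rounded; an illustrative sketch, not an authoritative particle-data listing):

```python
# An informal, rounded summary of the boson's headline properties.
higgs_boson = {
    "mass_GeV": 125,                 # ~125 GeV/c^2
    "electric_charge": 0,
    "spin": 0,
    "notable_decays": ["b bbar", "W+ W-", "g g", "tau+ tau-", "Z Z", "gamma gamma"],
}
```

Of course, each of those entries quietly leans on the rest of the Standard Model to mean anything.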

If a Luddite like myself asks the physicist, "Why did you predict this fundamental particle exists?", he may give me a pop-sci story like:

Well, it was one of the missing particles we'd speculated about for years, and it fits our theories, completing our model of how the other particles get their mass. These experiments helped us "see" this boson in a way that's hard to doubt.

But this obviously explains nothing; it just tells me that the physicist is really sure and thinks this is awesome. If I press on and ask for a real explanation, one that involves counterfactuals and the quantitative motivation behind the claim, I'd get something like:

Look, the Standard Model is a fairly advanced construct in physics, so you first have to understand it and why it came to be. Then you have to understand the experimental statistics needed to interpret the kind of data we work with here. In the process, you'll obviously learn quantum mechanics, but to understand the significance of the Higgs boson specifically it's very important that you have an amazing grasp of quantum field theory and electroweak symmetry breaking, since the Higgs mechanism is what gives the other particles their mass. Depending on how smart you are, this might take 6 to 20 years to wrap your head around; really, you won't even be the same person by the time you're done with it. And oh, once you get your Ph.D. and work with us for half a decade, there's a chance you'll disagree with our statistics or our model and might think that we are wrong, which is fine, but in that case, you will find the explanation unsatisfactory.

We are fine with this, since physics is bound to be complex. It earns its keep by being useful, making predictions about very specific things with very tight error margins, and being fundamental to all other areas of scientific inquiry.

When we say that we "understand" physics, what we really mean is:

There are a few tens of thousands of blokes who spent half their lives turning their brains into hyper-optimized physics-thinking machines, and they assure us that they "understand" it to some extent.

For the rest of us, the edges of physics are a black box. I know physics works because Nvidia sells me GPUs with more VRAM each year, and because I'm able to watch videos of nuclear reactors glowing on YouTube while patients in the nearby oncology ward are getting treated with radiation therapy.


This is true for many complex areas: we "understand" them because a few specialists say they do, and because the knowledge that trickles down from those specialists has results that are obvious to all. Or, more realistically, because a dozen-domain-long chain of specialists, each relying on the other, is able to produce results that are obvious to all.

But what about a very high-end credit risk analysis machine learning algorithm predicting that we should give Steve a $14,200 loan?

The model making this prediction might be operating with terabytes' worth of data about Steve: his browsing history, his transaction history, his music preferences, a video of him walking into the bank... each time he walked into the bank for the last 20 years, various things data aggregators tell us about him, from his preferences in clothing to the likelihood that he wants to buy an SUV, and of course, the actual stated purpose Steve gave us for the credit, both as text and as a video recording.

Not only that, but the model has been trained on previous data from millions of people similar to Steve and the outcomes of the loans handed to them. It's working off petabytes of data in order to draw its one-line conclusion: "You should loan Steve, at most, $14,200, if you want to probabilistically make a profit."

If we wanted this model to have an "explainability" component, it would probably give "PR" answers much like the physicist's, because the "real" answer would be something like:

Look, I can explain this to you, but 314,667,344,401 parameters had a significant role in coming up with this number, and if you want to "truly" understand that, then you'd have to understand my other 696,333,744,001 parameters and the ways they relate to each other in the equation. In order to do this, you have to gain an understanding of human-gait analysis and how its progression over time relates to life satisfaction, credit history analysis, shopping preference analysis, the error theory behind the certainty of said shopping preferences, and about 100 other mini-models that end up coalescing into the broader model that gave this prediction. And the way they "coalesce" is even more complex than any of the individual models. You could probably do this given 10 or 20 years, but basically, you'd have to re-train your brain from scratch to be like an automated risk analyst, you'd only be able to explain this to another automated risk analyst, and the "you" "understanding" my decision would be quite different from the "you" that is currently asking.

And even the above is an optimistic take, assuming the algorithm is made of multiple modules that are each somewhat explainable.
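For a concrete sense of what such an "explainability" component typically produces, here is a minimal sketch using permutation importance on a toy stand-in model. The feature names, data, and model below are invented for illustration; a real credit system would look nothing like this.

```python
# A rough sketch of the "PR answer" layer: rank a handful of named features
# by how much shuffling each one hurts the model. Everything here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

feature_names = ["income", "credit_history_len", "gait_score", "stated_purpose_score"]
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, len(feature_names)))
y = (X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2] > 0).astype(int)  # synthetic repayment label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# The "explanation" handed to the customer: a short ranked list that compresses
# away almost everything the model actually does.
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"{feature_names[i]}: {result.importances_mean[i]:.3f}")
```

The output is a ranked list of a few named features, which is to the model roughly what the pop-sci story was to the physics.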


So, is the algorithm unexplainable here?

Well, no more so than the physicist. Both of them can, in theory, explain the reasoning behind their choice. But in both cases the reasoning is not simple: there's no single data point that is crucial, and if even a few inputs were to change slightly, the outcome might be completely different, yet the input space is so vast that it's impossible to reason about all significant changes to it.
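A quick back-of-the-envelope calculation, with purely illustrative numbers, shows why enumerating "all significant changes" to such an input space is hopeless:

```python
# Illustrative numbers only: even a heavily condensed feature set explodes.
from math import log10

n_inputs = 10_000          # a hypothetical, already-condensed feature set
levels_per_input = 3       # "slightly lower / unchanged / slightly higher"

combinations_log10 = n_inputs * log10(levels_per_input)
print(f"~10^{combinations_log10:.0f} distinct perturbation patterns")
# ~10^4771, versus roughly 10^80 atoms in the observable universe.
```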

This is just the way things are in physics, and it might be just the way things are in credit risk analysis. After all, there's no fundamental rule of the universe saying it should be easy for the human mind to comprehend. The reason this is more obvious in physics is simply that physicists have been gathering loads of data for a long time. But it might be equally true in all other fields of inquiry; based on current models, it probably is. It's just that those other fields had neither enough data nor the intelligence required to grok through it until recently.

So should we just accept a very complex machine learning model instead of more explainable theories? I don't think there's a clear-cut answer here. There's something fundamentally "nice" about clever theories, but clever theories turn out to almost never have predictive power on their own; they require problem-specific models to actually model reality, which is the thing both science and engineering ultimately aim for.


There seems to be a drive to distill theory into short abstractions, even if those abstractions lose most of their meaning along the way and require years of theoretical baggage to interpret properly.

Schrödinger's equation is "short" only insofar as you want to comprehend it as a work of art; if you want to parametrize it to solve any real problem, heck, even an imaginary problem, you are back in billions-of-computations land doing something that's awfully similar to machine learning.
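To make that concrete, here is a minimal sketch of what "parametrizing" the equation for even a toy system looks like: a 1D harmonic oscillator in natural units, discretized on an arbitrarily chosen grid and solved as a big eigenvalue problem.

```python
# Solving the time-independent Schrödinger equation for a toy 1D system by
# brute-force discretization. Grid size, units, and potential are arbitrary choices.
import numpy as np

n = 2000                                  # grid points; realistic problems need far more
x = np.linspace(-10, 10, n)
dx = x[1] - x[0]

# H = -1/2 d^2/dx^2 + V(x) in natural units, via second-order finite differences
kinetic = (-0.5 / dx**2) * (np.diag(np.full(n - 1, 1.0), -1)
                            - 2 * np.eye(n)
                            + np.diag(np.full(n - 1, 1.0), 1))
potential = np.diag(0.5 * x**2)           # harmonic oscillator potential
H = kinetic + potential

energies = np.linalg.eigvalsh(H)          # O(n^3) work: big, ML-flavored linear algebra
print(energies[:3])                       # ≈ 0.5, 1.5, 2.5 for the harmonic oscillator
```

The "short" equation is the one-line comment; everything after it is the part that actually answers questions.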

A similar thing can be said about representing ML algorithms through linear algebra, which is useless for conveying most of the things that make or break the actual implementation. Getting a CNN to run on a GPU did not change the symbols behind it one bit, but it revolutionized the field nonetheless. If your theory can't even represent ground-breaking changes, that might not be a sign that it's "timeless and elegant", but rather that it's modeling the wrong thing entirely.

But I digress.


It might be wise to have "explainable" theories as the basis for our models and then optimize the resulting equations to fit more niche problems, but we might just be lying to ourselves in saying this gives us any real understanding.

I think the answer the world will opt for is likely pragmatic. We'll leave machine learning to build "hard-to-explain" models for problems that are valuable enough to need a solution, be those particle physics or protein folding. Problems where there's no pressure for an optimal solution, or even any ability to distinguish such a solution, might be handled with more "intuitive" models.

5 comments

I was going to make the point that I can look at the Wikipedia article for the Higgs boson and gain a somewhat-decent understanding of it. Having tested it, that hypothesis was disproven: after 1 minute and 30 seconds reading Wikipedia, I conclude that even a cursory understanding of the Higgs boson requires knowledge I do not have (it gives some particles mass. I get that part). Kudos, it was a good example.

I think the word "explainable" isn't really the best fit. What we really mean is that the model has to be able to construct theories of the world and prioritize the ones that are more compact. An AI that has simply memorized that a stone will fall if it's exactly 5, 5.37, 7.8 (etc.) meters above the ground is not explainable in that sense, whereas one that discovered general relativity would be considered explainable.
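To put that contrast in concrete terms (with made-up numbers): the memorized table versus the compact rule that generates every entry in it.

```python
# Made-up numbers: a rote-learned lookup table versus the compact rule behind it.
memorized = {5.0: 1.01, 5.37: 1.05, 7.8: 1.26}   # height (m) -> fall time (s)

def fall_time(height_m: float, g: float = 9.81) -> float:
    """The compact (Newtonian) rule t = sqrt(2h / g), covering every height at once."""
    return (2 * height_m / g) ** 0.5

print(round(fall_time(7.8), 2))   # 1.26 -- matches the memorized entry, and all others
```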

And yeah, at some point, even maximally compressed theories become so complex that no human can hope to understand them. But explainability should be viewed as an intrinsic property of AI models rather than in connection with humans.

Or, maybe we should think of "explainability" as the AI's lossy compression quality for its theories, in which case it must be evaluated together with our ability, as all modern lossy compression takes the human ear, eye and brain into account. In this case it could be measured by how close our reconstruction is to the real theory for each compression.

> maybe we should think of "explainability" as the AI's lossy compression quality for its theories, in which case it must be evaluated together with our ability, as all modern lossy compression takes the human ear, eye and brain into account. In this case it could be measured by how close our reconstruction is to the real theory for each compression.

My view is along these lines; see the first link for an interesting example vis-à-vis this (start at min 17, or just read the linked paper in the description).

I think there is at least some difference between 'a full understanding of this requires years studying physics; I can give you a fuzzy layman's definition, but it'll be missing important detail' and '33 million terms need to be incorporated to get any transparency whatsoever.' I think it's a quantitative rather than a qualitative difference in explainability, but I think it's definitely there.

If you assume the layman's definition contains any information, which it doesn't.