Max Kanwal

PhD Student @ Stanford University


Mechanistic Transparency for Machine Learning

I see two major challenges, one of which leans heavily on progress in linguistics. I can imagine mathematical theory guiding the choice of candidate model decompositions (Challenge 1), but I suspect that linking a candidate decomposition to a theory of 'semantic interpretability' (Challenge 2) is equally hard, if not harder.

How do you plan to address Challenge 2? Maybe the most robust approach would involve active learning of the pseudocode, where a human guides the algorithm as it decomposes and labels each abstract computation.