Model Reduction as Interpretability: What Neuroscience Could Teach Us About Understanding Complex Systems
TL;DR: Neuroscientists face the same interpretability problem that AI safety researchers do: complex, inscrutable systems with thousands of parameters transforming inputs into outputs. I worked on a systematic method for finding the minimal features that capture a system's input-output computation under specific conditions. For cortical neurons with thousands of morphological/biophysical parameters,...
Jan 12
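The reduction idea in the TL;DR can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not the author's method: the "complex model" is a random two-layer network playing the role of a heavily parameterized neuron model, and the reduction is a plain least-squares readout of the few input features that actually vary under a restricted stimulus condition.

```python
# Toy illustration of model reduction as interpretability (hypothetical,
# not the post's actual method): fit a minimal surrogate to a complex
# model's input-output mapping under a restricted input condition.
import numpy as np

rng = np.random.default_rng(0)

# "Complex" model: a random two-layer network with many parameters,
# standing in for a detailed morphological/biophysical neuron model.
W1 = rng.normal(size=(1000, 20))
W2 = rng.normal(size=1000)

def complex_model(x):
    # 20-D input -> scalar output through 1000 hidden nonlinear units.
    return np.tanh(x @ W1.T) @ W2

# Restricted condition: only 3 of the 20 input dimensions vary.
X = np.zeros((500, 20))
X[:, :3] = rng.normal(size=(500, 3))
y = complex_model(X)

# Reduced model: a linear readout of just those 3 varying features.
coef, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)
y_hat = X[:, :3] @ coef

# Fidelity of the reduction under this condition (coefficient of
# determination of the surrogate's predictions).
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(f"reduced-model R^2 under the restricted condition: {r2:.2f}")
```

The point of the sketch is the workflow, not the numbers: under a narrow input distribution, a model with many parameters can often be summarized by a far smaller feature set, and the fit quality quantifies how complete that summary is for those conditions.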