It turns out that DNNs are remarkably interpretable.
I recently posted a paper suggesting that deep networks may harbor an implicitly linear model, recoverable via a form of gradient denoising. The method - called excitation pullback - produces crisp, human-aligned features and offers a structural lens on generalization. Just look at the explanations for ImageNet-pretrained ResNet50 on the...
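To make the idea concrete, here is a minimal sketch of what "explaining an ImageNet-pretrained ResNet50 via a modified backward pass" can look like in practice. This is not the excitation-pullback rule from the paper; it uses a much simpler guided-backprop-style gradient gating as a stand-in for the gradient-denoising idea, and the helper names (`relu_hook`, `explain`) are illustrative assumptions.

```python
# Sketch: guided-backprop-style attribution for a pretrained ResNet50.
# NOT the paper's excitation-pullback rule - just a simple stand-in
# showing how a "denoised" backward pass can be wired up with hooks.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # standard ImageNet preprocessing

def relu_hook(module, grad_input, grad_output):
    # Gate the backward pass at every ReLU: only non-negative upstream
    # gradients are propagated (guided backprop), a crude proxy for
    # routing gradients along positively excited paths.
    return (torch.clamp(grad_input[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False  # full backward hooks and in-place ops don't mix
        m.register_full_backward_hook(relu_hook)

def explain(image_tensor, class_idx=None):
    """Return a per-pixel attribution map for one preprocessed image."""
    x = image_tensor.unsqueeze(0).requires_grad_(True)
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    logits[0, class_idx].backward()
    # Collapse the channel dimension into a single heatmap.
    return x.grad[0].abs().sum(dim=0)
```

Usage: pass any image through `preprocess` and hand the result to `explain`; the returned map can be overlaid on the input. Swapping `relu_hook` for a different gradient-modification rule is where a method like excitation pullback would plug in.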
Aug 4, 2025