The boring claim is that diffusion models learn to generalize beyond their exact training set, and that models trained on training sets drawn from the same distribution will be pretty good at denoising unseen images drawn from that distribution. Because they're both pretty good, they'll overlap in their suggested denoisings.

The exciting claim is that diffusion models trained on overlapping data from the same dataset learn nearly the same algorithms, which can be seen because they produce suggested denoisings that are similar in ways that would be vanishingly unlikely if they weren't overlapping mechanistically.

AFAICT, they show the boring claim that everyone already knew, and imply the exciting claim but don't support it at all.

Bogdan Ionut Cirstea (2y)

Haven't read it in detail, but Fig. 2 seems to me to support the exciting claim (also because these are overparameterized models with 70k trainable parameters)?

Charlie Steiner (2y)

Okay, sure, I kind of buy it. Generated images are closer to each other than to the nearest image in the training set. And the denoisers learn similar heuristics like "do averaging" and "there's probably a face in the middle of the image."
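For concreteness, here is a rough sketch of the kind of comparison being described: take samples from two models started from the same noise, and check whether each pair is closer to each other than either is to its nearest training image. Everything here is an assumption for illustration: the function names are made up, cosine similarity is just one reasonable choice of closeness measure, and the arrays are synthetic stand-ins, not real model outputs or data from the paper.

```python
import numpy as np


def unit_rows(x):
    """Scale each row of a (n, d) array to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def paired_vs_nearest_train(samples_a, samples_b, train_a):
    """Compare cross-model similarity with similarity to the training set.

    samples_a[i] and samples_b[i] are assumed to be flattened images that
    two separately trained models generated from the same initial noise.
    train_a is the flattened training set of the first model.
    Returns (paired, nearest_train), both of shape (n,).
    """
    a, b, t = unit_rows(samples_a), unit_rows(samples_b), unit_rows(train_a)
    # Cosine similarity between the i-th sample of each model.
    paired = np.sum(a * b, axis=1)
    # Highest cosine similarity between each sample and any training image.
    nearest_train = (a @ t.T).max(axis=1)
    return paired, nearest_train


# Synthetic stand-in data (NOT real model outputs): both "models" produce
# a shared underlying image plus a small independent perturbation.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 64))
samples_a = shared + 0.1 * rng.normal(size=shared.shape)
samples_b = shared + 0.1 * rng.normal(size=shared.shape)
train_a = rng.normal(size=(100, 64))

paired, nearest_train = paired_vs_nearest_train(samples_a, samples_b, train_a)
print("mean paired similarity:       ", paired.mean())
print("mean nearest-train similarity:", nearest_train.mean())
```

On real models, the interesting case is when `paired` sits well above `nearest_train`, since that is what you'd expect if the two models agree with each other rather than each copying its own training set.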

I still don't really feel excited, but maybe that's me and not the paper.

[Linkpost] Generalization in diffusion models arises from geometry-adaptive harmonic representation