This post is part of my hypothesis subspace sequence, a living collection of proposals I'm exploring at Refine. Preceded by representational tethers.

Intro

Consider diffusion models. DALL-E is able to take in an image as input and imperceptibly edit it so that it becomes slightly more meaningful (i.e. closer to the intended content). In this, DALL-E is an optimizer of images. It can steadily nudge images across image space towards what people generally deem meaningful images. But an image diffusion model need not nudge images around in this specific way. Consider an untrained version of DALL-E, whose parameters have only just been initialized. It still implicitly exerts currents across image space; it's just that those currents don't systematically lead towards the manifold of meaningful images. Taken together, those currents form a field over image space. We deem an image diffusion model good if the field it implements nudges images in a way we deem appropriate. Conversely, a bad one simply specifies a wildly different field.
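To make the "field over image space" picture concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: images are 2-D points, the manifold of meaningful images is the unit circle, and the names `trained_field` and `make_untrained_field` are hypothetical. An arbitrary fixed drift stands in for freshly initialized parameters. None of this is how DALL-E actually works; it only shows two different fields acting on the same space.

```python
import math
import random

def distance_to_manifold(point):
    """Distance from a 2-D point to the unit circle (the toy 'meaningful' manifold)."""
    x, y = point
    return abs(math.hypot(x, y) - 1.0)

def trained_field(point, step=0.1):
    """Good toy model: nudge the point a small fraction of the way towards the circle."""
    x, y = point
    r = math.hypot(x, y)
    if r == 0.0:
        return (step, 0.0)  # arbitrary direction when starting exactly at the origin
    scale = (1.0 - step) + step / r  # new radius is (1 - step) * r + step
    return (x * scale, y * scale)

def make_untrained_field(seed, step=0.1):
    """Bad toy model (assumption): a fixed random drift that ignores the manifold."""
    rng = random.Random(seed)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dx, dy = step * math.cos(angle), step * math.sin(angle)

    def field(point):
        return (point[0] + dx, point[1] + dy)

    return field

def run(field, point, n_steps=50):
    """Apply the field repeatedly, i.e. follow its current for n_steps small nudges."""
    for _ in range(n_steps):
        point = field(point)
    return point

start = (2.0, 0.5)
good_end = run(trained_field, start)
bad_end = run(make_untrained_field(seed=0), start)
print(distance_to_manifold(start), distance_to_manifold(good_end), distance_to_manifold(bad_end))
```

Both models implement a field; only one of them has currents that systematically lead towards the manifold.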

Naturally, we'd like good models. However, we lack an exhaustive closed-form description of the desired field for pushing towards meaningful images. The best we've managed to do is come up with a way of roughly telling which of two models is better. We then use (1) this imperfect relative signal, and (2) a sprinkle of glorified hill-climbing as a means of obtaining a good model by charting our way through model space. We might fill those in with e.g. (1) incremental denoising performance on ImageNet, and (2) SGD. Those two ingredients help us define a trainer for the diffusion model, a way of getting from bad models towards good ones. We deem a trainer good if the field it implements nudges models in a way we deem appropriate (i.e. from bad to good). Conversely, a bad one simply specifies a wildly different field across model space.
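The two ingredients of a trainer can be sketched in a few lines. This is a toy under stated assumptions, not a real training loop: a model is just a 2-D point that its field pulls towards, `HIDDEN_TARGET` is a hypothetical stand-in for the data distribution, the relative signal is denoising error on a small noisy batch, and plain random-perturbation hill-climbing is swapped in for SGD.

```python
import random

# Assumption: "meaningful images" are points clustered around this hidden target.
HIDDEN_TARGET = (3.0, -1.0)

def apply_model(model, point, step=0.5):
    """The model's field: nudge `point` part of the way towards the model's pull point."""
    return tuple(p + step * (m - p) for p, m in zip(point, model))

def denoising_error(model, rng, n_samples=64):
    """Mean squared error after one denoising nudge on freshly noised clean points."""
    total = 0.0
    for _ in range(n_samples):
        clean = tuple(t + rng.gauss(0.0, 0.05) for t in HIDDEN_TARGET)
        noisy = tuple(c + rng.gauss(0.0, 0.5) for c in clean)
        denoised = apply_model(model, noisy)
        total += sum((d - c) ** 2 for d, c in zip(denoised, clean))
    return total / n_samples

def is_better(model_a, model_b, seed):
    """Ingredient (1): an imperfect relative signal, evaluated on one shared noisy batch."""
    return denoising_error(model_a, random.Random(seed)) < denoising_error(model_b, random.Random(seed))

def train(model, n_steps=500, proposal_scale=0.2, seed=0):
    """Ingredient (2): hill-climbing through model space (a stand-in for SGD)."""
    rng = random.Random(seed)
    for step in range(n_steps):
        candidate = tuple(m + rng.gauss(0.0, proposal_scale) for m in model)
        if is_better(candidate, model, seed=step):
            model = candidate  # keep the nudge only if the relative signal prefers it
    return model

trained = train((0.0, 0.0))
print(trained)
```

Note that `train` never sees `HIDDEN_TARGET` directly. It only ever asks which of two models looks better on a batch, which is the whole point: the trainer is itself a field over model space, built from a relative signal rather than a closed-form description of the goal.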

Naturally, we'd like good trainers. However, we lack an exhaustive closed-form description of the desired field for reliably pushing towards truly good models. You guessed it: we did manage to come up with a way of roughly telling which of two trainers is better. We similarly use this rough relative signal and our favorite hill-climbing algorithm as a means of obtaining a good trainer by charting our way across trainer space. We might fill those in with e.g. (1) convergence speed, and (2) grid search across hyperparameters. Those two help us define a hyperparameter ...
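The same move one level up can be sketched as follows. Again, this is only an illustration under assumptions: a trainer is reduced to a single hypothetical hyperparameter (`proposal_scale` of a hill-climbing loop), the relative signal between trainers is the number of steps needed to push a toy loss under a threshold, and the search over trainer space is a plain grid search. `HIDDEN_TARGET`, `loss`, and the threshold values are all made up for the example.

```python
import random

# Assumption: the toy data the models are trained on clusters around this point.
HIDDEN_TARGET = (3.0, -1.0)

def loss(model, rng, n_samples=32):
    """Noisy estimate of how far the model's pull point is from the data."""
    total = 0.0
    for _ in range(n_samples):
        clean = tuple(t + rng.gauss(0.0, 0.05) for t in HIDDEN_TARGET)
        total += sum((m - c) ** 2 for m, c in zip(model, clean))
    return total / n_samples

def steps_to_converge(proposal_scale, threshold=0.05, max_steps=2000, seed=0):
    """Relative signal between trainers: steps a hill-climbing trainer needs to converge."""
    rng = random.Random(seed)
    model = (0.0, 0.0)
    for step in range(1, max_steps + 1):
        candidate = tuple(m + rng.gauss(0.0, proposal_scale) for m in model)
        batch_seed = rng.getrandbits(32)  # both models are scored on the same batch
        if loss(candidate, random.Random(batch_seed)) < loss(model, random.Random(batch_seed)):
            model = candidate
        # Fixed held-out batch used only to decide when training has converged.
        if loss(model, random.Random(seed + 1)) < threshold:
            return step
    return max_steps  # did not converge within the step budget

def grid_search(proposal_scales):
    """Search over trainer space: pick the hyperparameter with the fastest convergence."""
    return min(proposal_scales, key=steps_to_converge)

best_scale = grid_search([0.001, 0.3])
print(best_scale, steps_to_converge(best_scale))
```

A trainer with a tiny `proposal_scale` barely moves the model within the step budget, so the grid search prefers the larger one. The structure mirrors the previous level exactly: a rough way of telling which of two candidates is better, plus a search procedure that follows it.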

Hypothesis Subspace
