Comments

Just as language models are trained with masked language modelling and next-token prediction, DALL-E was trained for image inpainting (predicting cropped-out parts of an image). This doesn't require explicit labels, because the target is the missing part of the image itself; hence it's self-supervised learning. Note that only this part of the training procedure is self-supervised, not the whole training process.
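To make the "no explicit labels" point concrete, here is a minimal sketch of how a single unlabeled image can be turned into a training pair for inpainting. This is a toy illustration, not DALL-E's actual pipeline: the function name `make_inpainting_example`, the nested-list image, and the zero fill value are all assumptions made for the example.

```python
import random


def make_inpainting_example(image, patch_h, patch_w, rng, fill=0):
    """Turn one unlabeled image into (masked_image, target_patch, (top, left)).

    `image` is a list of rows of pixel values (a hypothetical toy format).
    """
    height, width = len(image), len(image[0])
    # Pick a random location for the patch that will be hidden.
    top = rng.randrange(height - patch_h + 1)
    left = rng.randrange(width - patch_w + 1)
    # The target is copied from the image itself, so no human label is needed.
    target = [row[left:left + patch_w] for row in image[top:top + patch_h]]
    # Copy the image and blank out the patch; this is the model's input.
    masked = [list(row) for row in image]
    for r in range(top, top + patch_h):
        for c in range(left, left + patch_w):
            masked[r][c] = fill
    return masked, target, (top, left)


# A toy 4x4 grayscale "image" with values 1..16 (made-up data).
image = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
masked, target, (top, left) = make_inpainting_example(
    image, patch_h=2, patch_w=2, rng=random.Random(0)
)
```

A model would then be trained to predict `target` from `masked`. The supervision signal comes entirely from the data, which is the same idea as hiding a token in a sentence and asking a language model to fill it in.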