By Brian Muhia

Status: Mid-project progress report, with some thinking about potential future research directions. This is also part of a grant application [private communication].


Historically, heavy pesticide use in agriculture has been a steady secondary cause of illness, one that early plant disease detection can reduce. This chart from Our World in Data shows which countries are most affected by heavy pesticide use. Artificial intelligence should help enable earlier, more targeted interventions that let people avoid applying large amounts of pesticide. This project aims to ensure that the intelligence involved is interpretable, trustworthy, reproducible and generative. First we train, then explain, a family of self-supervised vision-only models in order to study the value of rigorous interpretability methods and their potential for educating practitioners in their use, leading eventually to a world where more practitioners routinely explore model capabilities and limits with more rigor than is currently the norm. The goal is to improve food safety, and also AI safety, by exemplifying a norm of rigorous interpretability research.

Training Log Narrative

<aside> 💡 These loss curves can be reproduced by training an unsupervised encoder (both from ImageNet weights and from scratch) that is then fine-tuned for classification in two ways: MLP-only and discriminatively. The notebooks are in the linked repo. The plot includes about 60 different runs, across different image scales and architectures.

</aside>


This article starts with a walk-through of my thoughts about the different training runs from this public competition to help Egypt, while my memory is still fresh. The first thought that occurs is that there seem to be two phases to training this model if you’re fine-tuning from ImageNet weights.

Fine-tuning is recommended: you can do it in a few hours on a laptop as a proof of concept, as opposed to a few days of waiting for the from-scratch model to randomly wander its way down to low-loss land.

First, the ImageNet weights carve a significant chunk out of the error before plateauing. The dark red trajectory (scroll up to the cover image) shows a Res2NeXt50 skipping what would have been several hours of training from scratch. Thankfully, fp16 enables a high data rate and a large batch size. In practice we need fast iteration, and reducing image size is one way to get it; this lets us decide early which training runs to kill because they aren’t progressing fast enough. After carefully monitoring the speed of early training, I reload the best of the plateaued checkpoints and run a learning-rate search with Learner.lr_find. The resulting learning rate should produce a gradual, stable descent.
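The search behind Learner.lr_find is an LR range test: train for a handful of mini-batches while exponentially sweeping the learning rate and record the loss at each value. A minimal sketch of the idea in plain PyTorch on synthetic data (the model, data, and sweep bounds here are illustrative stand-ins, not the competition setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data standing in for the real dataloader.
X = torch.randn(512, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

def lr_range_test(model, lr_min=1e-5, lr_max=1.0, steps=50):
    """Exponentially sweep the learning rate, logging (lr, loss)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr_min)
    mult = (lr_max / lr_min) ** (1 / (steps - 1))
    history = []
    lr = lr_min
    for _ in range(steps):
        for g in opt.param_groups:
            g["lr"] = lr
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
        history.append((lr, loss.item()))
        lr *= mult
    return history

history = lr_range_test(model)

# Pick the lr where the loss fell fastest between consecutive steps --
# roughly what lr_find's "steep" suggestion does.
best_lr = min(
    zip(history, history[1:]),
    key=lambda p: p[1][1] - p[0][1],
)[0][0]
```

In fastai this whole loop is a one-liner on a `Learner`; the sketch just makes explicit why the suggested rate sits on the steep part of the loss-vs-lr curve rather than at its minimum.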

And The Loss Curve Goes “Bump”

In the second phase, we need to initialize the optimizer in a state from which it can find a loss trajectory that is gradual, not too fast, and maintain this slow random walk for a long time. With some manufactured luck, i.e. many tries, we learn which hyperparameters are right. After some training time the model uncovers “something significant” about the data/task, solving the problem much faster. We can and should do interpretability research to explain this “bump” in the loss curve, in the spirit of Neel Nanda’s grokking work.
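Before explaining the bump, we have to locate it. One simple, assumption-laden way is to smooth the logged loss curve and find the step with the steepest drop; a sketch in numpy over a fabricated loss curve (the curve, window size, and drop location are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic loss curve: slow plateau, then a sharp drop at step 300.
steps = np.arange(600)
loss = np.where(steps < 300, 2.0 - 0.001 * steps, 0.5 + 0.0005 * (600 - steps))
loss = loss + 0.01 * rng.standard_normal(600)

def find_bump(loss, window=20):
    """Return the approximate step where the smoothed loss drops fastest."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(loss, kernel, mode="valid")
    drops = np.diff(smoothed)                    # per-step change in smoothed loss
    return int(np.argmin(drops)) + window // 2   # most negative change = steepest drop

bump_step = find_bump(loss)
```

Checkpoints saved just before and just after `bump_step` are the natural pair to compare with interpretability tools, since whatever the model "uncovered" happened between them.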

We have two “converged” models, and many more checkpoints saved at various stages of training. They were trained in a unique data setup that allows us to test interpretability tools and methods:


Conditional Generative Models

Train a score-based conditional generative model on the SwAV projection’s representations for pairs of matching vs. non-matching images. This generates high-resolution images from these representations in an interactive, conditional manner. We can use this new capability to visualize representations for both matching pairs, where X1 and X2 are the same plant, and non-matching pairs, to show whether there is a discernible difference in quality or visual content, and if so, whether this reveals how the model represents pictures of the same vs. different objects.
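Since the projections are the conditioning signal for the generator, a cheap sanity check is that they already separate matching from non-matching pairs. A toy sketch of that check, with a randomly initialized stand-in encoder and fabricated "matching" pairs (a real run would load the trained SwAV-style checkpoint and real image pairs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in encoder + projection head; shapes are illustrative only.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
proj = nn.Linear(128, 64)

def embed(x):
    # L2-normalized projection, as is conventional for contrastive setups.
    return F.normalize(proj(encoder(x)), dim=-1)

x1 = torch.randn(8, 3, 32, 32)           # first view of each plant
x2 = x1 + 0.05 * torch.randn_like(x1)    # "matching" view: small perturbation
x3 = torch.randn(8, 3, 32, 32)           # non-matching images

match_sim = F.cosine_similarity(embed(x1), embed(x2)).mean()
nonmatch_sim = F.cosine_similarity(embed(x1), embed(x3)).mean()
```

If the gap between `match_sim` and `nonmatch_sim` is small for the trained model, the conditional generator has little signal to visualize, so this check is worth running before paying for generator training.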

<aside> 💡 Since $X_1$ and $X_2$ are not copies, this is a different kind of unsupervised learning, closer to sensor/data fusion. There is a structure encoded by the researcher, which is to get the model to integrate two sensor signals into one representation. We can use this to conduct experiments with different images.

</aside>


  • This method uses one half of the dataset to represent the other half, which makes it uniquely interesting to visualize. Can we learn more about polysemantic neurons here?
  • Download or train a Res2NeXt50 “baseline off-the-shelf” model from ImageNet weights, preferably a high-performance one, with RGB only. Visualize the difference between representations learned from classification alone and from the self-supervised method.
  • What property of the representation’s visualization changes with encoder model scale?

Representational Similarity
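A standard tool for comparing the supervised baseline's representations with the self-supervised ones is linear CKA (centered kernel alignment), which scores similarity between two activation matrices while ignoring rotations and rescalings. A minimal numpy sketch, with random matrices standing in for layer activations (CKA is my suggested instrument here; the post does not commit to a specific similarity measure):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 16))       # e.g. supervised model activations
B = A + 0.1 * rng.standard_normal(A.shape)  # noisy copy: CKA should be near 1
C = rng.standard_normal((1000, 16))       # unrelated activations: CKA near 0

high = linear_cka(A, B)
low = linear_cka(A, C)
```

Running this layer-by-layer over the classification-only and self-supervised encoders would give a similarity heatmap showing where the two training regimes diverge.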

Episodic Linear Probes

  • Linear probes already provide some interpretability value; what if we used them during training and linked the loss functions of the main model and the linear probe to regularize the main model? ELP is a new method for regularizing deep neural networks, and we can evaluate its performance benefits as well. Without making it cumbersome through poor interface choices, it should be possible to implement ELP as a live-plotting mechanism using callbacks in fastai.
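The coupling idea above can be sketched as a linear probe attached to intermediate features, with its loss added (down-weighted) to the main objective so that probe error feeds back into the backbone. A toy PyTorch sketch; the coupling weight `alpha`, the architecture, and the joint-training scheme are my illustrative assumptions, not a fixed ELP specification:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 2-class problem.
X = torch.randn(256, 10)
y = (X[:, 0] > 0).long()

backbone = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
head = nn.Linear(32, 2)
probe = nn.Linear(32, 2)   # linear probe on the intermediate features

opt = torch.optim.Adam(
    list(backbone.parameters()) + list(head.parameters()) + list(probe.parameters()),
    lr=1e-2,
)
ce = nn.CrossEntropyLoss()
alpha = 0.1  # assumed coupling weight between probe loss and main loss

for step in range(100):
    feats = backbone(X)
    main_loss = ce(head(feats), y)
    # The probe's loss is NOT detached: its gradient flows into the
    # backbone, pushing features toward linear decodability.
    probe_loss = ce(probe(feats), y)
    loss = main_loss + alpha * probe_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

probe_acc = (probe(backbone(X)).argmax(1) == y).float().mean()
```

In fastai this loop would live in a `Callback` that computes `probe_loss` each batch and logs probe accuracy, which is exactly the live-plotting interface suggested above.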

Some kind of Eigenvalue Analysis

The Transformer Circuits thread and videos showed how to detect the presence of “induction heads”. Looking at the eigenvalues of the projection head could yield similar insights if there is some kind of interpretable algorithm we can reverse-engineer from the self-supervised model.
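For a linear projection head this analysis is cheap: take the weight matrix and inspect its spectrum, e.g. how quickly the singular values (the natural generalization of eigenvalues for a non-symmetric map) decay, which says how many directions the head actually uses. A numpy sketch on a fabricated low-rank stand-in weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in projection-head weight: a low-rank map plus noise,
# mimicking a head that only uses a few directions.
W = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 128))
W += 0.01 * rng.standard_normal((128, 128))

# Singular spectrum of the head.
s = np.linalg.svd(W, compute_uv=False)

# Effective rank: how many directions carry 99% of the energy.
energy = np.cumsum(s**2) / np.sum(s**2)
eff_rank = int(np.searchsorted(energy, 0.99)) + 1
```

A sharply truncated spectrum like this one would be weak evidence of a compact, possibly reverse-engineerable algorithm in the head; a flat spectrum would suggest the representation is spread across many directions.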

Adversarial Robustness

  • Use the multiple-image setting to test whether, and how, a model like this is susceptible or resistant to adversarial attacks.
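A natural first probe here is the classic FGSM attack: perturb an input one signed-gradient step and measure the accuracy drop. A self-contained toy sketch (the classifier, task, and epsilon are illustrative; in the multiple-image setting one would perturb X1 and test whether its pairing with a clean X2 survives):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier trained on a linearly separable task.
X = torch.randn(256, 20)
y = (X[:, :2].sum(1) > 0).long()
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    ce(model(X), y).backward()
    opt.step()

def fgsm(model, x, y, eps):
    """Fast gradient sign method: one signed-gradient step on the input."""
    x = x.clone().requires_grad_(True)
    ce(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

clean_acc = (model(X).argmax(1) == y).float().mean()
adv_acc = (model(fgsm(model, X, y, eps=0.5)).argmax(1) == y).float().mean()
```

Comparing the accuracy gap for the self-supervised encoder against the classification-only baseline would show whether the fused two-image representation buys any robustness.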