There isn't a lot of talk about image models (e.g. DALL-E and Stable Diffusion) on LW in the context of alignment, especially compared to LLMs. Why is that? Some hypotheses:
- LLMs just happened to get some traction early, and due to network effects, they are the primary research vehicle
- LLMs are a larger alignment risk than image models, e.g. because the only alignment risk in image generation comes from the language embedding that conditions it
- LLMs are not a larger alignment risk, but they are easier to use for alignment research
Idk if this is The Reason or anything, but one factor might be that current image models use a heavily convolutional architecture and are, as a result, quite a bit weaker. Transformers are involved, but not nearly as heavily as in current language models.
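Rough sketch of the contrast I mean, in PyTorch (channel counts, shapes, and where the attention sits are made up for illustration, not copied from any real model): a typical diffusion U-Net block is mostly convolutions with a single attention layer bolted on, whereas an LLM is just attention+MLP blocks stacked dozens of times.

```python
import torch
import torch.nn as nn

class UNetResBlockWithAttention(nn.Module):
    """Conv-heavy image-model block: two 3x3 convolutions doing most of the work,
    plus one self-attention layer over flattened spatial positions."""
    def __init__(self, channels: int = 320, heads: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(32, channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # convolutional residual path
        h = self.conv2(torch.relu(self.conv1(torch.relu(self.norm(x))))) + x
        # attention applied once, over flattened spatial positions
        b, c, hh, ww = h.shape
        tokens = h.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return (tokens + attended).transpose(1, 2).reshape(b, c, hh, ww)

class TransformerBlock(nn.Module):
    """LLM-style block: attention + MLP, no convolutions anywhere."""
    def __init__(self, d_model: int = 768, heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln1(x), self.ln1(x), self.ln1(x))[0]
        return x + self.mlp(self.ln2(x))

# made-up inputs: a latent-resolution feature map vs. a token sequence
x_img = torch.randn(1, 320, 32, 32)
x_txt = torch.randn(1, 77, 768)
print(UNetResBlockWithAttention()(x_img).shape, TransformerBlock()(x_txt).shape)
```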
You're saying that transformers are key to alignment research?
I would imagine that latent-space exploration and explanation are a useful part of interpretability, and that developing techniques which work for both language and images improves the chance that those techniques will generalize to new neural architectures.
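As a toy example of the kind of latent-space exploration I have in mind (a minimal sketch: the 512-dim latents are random stand-ins, and `decode` is a hypothetical placeholder for whatever decoder or probe the model actually provides), the same spherical interpolation works whether the latent comes from an image model or from a language model's representation space:

```python
import numpy as np

def slerp(z0: np.ndarray, z1: np.ndarray, t: float) -> np.ndarray:
    """Interpolate between latents z0 and z1 along the great circle between them."""
    z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1   # nearly parallel: plain lerp is fine
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(512), rng.standard_normal(512)   # two latent codes
path = [slerp(z_a, z_b, t) for t in np.linspace(0, 1, 8)]
# for z in path: inspect(decode(z))   # hypothetical decode/probe step per model
```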