They create a synthetic dataset of lit and unlit rooms with StyleGAN. They exploit the fact that the GAN has disentangled and meaningful directions in its latent space, which can be individually edited. They find a lighting latent automatically, by taking noise that produces rooms, editing each latent in turn, and looking for big changes specifically on light pixels.
- StyleGAN does not have a text input, and there's no mention of prompting (as far as I can tell - I'm not familiar with GANs). This is not a DALL-E style model. Its input is just Gaussian noise
- This is a really cool result, and I am excited about it! The paper claims that GANs have disentangled latents (and that this is known), which makes this less exciting (man, I wish this were true of LLMs). But it's still solid!
- This is in section 4.1
They manually create a dataset of lit and unlit rooms, which isn't that interesting. They use this for benchmarking their method, not for actually training it (I don't find this that exciting)
They use the GAN as a source of training data, to train a model specifically for lit -> unlit rooms (I don't find this that exciting)
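
To make the latent search from the first point concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: `toy_generator` is a hypothetical linear stand-in for StyleGAN, the light mask is fixed by hand rather than computed per image, and the scoring rule (mean change on light pixels minus mean change elsewhere) is my own assumption about what "big changes specifically on light pixels" could look like.

```python
import numpy as np

def toy_generator(z, weights):
    """Hypothetical stand-in for a GAN: a fixed linear map from latent to image."""
    return np.tensordot(z, weights, axes=1)  # shape (height, width)

def find_lighting_latent(generate, latents, light_mask, delta=3.0):
    """Score each latent dimension by how much editing it changes light pixels
    relative to the rest of the image, averaged over the sampled latents."""
    n_dims = latents.shape[1]
    scores = np.zeros(n_dims)
    for z in latents:
        base = generate(z)
        for i in range(n_dims):
            edited = z.copy()
            edited[i] += delta  # edit one latent at a time
            diff = np.abs(generate(edited) - base)
            scores[i] += diff[light_mask].mean() - diff[~light_mask].mean()
    scores /= len(latents)
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
n_dims, height, width = 16, 8, 8

# Assumed, hand-placed "lamp" region; a real pipeline would segment lights per image.
light_mask = np.zeros((height, width), dtype=bool)
light_mask[1:3, 2:4] = True

# Toy weights: every dimension touches the whole image weakly, except one
# planted dimension that only affects the lamp region.
weights = rng.normal(scale=0.1, size=(n_dims, height, width))
LIGHT_DIM = 5
weights[LIGHT_DIM] = 0.0
weights[LIGHT_DIM][light_mask] = 2.0

latents = rng.normal(size=(20, n_dims))  # "noise that produces rooms"
best_dim, scores = find_lighting_latent(
    lambda z: toy_generator(z, weights), latents, light_mask
)
print("latent with the most light-specific effect:", best_dim)
```

The toy only works this cleanly because the planted dimension is perfectly disentangled by construction, which is exactly the property the paper relies on the GAN having.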

[-]Sam Marks3yΩ240

Yeah, sorry, I should have made clear that the story that I tell in the post is not contained in the linked paper. Rather, it's a story that David Bau sometimes tells during talks, and which I wish were more widely known. As you note, the paper is about the problem of taking specific images and relighting them (not of generating any image at all of an indoor scene with unlit lamps), and the paper doesn't say anything about prompt-conditioned models. As I understand things, in the course of working on the linked project, Bau's group noticed that they couldn't get scenes with unlit lamps out of the popular prompt-conditioned generative image models.

[-]Neel Nanda3yΩ120

Ah, thanks for the clarification! That makes way more sense. I was confused because you mentioned this in a recent conversation, I excitedly read the paper, and then couldn't see what the fuss was about (your post prompted me to re-read and notice section 4.1, the good section!).

[-]Neel Nanda3yΩ466

Another thought: The main thing I find exciting about model editing is when it is surgical. It's easy to use gradient descent to find ways to intervene on a model while breaking performance everywhere else. But if you can really localise where a concept is represented in the model and apply the edit there, that feels really exciting to me! Thus I find this work notably more exciting (because it edits a single latent variable) than ROME/MEMIT (which apply gradient descent).

[-]AlphaAndOmega3y10

Did they try running unCLIP on an image of a room with an unlit lamp, assuming the model had a CLIP encoder?

That might have gotten a prompt that worked.


^{^}

That is, I find it very likely that DALL-E-2 "can" produce images of bedrooms with unlit lamps, for whatever reasonable interpretation you'd like of what that "can" means.

^{^}

The general vibe here is "the effectiveness of model editing scales with our ability to generate accurate interpretability hypotheses, whereas the effectiveness of finetuning scales with something different." In more detail:

* Finetuning relies on being able to effectively evaluate outputs, whereas model editing might not. For example, suppose that we would like to get a language model to write good economics papers. Finetuning for human evaluations of economics papers could jointly optimize for papers which are more true as well as more persuasive. On the other hand, if our interpretability were good enough to separate out the mechanisms for writing truly vs. writing persuasively, then we could hope to make a model edit which results in more true but not more persuasive papers output.

* Finetuning might require large datasets, whereas model editing might not. Again in the economics paper writing example, finetuning a model to write good economics papers might require a large number of evaluations. On the other hand, we might be able to generate accurate interpretability hypotheses using only a small number of dataset examples (and possibly no dataset examples at all).

* Model editing might transfer better out-of-distribution. If the interpretability hypothesis on which we base a model edit is good, then the model edit ought to have the predicted effect on a broad distribution of inputs. On the other hand, fine-tuning only guarantees that we get the desired behavior on the fine-tuning distribution.


Turning off lights with model editing
