LESSWRONG

AI Control, AI
Personal Blog

[ Question ]

Which AI Safety techniques will be ineffective against diffusion models?

by Allen Thomas
21st May 2025
1 min read

Google showed off Gemini Diffusion at their event yesterday, and while I do not have access to the model just yet, it seems impressive based on what I have seen so far.

I have been interested in AI Safety for a while and have been trying to keep up with the literature in this space.

According to OpenAI's March 2025 blog post on detecting misbehavior in frontier reasoning models:

"We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models of the future."

Can we extend CoT monitoring to diffusion-based models as well? Or do you think the intermediate states will be too incoherent to monitor?

This is my first LW post, and I do not have a lot of hands-on experience with safety techniques, so forgive me if my question is poorly framed.
1 comment, sorted by top scoring

Oscar (4mo):

Great question; I don't have deep technical knowledge here, but I would also be very curious about this. Intuitively, it seems right that CoT monitoring would not transfer over very well to this case.
