LESSWRONG

AI Control, AI
Personal Blog

[ Question ]

Which AI Safety techniques will be ineffective against diffusion models?

by Allen Thomas
21st May 2025
1 min read

Google showed off Gemini Diffusion at their event yesterday, and while I do not have access to the model just yet, it seems impressive based on what I have seen so far.

I have been interested in AI Safety for a while and have been trying to keep up with the literature in this space.

According to OpenAI's March 2025 blog post on detecting misbehavior in frontier reasoning models:

"We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models of the future."

Can we extend CoT monitoring to diffusion-based models as well? Or do you think the intermediate states will be too incoherent to monitor?

This is my first LW post, and I do not have a lot of hands-on experience with safety techniques, so forgive me if my question is poorly framed.
1 comment, sorted by top scoring

Oscar (4mo):

Great question; I don't have deep technical knowledge here, but I would also be very curious about this. Intuitively, it seems right that CoT monitoring would not transfer over very well to this case.
