LESSWRONG
LINGJIE CHEN

VLMs can Aggregate Scattered Training Patches

Jun 16, 2025

This is the abstract and summary of our new paper. We show that vision-language models (VLMs) can learn to reconstruct harmful images from benign-looking patches scattered across training data, a phenomenon we call visual stitching. This ability allows dangerous content to bypass moderation and be reassembled during inference, raising critical safety concerns for VLMs.

Authors: Zhanhui Zhou, Lingjie Chen, Chao Yang, Chaochao Lu.

See our project page and full code repo on GitHub.

Figure 1: Illustration of visual stitching. (Top) Visual stitching enables VLMs to integrate visual information spread across multiple training samples. After finetuning on {(patch, ID)} pairs from an image of a cat, VLMs can verbalize the ID when given the full image or a text reference to the image, despite never having been trained on either. (Bottom) Visual stitching enables...
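To make the {(patch, ID)} construction in the caption concrete, here is a minimal sketch of how one image could be cut into patches that all share a single ID. It is an illustration under stated assumptions, not the paper's pipeline: the function names, the non-overlapping square grid, the toy grayscale image stored as nested lists, and the ID string "ID-042" are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: a list of rows of pixel values


def split_into_patches(image: Image, grid: int) -> List[Image]:
    """Split an image into grid x grid non-overlapping patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image height and width must be divisible by grid")
    patch_h, patch_w = height // grid, width // grid
    return [
        [row[c * patch_w:(c + 1) * patch_w]
         for row in image[r * patch_h:(r + 1) * patch_h]]
        for r in range(grid)
        for c in range(grid)
    ]


def build_patch_id_pairs(image: Image, image_id: str, grid: int) -> List[Tuple[Image, str]]:
    """Pair every patch of one image with the same ID (hypothetical data layout)."""
    return [(patch, image_id) for patch in split_into_patches(image, grid)]


# Toy 4x4 image standing in for a real photo; "ID-042" is a made-up identifier.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
pairs = build_patch_id_pairs(image, "ID-042", grid=2)
```

Each element of `pairs` is one training sample in this sketch. No single sample contains the full image, which is what makes the aggregation described in the caption non-trivial.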