Woody Gan — LessWrong

Text Steers Vision

Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models

TL;DR: We found that steering vectors from text-only models can enhance visual reasoning in multimodal LLMs (MLLMs). The technique is simple: extract textual representations for concepts like "spatial relationships" and "counting" from the LLM backbone...

Jun 1, 2025•5