Ruixuan Huang

An undergraduate student at USTC CSE, currently interning in the MSRA social computing group. Recent research interests center on XAI-related issues.

Posts

Wikitag Contributions

Comments

Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Ruixuan Huang, 5mo, 10

Great job! Consider reading our related paper: https://arxiv.org/abs/2404.12038

No wikitag contributions to display.
Steering LLMs' Behavior with Concept Activation Vectors (8 karma, 1y, 0 comments)
Exploring the Evolution and Migration of Different Layer Embedding in LLMs (6 karma, 1y, 0 comments)