Xinyu Zhou — LessWrong

The case for unlearning that removes information from LLM weights

Xinyu Zhou · 12d

Thanks for the excellent note. I'd like to offer an alternative hypothesis for why unlearning works on the random birthday (RB) dataset: it may be unlearned effectively mainly because its labels are generated completely at random.

Because the data points share no semantic relationships, the model has to memorize each fact in strict isolation. Such highly localized memorization likely makes unlearning structurally easier and more thorough. It also means the facts share no latent heuristics, so the updates made during RTT have nothing to propagate through and cannot affect, let alone recover, performance on the evaluation set V.
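To make the intuition concrete, here is a toy sketch in plain Python. It is not the post's experiment and involves no LLM. The names `make_facts` and `heldout_accuracy`, the grouping of facts, and the majority-vote rule standing in for "whatever generalizes from T" are all assumptions made for illustration. The sketch compares labels that follow a shared rule with labels drawn independently at random, and checks how much a rule fit on one half of the facts (T) tells you about the other half (V).

```python
import random

def make_facts(n, n_groups, structured, rng):
    """Return (group, label) pairs; the label depends on the group only if structured."""
    group_label = {g: rng.randrange(365) for g in range(n_groups)}
    facts = []
    for i in range(n):
        g = i % n_groups
        # Structured: every fact in a group shares one label (a shared heuristic).
        # Random: each label is drawn independently, as assumed for the RB dataset.
        label = group_label[g] if structured else rng.randrange(365)
        facts.append((g, label))
    return facts

def heldout_accuracy(facts, rng):
    """Fit a per-group majority-vote rule on T, then score it on held-out V."""
    facts = facts[:]
    rng.shuffle(facts)
    half = len(facts) // 2
    T, V = facts[:half], facts[half:]
    votes = {}
    for g, label in T:
        votes.setdefault(g, []).append(label)
    rule = {g: max(set(ls), key=ls.count) for g, ls in votes.items()}
    hits = sum(rule.get(g) == label for g, label in V)
    return hits / len(V)

rng = random.Random(0)
acc_structured = heldout_accuracy(make_facts(400, 10, True, rng), rng)
acc_random = heldout_accuracy(make_facts(400, 10, False, rng), rng)
print(f"structured labels: {acc_structured:.2f}")
print(f"random labels:     {acc_random:.2f}")
```

In this toy setting, knowing T helps on V only when the labels share structure. That is the sense in which I suspect RTT has little to work with on RB, although a real model's memorization is of course far messier than a lookup rule.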