Hey, I’m the first author of INLP and RLACE. The observation you point out was highly surprising to us as well. The RLACE paper started as an attempt to prove the optimality of INLP, which turned out not to hold for classification problems. For classification, it’s simply not true that the subspace that does not encode a concept is the orthogonal complement of the subspace that encodes the concept most saliently (the subspace spanned by the classifier’s parameter vector). In RLACE we do prove that this property holds for certain objectives: e.g., if you ...
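To see why a single rank-1 removal need not erase a concept, here is a minimal sketch of one INLP-style iteration on made-up synthetic data (using sklearn's `LogisticRegression` as the probe; the data-generating rule and all names below are illustrative assumptions, not the authors' reference code): after projecting out the direction found by a converged probe, a freshly trained probe can typically still recover the concept.

```python
# Minimal sketch of one INLP-style iteration on synthetic data.
# Assumptions: sklearn's LogisticRegression as the linear probe; the
# data-generating rule below is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # concept lives in a 2-D subspace

probe = LogisticRegression(max_iter=1000).fit(X, y)
w = probe.coef_ / np.linalg.norm(probe.coef_)  # unit direction, shape (1, d)

# Project onto the orthogonal complement of span(w): P = I - w^T w.
P = np.eye(d) - w.T @ w
X_proj = X @ P

probe2 = LogisticRegression(max_iter=1000).fit(X_proj, y)
print("probe accuracy before projection:", probe.score(X, y))
print("retrained probe after projection:", probe2.score(X_proj, y))
# The retrained probe usually stays above chance: removing the single most
# salient direction does not by itself yield a concept-free subspace.
```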
- Actually, RLACE has a smaller impact on the representation space, since it removes just a rank-1 subspace.
- Note that if we train a linear classifier w to convergence (as done in the first iteration of INLP), then by definition we can project the entire representation space onto the direction w and retain the very same accuracy, because the subspace spanned by w is the only thing the linear classifier is "looking at" (see the sketch after this list). We performed experiments similar in spirit to what you suggest with INLP in [this](https://arxiv.org/pdf/2105.06965.pdf) paper.
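Here is a minimal sketch of that point on hypothetical synthetic data (again using sklearn's `LogisticRegression`; this is an illustration under assumed data, not the experiments from the linked paper): projecting every representation onto span(w) leaves the trained classifier's decision function, and therefore its accuracy, exactly unchanged.

```python
# Minimal sketch: a trained linear classifier only "sees" span(w), so a
# rank-1 projection onto w preserves its accuracy exactly.
# Assumptions: synthetic data and sklearn's LogisticRegression, for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.normal(size=(n, d))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
w = clf.coef_ / np.linalg.norm(clf.coef_)  # unit direction, shape (1, d)

X_rank1 = (X @ w.T) @ w  # project every point onto span(w)

# X_rank1 @ clf.coef_.T == X @ clf.coef_.T, since w is parallel to clf.coef_,
# so the decision function (and hence the accuracy) is identical.
print("accuracy on full X:           ", clf.score(X, y))
print("accuracy on rank-1 projection:", clf.score(X_rank1, y))
```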