I agree that this is a very important area of research. In fact, I work on this problem myself.
Some points:
- I didn't get from the paper alone what $I$ refers to. Maybe a quick definition in the paper would be nice.
- I think it would be good to compare against the Vaccine algorithm from Huang et al. ("Vaccine: Perturbation-aware alignment for large language model"), since they are essentially trying to solve the same problem. I'm not affiliated with that paper, but I wrote a private reference implementation as a Hugging Face Trainer. Lmk if you are interested and ...