Barcoding LLM Training Data Subsets. Anyone trying this for interpretability?
Main point: seeking information on this potential strategy for improving neural network interpretability, alignment, and reliability. Segmenting training data and injecting unique "barcoding" tokens into each data subset. These unique tokens should occasionally show up in generated outputs at a frequency correlated with the importance of each training subset to...