Great work! Really excited about how BIF scales to billion-parameter models with linear time and memory complexity.
Also, this seems reminiscent of a recent work, d-TDA (Distributional Training Data Attribution); both BIF and d-TDA explicitly account for the stochasticity of the training dynamics.
P.S.: Keen to see more quantitative results on NLP tasks :)