Great work! Really excited about how BIF scales to billion-parameter models with linear time and memory complexity.
Also, this seems reminiscent of a recent work, d-TDA (Distributional Training Data Attribution); both BIF and d-TDA explicitly account for the stochasticity of the training dynamics.
P.S.: Keen to see more quantitative results on NLP tasks :)