Yuntai Bao
Jesse Hoogland's Shortform
Yuntai Bao · 2d

Great work! Really excited about how BIF scales to billion-parameter models with linear time and memory complexity.

Also, this seems reminiscent of recent work on d-TDA (Distributional Training Data Attribution); both BIF and d-TDA explicitly account for the stochasticity of the training dynamics.

P.S. Keen to see more quantitative results on NLP tasks :)
