Cool work! Related: https://arxiv.org/abs/2509.24884 -- they look at the accuracy and attention patterns of PT & IT checkpoints when filler tokens are added.