Kernel of something that might inspire someone else who knows more than I.

Assuming weights that have “grokked” a task are more interpretable, is there value in modifying loss functions to make grokking more likely? Perhaps by making the loss path-dependent, so that it depends on the history of the weight updates themselves?
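To make "path-dependent" a bit more concrete, here is a minimal sketch of one thing it could mean mechanically. Everything in it is an assumption for illustration: the toy linear-regression task, the function names, and the rule that scales a weight-decay penalty by a running average of past update sizes. Weight decay is used only because it is a familiar regularizer. Whether anything like this actually makes grokking more likely is untested.

```python
import numpy as np

def task_loss_and_grad(w, X, y):
    """Mean squared error and its gradient for a linear model (toy stand-in task)."""
    residual = X @ w - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * X.T @ residual / len(y)
    return loss, grad

def train_path_dependent(X, y, steps=200, lr=0.05, base_decay=1e-2, beta=0.9):
    """Gradient descent where the penalty strength depends on the update history.

    Hypothetical rule: decay = base_decay * (1 + EMA of past update norms).
    """
    w = np.zeros(X.shape[1])
    ema_update_norm = 0.0  # running summary of the path taken so far
    history = []
    for _ in range(steps):
        loss, grad = task_loss_and_grad(w, X, y)
        # The penalty coefficient is a function of previous updates, not just of w.
        decay = base_decay * (1.0 + ema_update_norm)
        # Gradient of decay * ||w||^2, treating decay as a constant within the step.
        total_grad = grad + 2.0 * decay * w
        update = -lr * total_grad
        w = w + update
        ema_update_norm = beta * ema_update_norm + (1.0 - beta) * float(np.linalg.norm(update))
        history.append((loss, decay))
    return w, history
```

The only point of the sketch is that two runs reaching the same weights by different routes would see different penalties, which is what distinguishes a path-dependent loss from an ordinary regularizer.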

I think it'd be good to add an endnote mentioning that while saving up to your FIRE number is relatively straightforward, withdrawing from it can be much more complicated. I know my intuitions didn't serve me at all when thinking about that phase.
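One reason the withdrawal phase can defy intuition is that the order of returns starts to matter once you are taking money out. Here is a small sketch of that effect; the returns, starting balance, and spending figure are all made-up numbers, and the model (a fixed withdrawal at the start of each year, no inflation, no taxes) is a deliberate simplification.

```python
def simulate(balance, returns, annual_withdrawal=0.0):
    """Take the withdrawal at the start of each year, then apply that year's return."""
    for r in returns:
        balance = (balance - annual_withdrawal) * (1.0 + r)
    return balance

returns = [-0.20, 0.05, 0.10, 0.25]  # hypothetical yearly returns, bad year first
start = 1_000_000.0                   # hypothetical starting portfolio
spend = 40_000.0                      # hypothetical fixed annual spending

# While only saving (no withdrawals), the order of returns makes no difference.
grow_bad_first = simulate(start, returns)
grow_good_first = simulate(start, returns[::-1])

# While withdrawing, the same returns in a different order give different results.
draw_bad_first = simulate(start, returns, spend)
draw_good_first = simulate(start, returns[::-1], spend)

print(f"no withdrawals: {grow_bad_first:,.0f} vs {grow_good_first:,.0f}")
print(f"withdrawals:    {draw_bad_first:,.0f} vs {draw_good_first:,.0f}")
```

With these numbers, the bad-year-first ordering ends with less money once withdrawals are included, even though both orderings have identical average returns.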