The kernel of an idea that might inspire someone who knows more than I do.
Assuming weights that have “grokked” a task are more interpretable, would it be useful to modify loss functions so that grokking becomes more likely? Perhaps by making the loss path-dependent, so that it depends on the history of the weight updates themselves and not only on the current weights?
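To make "path dependent" a bit more concrete, here is a minimal Python sketch of one possible reading: the task loss plus a penalty on how far the current weights sit from a running average of their own past values. The names `update_ema`, `path_dependent_loss`, `beta`, and `lam` are all hypothetical, and nothing here is claimed to actually make grokking more likely; it only shows what a loss that depends on the update history could look like.

```python
# Hypothetical sketch: a loss that depends on the path the weights took,
# via an exponential moving average (EMA) of past weight values.

def update_ema(w_ema, w, beta=0.9):
    """Fold the current weights into the running average of past weights."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(w_ema, w)]


def path_dependent_loss(task_loss, w, w_ema, lam=0.1):
    """Task loss plus lam times the squared distance from the EMA of past weights.

    Two runs that reach the same weights by different paths will have
    different EMAs, and therefore different loss values at that point.
    """
    penalty = sum((a - b) ** 2 for a, b in zip(w, w_ema))
    return task_loss + lam * penalty


# Same current weights, two different (assumed) histories.
w = [1.0, 2.0]
ema_slow_path = update_ema([0.9, 1.9], w)   # weights barely moved recently
ema_fast_path = update_ema([0.0, 0.0], w)   # weights moved a lot recently

loss_slow = path_dependent_loss(0.5, w, ema_slow_path)
loss_fast = path_dependent_loss(0.5, w, ema_fast_path)
```

Here `loss_fast` comes out larger than `loss_slow` even though the weights and the task loss are identical, which is the path dependence. Whether any penalty of this kind helps or hurts grokking is exactly the open question.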
I think it'd be good to add an endnote mentioning that while saving up to your FIRE number is relatively straightforward, withdrawing from it in retirement can be much more complicated. I know my intuitions didn't serve me at all when thinking about that phase. https://earlyretirementnow.com/2018/06/27/ten-things-the-makers-of-the-4-rule-dont-want-you-to-know/
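One reason the withdrawal phase defeats intuition is that the order of returns starts to matter once you are taking money out. A small Python sketch, with made-up returns, a made-up starting balance, and a hypothetical helper `final_balance` (none of these come from the linked post):

```python
# Hypothetical illustration of sequence-of-returns risk.
# All figures are invented for the example.

def final_balance(start, returns, withdrawal=0.0):
    """Withdraw a fixed amount at the start of each year, then apply that year's return."""
    balance = start
    for r in returns:
        balance = (balance - withdrawal) * (1.0 + r)
        if balance <= 0.0:
            return 0.0  # portfolio exhausted
    return balance


bad_years_first = [-0.30, -0.10, 0.05, 0.20, 0.25]
good_years_first = list(reversed(bad_years_first))

start = 1_000_000.0

# No withdrawals: order does not matter (up to floating-point rounding).
no_wd_bad = final_balance(start, bad_years_first)
no_wd_good = final_balance(start, good_years_first)

# Withdrawing 40,000 a year: the same returns in a different order
# leave noticeably different balances.
wd_bad = final_balance(start, bad_years_first, withdrawal=40_000.0)
wd_good = final_balance(start, good_years_first, withdrawal=40_000.0)

print(no_wd_bad, no_wd_good)
print(wd_bad, wd_good)
```

Without withdrawals the two orderings end at the same balance, because multiplication does not care about order. With a fixed annual withdrawal, the run that hits the bad years first ends lower, since the early withdrawals are taken from a shrunken portfolio that then has less left to recover with.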