This is a special post for quick takes by nonveumann.

The kernel of an idea that might inspire someone who knows more than I do.

Assuming weights that have “grokked” a task are more interpretable, is there any use in modifying loss functions to make grokking more likely? Perhaps by making the loss path-dependent, so that it depends on the history of updates to the weights themselves?
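
To make "path-dependent" a bit more concrete, here is a minimal toy sketch of one possible reading, not a tested recipe. Everything in it is an assumption made for illustration: the 1-D linear model, the names `train`, `lam`, and `beta`, and the choice to scale an L2 penalty by an exponential moving average (EMA) of recent weight updates. Nothing here is claimed to actually promote grokking; it only shows what a loss that depends on the update history could look like.

```python
# Hypothetical illustration: a loss whose regularization strength depends on
# the recent path of the weights (an EMA of past updates), not just on the
# current weights. Toy 1-D linear model y = w * x, pure Python.

def task_loss(w, data):
    """Mean squared error of the toy model on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def task_grad(w, data):
    """Gradient of task_loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def train(data, steps=200, lr=0.05, lam=0.1, beta=0.9):
    """Gradient descent on task loss plus a path-dependent L2 penalty.

    lam and beta are assumed hyperparameters: lam scales the penalty,
    beta is the EMA decay applied to past weight updates.
    """
    w = 0.0
    ema_update = 0.0  # running summary of how the weight has been moving
    history = []
    for _ in range(steps):
        # Penalty strength grows with the magnitude of recent movement.
        strength = lam * abs(ema_update)
        loss = task_loss(w, data) + strength * w ** 2
        # The EMA is treated as a constant when differentiating the penalty.
        grad = task_grad(w, data) + 2 * strength * w
        new_w = w - lr * grad
        ema_update = beta * ema_update + (1 - beta) * (new_w - w)
        w = new_w
        history.append(loss)
    return w, history


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w, history = train(data)
print(w, history[0], history[-1])
```

In this toy the penalty fades once the weight stops moving, so the final solution is the same as without it; only the trajectory changes. Whether any trajectory-shaping term like this makes grokking more likely in a real network is exactly the open question.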