In my opinion, this is connected with Sturgeon's Law. I'd guess that to expert pianists and piano tuners, 90% of pianos sound out of tune. I know among hardcore software engineers, a common lament is that almost all software sucks. Windows is almost unbearable to me, but I'm sure most desktop users are happy with it. Most desktop users are not programmers.
90% of all things may be crap to the discerning eye, but the world remains ok with that because each person has only a handful of places where they care to discern.
It’s clear that more exploration is the way to go.
I think there is nuance here. https://mindingourway.com/dive-in-2/ offers a good alternative perspective.
You should also set
model.cfg.normalization_type = None
afterwards. It's mostly a formality since you're doing it after initialization.
ActivationCache.apply_ln_to_stack()
is the only function I found which behaves incorrectly if you don't change this.
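To illustrate why a stale flag matters, here is a toy sketch in plain Python. It does not use TransformerLens: `ToyConfig` and `toy_apply_ln` are hypothetical stand-ins I made up, and the early-return check is only my assumption about how a helper like `apply_ln_to_stack()` might consult the config.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for model.cfg; only this one field matters here.
    normalization_type: Optional[str] = "LN"


def toy_apply_ln(vector: List[float], cfg: ToyConfig) -> List[float]:
    """Toy analogue of a cache helper that consults the config flag."""
    if cfg.normalization_type not in ("LN", "LNPre"):
        # The flag says there is no LayerNorm, so leave the vector alone.
        return list(vector)
    mean = sum(vector) / len(vector)
    centered = [x - mean for x in vector]
    variance = sum(c * c for c in centered) / len(centered)
    scale = (variance + 1e-5) ** 0.5
    return [c / scale for c in centered]


cfg = ToyConfig()
resid = [1.0, 2.0, 3.0]
stale = toy_apply_ln(resid, cfg)   # flag still "LN": the vector gets normalized
cfg.normalization_type = None      # the fix suggested above
fixed = toy_apply_ln(resid, cfg)   # flag cleared: the vector passes through
```

If the LayerNorm has been removed from the weights but the flag still says "LN", a helper like this would normalize a second time, which is the kind of silent mismatch the flag change avoids.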