Well, I can certainly empathize with the feeling that compromising on a core part of your identity is threatening ;-)
More seriously, what you are describing as empathy seems to be asking the question:
"What if my mind was transported into their bodies?"
rather than
"What if I was (like) them, including all the relevant psychological and emotional factors?"
The latter question should lead to feelings of disgust iff the target experiences feelings of disgust.
Of course, empathy is all the more difficult when the person you are trying to empathize with is very different from you. Being an outlier can clearly make this harder. But unless you have never experienced any flavour of learned helplessness/procrastination/akrasia, you have the necessary ingredients to extrapolate.
Historically, commutative algebra came out of algebraic number theory, and the rings involved - Z, Z_p, number rings, p-adic local rings... - are all (in the modern terminology) Dedekind domains.
Dedekind domains are not always principal, and this was the reason why mathematicians started studying ideals in the first place. However, the structure of finitely generated modules over Dedekind domains is still essentially determined by ideals (or rather fractional ideals), reflecting to some degree the fact that their geometry is simple (1-dim regular Noetherian domains).
This could explain why there was a period where ring theory developed around ideals but the need for modules was not yet clarified?
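For concreteness, the structure result alluded to above is usually stated as follows. This is the standard textbook statement; the symbols R, M, I, n and the primes below are notation chosen here, not taken from the comment.

```latex
% R a Dedekind domain, M a finitely generated R-module of rank n >= 1.
\[
  M \;\cong\; R^{\,n-1} \,\oplus\, I \,\oplus\, \bigoplus_{i=1}^{k} R/\mathfrak{p}_i^{e_i}
\]
% I is a nonzero ideal of R, the \mathfrak{p}_i are nonzero prime ideals, e_i >= 1.
% The class of I in the ideal class group Cl(R) is an invariant of M.
% When R is a PID, I \cong R and this reduces to the usual structure theorem.
```

So the only data beyond the PID case is a single ideal class, which is one way to read "essentially determined by ideals".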
Modules are just much more flexible than ideals. Two major advantages:
BTW the geometric perspective might sound abstract (and setting it up rigorously definitely is!) but it is in many ways more concrete than the purely algebraic one. For instance, a quasicoherent sheaf is in first approximation a collection of vector spaces (over varying "residue fields") glued together in a nice way over the topological space Spec(R), and this clarifies a lot how and when questions about modules can be reduced to ordinary linear algebra over fields.
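A small worked example of the "collection of vector spaces" picture, as a sketch (the choice R = Z, M = Z/6 is mine, purely for illustration):

```latex
% Fibre of an R-module M at a prime \mathfrak{p}, with residue field
% \kappa(\mathfrak{p}) = \operatorname{Frac}(R/\mathfrak{p}):
\[
  M(\mathfrak{p}) \;=\; M \otimes_R \kappa(\mathfrak{p})
\]
% Example: R = \mathbb{Z}, M = \mathbb{Z}/6.
\[
  M \otimes \mathbb{F}_2 \cong \mathbb{F}_2, \quad
  M \otimes \mathbb{F}_3 \cong \mathbb{F}_3, \quad
  M \otimes \mathbb{F}_p = 0 \ (p \ge 5), \quad
  M \otimes \mathbb{Q} = 0 .
\]
```

Here the sheaf is a one-dimensional vector space sitting over each of the points (2) and (3) of Spec(Z), and zero everywhere else.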
Some of my favourite topics in pure mathematics! Two quick general remarks:
There is another interesting connection between computation and bounded treewidth: the control flow graphs of programs written in languages "without goto instructions" have uniformly bounded treewidth (e.g. <7 for goto-free C programs). This is due to Thorup (1998):
https://www.sciencedirect.com/science/article/pii/S0890540197926973
Combined with graph algorithms for bounded treewidth graphs, this has apparently been used in the analysis of compiler optimization and program verification problems, see the recent reference:
https://dl.acm.org/doi/abs/10.1145/3622807
which also proves a similar bound for pathwidth.
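To get a feel for the statement, here is a minimal Python sketch that upper-bounds the treewidth of a tiny control flow graph. It uses the generic min-degree elimination heuristic, not the construction from Thorup's paper, and the graph `CFG_EDGES` is a hand-written, hypothetical CFG for `while (c) { if (d) A; else B; }`.

```python
from itertools import combinations


def treewidth_upper_bound(edges):
    """Upper-bound the treewidth of an undirected graph.

    Uses the min-degree elimination heuristic: repeatedly remove a vertex of
    minimum degree and turn its neighbours into a clique. The largest degree
    seen at removal time is an upper bound on the treewidth.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    width = 0
    while adj:
        # Deterministic tie-break: smallest degree first, then vertex name.
        v = min(adj, key=lambda x: (len(adj[x]), x))
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for n in nbrs:
            adj[n].discard(v)
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
    return width


# Hypothetical CFG (edge directions ignored) of: while (c) { if (d) A; else B; }
CFG_EDGES = [
    ("entry", "cond"), ("cond", "if"), ("if", "then"), ("if", "else"),
    ("then", "join"), ("else", "join"), ("join", "cond"), ("cond", "exit"),
]

if __name__ == "__main__":
    print(treewidth_upper_bound(CFG_EDGES))  # prints 2
```

The bound of 2 here is comfortably below the uniform bound quoted above; nesting more structured constructs adds vertices but, per the cited result, cannot push the treewidth up indefinitely.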
Nice!
I would add the following, which is implicit in the presentation: this phenomenon of real representations is not specific to finite groups. Real irreducible representations of a group are always neatly divided into three types: real, complex or quaternionic. This is [Schur's lemma](https://ncatlab.org/nlab/show/Schur%27s+lemma#statement) together with the fact that the real division algebras are exactly R, C and the quaternions H.
(Should ML interpretability people care about infinite groups to begin with - unlike mathematicians, who love them all? For one, models as well as datasets can exhibit (exact or approximate) continuous symmetries, and these symmetries can be understood mathematically as actions of matrix Lie groups such as the group GL_n of all invertible matrices or the group O_n of n-dimensional rotations and reflections. Sometimes these actions are linear, so are themselves representations, and sometimes they can be studied by linearizing them. Using representation theory to study more general geometric group actions is one of those great tricks of mathematics which reduce complicated problems to linear algebra.)
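A short illustration of the trichotomy for an infinite group, as a sketch assuming finite-dimensional representations (the example G = SO(2) acting on the plane is my choice):

```latex
% V a finite-dimensional irreducible real representation of a group G.
% Schur's lemma: End_G(V) is a division algebra over \mathbb{R}; Frobenius: so
\[
  \operatorname{End}_G(V) \;\cong\; \mathbb{R},\ \mathbb{C},\ \text{or}\ \mathbb{H}.
\]
% Example: G = SO(2) acting on V = \mathbb{R}^2 by rotations
\[
  R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
  \qquad
  J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},
  \qquad J^2 = -I .
\]
% The matrices commuting with every R_\theta are exactly aI + bJ, so
\[
  \operatorname{End}_{SO(2)}(\mathbb{R}^2) = \{\, aI + bJ : a, b \in \mathbb{R} \,\} \;\cong\; \mathbb{C}.
\]
```

So the plane with its rotation action is irreducible over R but of complex type, even though no complex numbers appear in its definition.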
On 1., you should consider that, for people who don't know much about QFT and its relationship with SFT (like, say, me 18 months ago), it is not at all obvious that QFT can be applied beyond quantum systems!
In my case, the first time I read about "QFT for deep learning" I dismissed it automatically because I assumed it would involve some far-fetched analogies with quantum mechanics.
but in fact you can also understand the theory on a fine-grained level near an impurity by a more careful form of renormalization, where you view the nearest several impurities as discrete sources and only coarsegrain far-away impurities as statistical noise.
Where could I read about this?
Which formal properties of the KL-divergence do the proofs of your result use? It could be useful to make them all explicit to help generalize to other divergences or metrics between probability distributions.
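As a reference point for that question, here is a small Python sketch that numerically checks a few standard formal properties of the KL-divergence on finite distributions: non-negativity, vanishing on the diagonal, asymmetry, and additivity over independent products. Whether the proofs in question rely on these particular properties is an assumption on my part, and the helper names `kl` and `product` are made up for this illustration.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for finite distributions given as lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def product(p, q):
    """Joint distribution of two independent variables, flattened to a list."""
    return [pi * qj for pi in p for qj in q]


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.9, 0.1]
    r, s = [0.2, 0.8], [0.3, 0.7]

    print("non-negative:", kl(p, q) >= 0)
    print("zero on the diagonal:", kl(p, p) == 0)
    print("asymmetric:", not math.isclose(kl(p, q), kl(q, p)))
    print(
        "additive on products:",
        math.isclose(kl(product(p, r), product(q, s)), kl(p, q) + kl(r, s)),
    )
```

A divergence that fails one of these (for example, total variation is not additive on products) would be a natural place to look for where a generalization breaks.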