Category-Theoretic Wanderings into Interpretability

unruly abstractions

2nd Sep 2025

This is a linkpost for https://www.unrulyabstractions.com/pdfs/wanderings.pdf

I have realized I want to contribute to AI Safety in any way I can. I am currently focused on interpretability: trying to make sense of the research out there, orienting myself, looking for other new ravers^[1], and ultimately learning to enact its technics^[2]. In that process, I am writing a paper. A sort of queer paper. I keep updating it as I think more about it. You can read the latest version here:

FULL PAPER AT UNRULYABSTRACTIONS.COM

I will use category theory to investigate what interpretability is. Think of this formalism as much closer to a language than a theory^[3]. I use notation and symbols to interrogate how things could come together. I am doing some scribbles and asking you, do you also feel it's something like that?

Hope you enjoy wandering with me.

^[1] Ravers are those who need to rave. Raves are practices. What if practices (like this one, which I am currently feeling my way in) are also raves?
^[2] Technics are a general category for all making and all practices.
^[3] A judgement but not a proposition.

Tags: AI Psychology, Anthropic (org), Category theory, Interpretability (ML & AI), Language Models (LLMs), Logic & Mathematics, AI

Trevor Hill-Hand (9mo):

I enjoyed reading the paper but did not find the screenshots here in the post a helpful addition; I think I would have just quoted the introduction, if converting it into a full article was infeasible.

It's also fun seeing other Eugenia Cheng fans!

unruly abstractions (9mo):

Good feedback!

I am still trying to figure out my workflow. I like writing in Typst, but I realized it's not very easy to go from Typst -> LessWrong. Also, a lot of my writing is sorta experimental. I'm trying to determine which parts of my writing should be directed to which platforms/audiences.

I will make this a linkpost.

And yes, Eugenia Cheng is amazing :)
