We use a mix of direct instruction, lots of online resources that we manage ourselves, and 1-on-1 tutors via Zoom through the (excellent) startup Modulo. I spent a large amount of time in the first six months to a year after we started (back during the pandemic) establishing norms and routines around scheduling, patterns that I hoped would eventually lead to him becoming very self-directed. That did in fact work. It was intensive, but now, in the steady state, the time cost is low.
I'm basically spending no time on preparation per se, but there is a time cost to supervision. We both work full-time and take turns managing him during the day (he's 9), which means making sure he's making it to his online classes and paying attention to his schedule, taking him out for visits to museums, etc. Most of the time he's working on projects that he's passionate about and doesn't need me except when he gets stuck. He spends a lot of time building levels (for puzzle games or shooters, particularly) and teaching himself tools using YouTube videos and a lot of GPT/Claude.
We know some home-schooling kids with pretty fine-grained schedules; ours is more like a few scheduled things (e.g. online classes) and then big blocks of time where we trust him to do whatever he's interested in that day.
I do directly instruct him in math and coding.
Right, we homeschool our son because he seems more alive this way.
This is the first time I wrote something on LW that I consider to be serious, in that it explored genuinely new ideas in technical depth. I'm pretty happy with how it turned out.
I write a lot of hand-written notes that, years later, become papers. People who are around me know about this habit. This post started as such a hand-written note that I put together in a few hours, and would have likely stayed that way if not for the outlet of LW. The paper this became is "Programs as singularities" (PAS). The treatment there is much better than the (elementary but somewhat gross) calculations here, but it is also 90 pages long and came out more than a year later.
I think the idea of structural Bayesianism being hinted at here is correct and important, and is conceptually the foundation for how we think about interpretability at Timaeus. Its role in providing foundations for talking about the structure of agents is just starting to become visible: Dalcy Ku has a nice recent shortform about their work, and Timaeus will have work on SLT in the setting of RL coming out soon, as well as more throughout 2026.
Was it worth making this post, vs just waiting to share the ideas in the paper? I'm not sure. Plausibly some people saw it here who wouldn't otherwise have engaged with the material (PAS is probably a bit intimidating). This post interprets that material more in an alignment setting and draws connections e.g. to RL that we didn't do in the paper. I think posting works-in-progress like this runs a risk of incentivising flag planting and rewarding people psychologically for half-finished things (which then never get finished, because "that's done" and nobody has the incentive to do it properly). I thought at the time that this material was "weird" enough that this risk was marginal, as it has turned out to be.
I notice I haven't done something like this again since March 2024, however.
Love the shoutout to Thom :)
Right on both counts!
There's a certain point where commutative algebra outgrows arguments that are phrased purely in terms of ideals (e.g. at some point in Matsumura the proofs stop being about ideals and elements and start being about long exact sequences and Ext, Tor). Once you get to that point, and even further to modern commutative algebra which is often about derived categories (I spent some years embedded in this community), I find that I'm essentially using a transplanted intuition from that "old world" but now phrased in terms of diagrams in derived categories.
E.g. a lot of Atiyah and Macdonald style arguments just reappear as arguments about how to use the residue field to construct bounded complexes of finitely generated modules in the derived category of a local ring. Reconstructing that intuition in the derived category is part of making sense of the otherwise gun-metal machinery of homological algebra.
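A standard instance of what I mean, stated from memory: the classical Nakayama lemma says that a finitely generated module $M$ over a local ring $(R, \mathfrak{m}, k)$ with $\mathfrak{m}M = M$ is zero. Its derived-category avatar is the statement that for $M \in D^b_{\mathrm{fg}}(R)$,

$$k \otimes^L_R M \simeq 0 \implies M \simeq 0,$$

which is exactly the sort of "use the residue field to control bounded complexes of finitely generated modules" argument I have in mind.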
Ultimately I don't see it as different, but the "externalised" view is the one that plugs into homological algebra and therefore, ultimately, wins.
(Edit: saw Simon's reply after writing this, yeah agree!)
Yeah it's a nice metaphor. And just as the most important thing in a play is who dies and how, so too we can consider any element $m \in M$ as a module homomorphism $R \to M$, $r \mapsto rm$, and consider the kernel, which is called the annihilator $\mathrm{Ann}(m)$ (great name). Then $R \to M$ factors as $R \to R/\mathrm{Ann}(m) \to M$ where the second map is injective, and so in some sense $M$ is "made up" of all sorts of quotients $R/I$ where $I$ varies over annihilators of elements.
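A concrete toy case, just to ground the notation: take $R = \mathbb{Z}$ and $M = \mathbb{Z}/6\mathbb{Z}$. The element $m = 2$ has annihilator $\mathrm{Ann}(2) = 3\mathbb{Z}$, so the factorisation reads

$$\mathbb{Z} \twoheadrightarrow \mathbb{Z}/3\mathbb{Z} \hookrightarrow \mathbb{Z}/6\mathbb{Z}, \qquad 1 \mapsto 2,$$

exhibiting a quotient of $R$ sitting inside $M$.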
There was a period where the structure of rings was studied more through the theory of ideals (historically this was in turn motivated by the idea of an "ideal" number), but through ideas like the above you can see the theory of modules as a kind of "externalisation" of this structure, which in various ways makes it easier to think about. One manifestation of this I fell in love with (actually this was my entry point into all this, since my honours supervisor was an old-school ring theorist and gave me Stenström to read) is in torsion theory.
One of my son's most vivid memories of the last few years (and one he talks about pretty often) is playing laser tag at Wytham Abbey, a cultural practice I believe was instituted by John and which was awesome, so there is a literal five-year-old (well, seven-year-old at the time) who endorses this message!
Makes sense to me, thanks for the clarifications.
I found working through the details of this very informative. For what it's worth, I'll share here a comment I made internally at Timaeus about it, which is that in some ways this factorisation into two maps reminds me of the factorisation in Ruan et al's observational scaling laws paper into the map from a model to its capability vector (the analogue of the first map here) and the map from capability vectors to downstream metrics (the analogue of the second).
In your case the output metrics have an interesting twist, in that you don't just want to predict performance but also, in some sense, variations in performance within a certain class (e.g. by varying the prompt), so it's some kind of "stable" latent space of capabilities that you're constructing.
Anyway, factoring the prediction of downstream performance/capabilities through some kind of latent space object in your case, or latent spaces of capabilities in Ruan et al's case, seems like a principled way of thinking about the kind of object we want to put at the center of interpretability.
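To make the analogy concrete, here is a minimal toy sketch in numpy of factoring downstream prediction through a latent capability space. This is my own illustration of the shape of the idea (PCA for the first map, least squares for the second), not the actual procedure from Ruan et al or from your post, and the data is entirely synthetic.

```python
import numpy as np

# Toy setup (all numbers invented): rows are models, columns are benchmark
# scores, and we arrange for the scores to secretly be low-rank.
rng = np.random.default_rng(0)
n_models, n_benchmarks, latent_dim = 20, 8, 3
scores = rng.normal(size=(n_models, latent_dim)) @ rng.normal(size=(latent_dim, n_benchmarks))
downstream = scores @ rng.normal(size=n_benchmarks)  # some downstream metric

# Map 1: model -> "capability vector", here just the top principal components
# of the centred score matrix.
centred = scores - scores.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
capabilities = centred @ vt[:latent_dim].T          # shape (n_models, latent_dim)

# Map 2: capability vector -> downstream metric, fit by least squares
# (with an intercept column).
design = np.column_stack([capabilities, np.ones(n_models)])
coef, *_ = np.linalg.lstsq(design, downstream, rcond=None)
predicted = design @ coef

print("max prediction error:", np.abs(predicted - downstream).max())
```

On synthetic low-rank data the factored predictor recovers the downstream metric essentially exactly; the interesting empirical question is of course whether real benchmark scores admit such a low-dimensional, stable factorisation.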
As an entertaining aside: speaking as an algebraic geometer, the proliferation of these "interpretability objects" between models and downstream performance metrics reminds me of the proliferation of cohomology theories and the search for "motives" to unify them. That is basically interpretability for schemes!
Certainly simpler to police if you have a clear rule for in vs out.