AI Alignment Fieldbuilding, Self Improvement, AI
Personal Blog

TT Self Study Journal # 4

by TristanTrim
15th Aug 2025
5 min read

[Epistemic Status: This is an artifact of my self study. I am using it to remember links and help manage my focus. As such, I don't expect anyone to fully read it. If you have particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good luck". I'm hoping for a feeling of accountability and would like input from peers and mentors. This may also help to serve as a guide for others who wish to study in a similar way to me. ]

Previous Entry: SSJ #3

Review of 3rd Sprint

My goals for this sprint were:

  • SSJ--1 -- Write
    • AIA Terminology Lit Review
    • Math in my AI Alignment Goals
  • SSJ--2 -- Read
    • Shallow Review of Technical AI Safety 2024
  • SSJ--3 -- Math
    • Do some Linear Algebra reading and practice.
  • SSJ--4 -- Experimentation
    • Go through Transformers From Scratch.
  • SSJ--5 -- Tooling
    • Do an informal literature review on MI Tooling and Data Visualization for High Dimensional Data.
    • Places to start for MI Tooling:
      • The Interpretability Toolkit
      • TransformerLens & Callum McDougall's guide for it.
      • Nostalgebraist’s transformer-utils library
      • Google PAIR’s Learning Interpretability Tool (LIT)
      • Google PAIR’s What-If Tool
      • Jesse Vig’s BERTViz
      • LOOM
      • CircuitsVis
  • SSJ--6 -- Social
    • Email math profs after I finish writing "Math in my AI Alignment Goals".
    • Consider other places to find potential mentors.
      • Consider what kinds of feedback I am looking for.
      • Should I reach out to specific people for general advice on my SSJ, or only once I have specific questions for them about their work?
      • Make some posts in various forums asking for people willing to review and comment on my SSJ.
    • I am also networking with the goal of finding future funding or paid roles. Consider strategies for that.

So how did I do?

Daily Worklog

Date | Progress
We, July 23 | No progress. I did talk on the PauseAI discord a bit and think about OIS, but it's not really actively relevant. Alas.
Th, July 24 - We, Aug 13 | No progress with SSJ goals while studying for my final exam, but I did write two LW articles while procrastinating. I also went camping, so no progress then either. But now I'm done with school and ready to fully focus on this : )
Th, Aug 14 | Spent about 5 hours thinking about the social media concept I'm now calling "maat" and wrote Tristan's Projects, which I plan to keep as an updated index of the projects I'm interested in developing or contributing to.
Fr, Aug 15 | Spent 3 hours compiling an overview of my ndsp project.
Mo, Aug 18 | Spent 5h 36m applying to SPAR. There are a lot of cool researchers with a lot of cool projects there. I could have spent a lot more time reading about their work and answering their questions, but I want to move on.
Tu, Aug 19 | Spent an hour or two moving my maat ideas closer to a post. Also spent some time reading and considering "Agent foundations: not really math, not really science"... I'll probably write a post with some of my thoughts.

Aug - Oct

For the rest of August through September and October I got various combinations of too busy, distracted, and depressed to continue proper documentation. Within that time I:

  • Applied to several fellowships
  • Attended an EA Summit in Vancouver
  • Continued discussing and considering AIA
  • Recovered some standards of self care lost during my BSc

I continue to find it surprising how easy it is to lose large amounts of time and how difficult it is to treat a self-managed project with the same seriousness as a full-time job.

Sprint Summary

Overview

During the past months I neglected all of my chosen goals in favour of ad-hoc fellowship applications. This suggests both that my goals are not in alignment with what I think is really important, and that I have difficulty prioritizing overhead organization and documentation. I think that overhead work is valuable, so I would like to improve and get back to doing this journal work.

SSJ--1 -- Write

I still like the idea of doing an AIA Terminology Review. I am less convinced of the value of discussing Math in my AIA goals specifically. Instead, multiple people have suggested I should focus on writing and communicating about my "Theory of Change", covering both the importance of my (potential) role and the importance of my agendas. I would also like to try putting together a better article that briefly explains my OIS concept, based on various conversations and thoughts, and finally write the MAAT article I was planning to write.

SSJ--2 -- Read

I started skimming the Shallow Review of Technical AI Safety 2024 but didn't really engage and then got busy. I think it may be a good resource to include in my AIA Terminology Review.

SSJ--3 -- Math

I didn't actually practice any linear algebra. It's much harder to do so when I'm not handing it in to a professor for marks. I did, however, pick up my copy of C. Kosniowski's "algebraic topology", which I find engaging. The concepts may or may not have value for my thinking on semantic spaces. I think I still have a shallow understanding of topology, which I would like to deepen towards a solid understanding of manifolds.

SSJ--4 -- Go through Transformers From Scratch.

This still seems high value. I still haven't started.

SSJ--5 -- Literature review on MI Tooling and Etc...

I didn't focus on this. I'm torn: I want to jump in and begin implementation work, but a review beforehand is probably quite valuable.

SSJ--6 -- Social

The goals in this section may have been of higher benefit than applying to the various fellowships I have applied to. Additionally, I often got an email about a fellowship deadline, worked on an application before the deadline, submitted it, and then moved on to something else. This approach seems somewhat aimless and disorganized. I would like to keep better track of which fellowships I have targeted and with what level of effort, and to have a better sense of what fellowships are out there and what other options I can consider.

Goals for 4th Sprint

The Goals:

  • SSJ--1 -- Write
    • Make an article or doc to contain and organize articles I would like to write.
    • Theory of Change
    • OIS explainer
    • MAAT
    • AIA Terminology Review
  • SSJ--2 -- Read
    • Search and read various articles for AIA Terminology Review.
    • Spend some time reading and commenting on one random LW article 4 days / week.
  • SSJ--3 -- Math
    • Low priority: Continue reading C. Kosniowski's "algebraic topology".
  • SSJ--4 -- Experimentation (copied from last sprint)
    • Go through Transformers From Scratch.
      • SERIOUSLY! Clock in some time on this!
  • SSJ--5 -- Tooling (copied from last sprint)
    • Do an informal literature review on MI Tooling and Data Visualization for High Dimensional Data.
    • Places to start for MI Tooling:
      • The Interpretability Toolkit
      • TransformerLens & Callum McDougall's guide for it.
      • Nostalgebraist’s transformer-utils library
      • Google PAIR’s Learning Interpretability Tool (LIT)
      • Google PAIR’s What-If Tool
      • Jesse Vig’s BERTViz
      • LOOM
      • CircuitsVis
  • SSJ--6 -- Social
    • Develop my networking plan
      • Create a list of people I respect who may be worth reaching out to for mentorship or networking.
      • Research and reach out to people where possible and pragmatic.
      • Clarify the problems I am interested in focusing on and the capacity in which I am interested in focusing on them. (High overlap with SSJ--1 "Theory of Change" )

Well. The last sprint went off the rails. Somewhat disappointing, but I am wishing myself luck getting back on track.


List of common acronyms:

  • Mechanistic Interpretability (MI)
  • AI Alignment (AIA)
  • Outcome Influencing System (OIS)
  • n-Dimensional Interactive Scatter Plot (NDISP)
  • Machine Learning (ML)
  • Large Language Model (LLM)