TL;DR: I keep a fancy journal and was wondering what techniques people use to navel-gaze at their own writing.

I keep a journal on my computer. I put ideas, writing, and diary entries in it; basically, anything I write goes into the journal. Every entry is timestamped, has hashtags, points to the entry that generated it, and is separated from its neighbors by newlines and '---' strings. This allows a non-linear structure to emerge over time. I can also cite older entries, which lets me reuse them. For example, I might start a branch of thinking on 'bananas' and later cite those entries when writing about 'genetic modification', even though the two topics live in different areas of the journal.
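A minimal Python sketch of reading such a file back in. It assumes a hypothetical layout where each entry starts with `time:` and `parent:` header lines, hashtags appear inline in the body, and entries are separated by a line containing only `---`. The field names and the `parse_journal` helper are illustrative, not the author's actual format.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical entry layout (not the author's actual format):
#   time: 2021-03-04 10:15
#   parent: 2021-03-01 09:00    ("none" marks a root entry)
#   body text with inline #hashtags
#   ---
SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)
HASHTAG = re.compile(r"#(\w[\w-]*)")
HEADER_KEYS = ("time", "parent")


@dataclass
class Entry:
    time: str
    parent: Optional[str]
    body: str
    tags: List[str] = field(default_factory=list)


def parse_journal(text: str) -> List[Entry]:
    """Split the journal on '---' lines and pull out header fields and hashtags."""
    entries = []
    for chunk in SEPARATOR.split(text):
        lines = chunk.strip().splitlines()
        if not lines:
            continue  # skip empty chunks, e.g. after a trailing separator
        header, body_lines = {}, []
        for line in lines:
            key, sep, value = line.partition(":")
            # Header lines are only recognised before the body starts.
            if sep and key.strip() in HEADER_KEYS and not body_lines:
                header[key.strip()] = value.strip()
            else:
                body_lines.append(line)
        body = "\n".join(body_lines).strip()
        parent = header.get("parent")
        entries.append(Entry(
            time=header.get("time", ""),
            parent=None if parent in (None, "", "none") else parent,
            body=body,
            tags=HASHTAG.findall(body),
        ))
    return entries
```

Once parsed, the `parent` fields give the tree and the `tags` give a cheap index for search or grouping.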

Overall, this is roughly analogous to the Zettelkasten Method that's been discussed here previously. The difference is that I use homebrew scripts together with a markdown processor to automate tedious details such as timestamps, parent pointers, and organization.
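The bookkeeping such scripts handle can be sketched as a function that stamps the current time, records the parent, and appends the entry with its separator. This assumes a hypothetical layout with `time:` and `parent:` header lines followed by the body, a hashtag line, and a `---` separator; `format_entry` and `append_entry` are made-up names, not the author's scripts.

```python
from datetime import datetime
from typing import List, Optional


def format_entry(body: str, parent: Optional[str], tags: List[str],
                 now: Optional[datetime] = None) -> str:
    """Render one entry: time/parent header, body, hashtag line, then '---'."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M")
    tag_line = " ".join("#" + t.lstrip("#") for t in tags)  # accept 'x' or '#x'
    parts = [f"time: {stamp}", f"parent: {parent or 'none'}", body.strip()]
    if tag_line:
        parts.append(tag_line)
    return "\n".join(parts) + "\n---\n"


def append_entry(path: str, body: str, parent: Optional[str], tags: List[str]) -> str:
    """Append a freshly stamped entry to the journal file and return it."""
    entry = format_entry(body, parent, tags)
    with open(path, "a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

Minute-resolution timestamps double as IDs in this sketch, so two entries written in the same minute would collide; a real script might add seconds or a counter.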

Once the journal grew past 100k words, I realized that I needed to partition it. This will make a lot more sense with a visualization of the journal. The underlying data structure is a tree, so the most natural visualization was a phylogenetic tree (older entries are cooler in color).
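One simple way to partition a tree-shaped journal is by size: total the words in each entry's subtree, then cut off the largest subtrees that still fit under a word budget. This is a sketch of a size-only heuristic, not the recency/leaf-distance logic the author describes; the `parents` and `words` dictionaries and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_children(parents: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert parent pointers into child lists (roots have parent None)."""
    children = defaultdict(list)
    for node, parent in parents.items():
        if parent is not None:
            children[parent].append(node)
    return children


def subtree_words(parents, words):
    """Total words in each entry's subtree, via an iterative post-order walk."""
    children = build_children(parents)
    totals = {}
    stack = [(n, False) for n, p in parents.items() if p is None]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            totals[node] = words[node] + sum(totals[c] for c in children[node])
        else:
            stack.append((node, True))  # revisit after all children are totalled
            stack.extend((c, False) for c in children[node])
    return totals


def split_branches(parents, words, budget):
    """Roots of the largest subtrees that fit within `budget` words.

    Entries whose subtree is over budget stay in the shared 'trunk'.
    """
    totals = subtree_words(parents, words)
    children = build_children(parents)
    branches = []
    stack = [n for n, p in parents.items() if p is None]
    while stack:
        node = stack.pop()
        if totals[node] <= budget:
            branches.append(node)
        else:
            stack.extend(children[node])
    return sorted(branches)
```

The traversals are iterative so that long chains of replies don't hit Python's recursion limit.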

[Figure: phylogenetic tree of the journal]

Using this model, it became apparent that I could approximate the growth probabilities (how likely each entry is to receive new entries) fairly well by looking at how recent an entry was and how close it was to the leaves of the tree. Using similar logic, I was also able to cut the journal into significant branches. This allowed me to add entries to a smaller journal of about 50k words, which makes rendering easier on my laptop. All changes are still synchronized with the main journal; I can just work on branches directly instead of loading the entire journal every time I want to make a change.
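The recency-and-leaf-distance idea can be sketched as an exponential-decay weight per entry, normalised into probabilities. The decay form and the `tau`/`lam` scales are assumptions chosen for illustration; the functional form the author actually used isn't stated, and in practice the scales would need fitting against the journal's real history of which entries received children.

```python
import math
from collections import deque
from typing import Dict, Optional


def leaf_distance(parents: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Edges from each entry down to its nearest descendant leaf (leaves are 0).

    Multi-source BFS that starts at every leaf and walks up parent pointers.
    """
    has_child = {p for p in parents.values() if p is not None}
    dist = {n: 0 for n in parents if n not in has_child}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        parent = parents[node]
        if parent is not None and parent not in dist:
            dist[parent] = dist[node] + 1
            queue.append(parent)
    return dist


def growth_probabilities(parents, age_days, tau=30.0, lam=1.0):
    """Hypothetical model: weight = exp(-age/tau) * exp(-leaf_distance/lam), normalised."""
    dist = leaf_distance(parents)
    weights = {
        n: math.exp(-age_days[n] / tau) * math.exp(-dist[n] / lam)
        for n in parents
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}
```

Because BFS reaches each entry first along its shortest upward path, `leaf_distance` gives the distance to the nearest leaf below it, not the deepest one.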

Up to this point, this analysis has been driven mostly by necessity. When I experimentally switched from a linear format to a non-linear one, I quickly realized that it was too good to ever go back. However, it's not practical to maintain a single journal over long periods of time. Now that this problem is mostly solved, I've had some wilder ideas about things I could do. I wonder what other techniques people use to analyze their own journals or diaries. Is there a resource on how to do journal analysis, or am I stuck copy/pasting techniques from the NLP and citation-network people? Thanks!
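As one example of borrowing from citation-network analysis, PageRank can rank entries by how much they are cited, directly or through other well-cited entries. This is a plain power-iteration sketch over a hypothetical `cites` mapping (entry to the entries it cites); entries that cite nothing spread their rank evenly over all entries.

```python
from typing import Dict, List


def pagerank(cites: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over an entry -> cited-entries mapping."""
    nodes = set(cites) | {t for targets in cites.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by entries that cite nothing is spread evenly over all entries.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in cites.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank
```

High-ranking entries are candidates for "hub" ideas that later writing keeps returning to, such as the 'bananas' entries cited from a 'genetic modification' branch.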



You might look into topic modeling or topological data analysis. The basic idea is to build a database of entries and the words each one contains, run that data through a machine-learning algorithm that groups the entries into "topics", and then generate a page for each topic listing the entries that belong to it. You can then add a toolbar to the bottom of each entry with links to all the topics that entry belongs to.
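Libraries such as gensim and scikit-learn provide latent Dirichlet allocation (LDA) as a ready-made black box. To show what the box does, here is a small, dependency-free sketch of LDA via collapsed Gibbs sampling, followed by the topic-page and toolbar step. The stopword list, the hyperparameters (`alpha`, `beta`, `iters`), the 0.3 membership threshold, and all function names are illustrative assumptions, and this is far too slow for a large journal.

```python
import random
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "i", "it", "on", "for"}


def tokenize(text):
    """Lowercase, keep word characters and apostrophes, drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * k for _ in docs]               # topic counts per document
    n_kw = [defaultdict(int) for _ in range(k)]  # word counts per topic
    n_k = [0] * k                                # total words per topic
    z = []                                       # topic assignment of every token

    def update(d, w, t, delta):
        n_dk[d][t] += delta
        n_kw[t][w] += delta
        n_k[t] += delta

    for d, doc in enumerate(docs):
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, w, t, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove this token's current assignment
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + vocab_size * beta)
                    for j in range(k)
                ]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], 1)

    return [
        [(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(docs)
    ]


def topic_pages(entry_ids, theta, threshold=0.3):
    """Map each topic to its entries, and each entry to its topics (the 'toolbar')."""
    pages = defaultdict(list)
    toolbar = {}
    for eid, row in zip(entry_ids, theta):
        topics = [j for j, p in enumerate(row) if p >= threshold]
        toolbar[eid] = topics
        for j in topics:
            pages[j].append(eid)
    return dict(pages), toolbar
```

`pages` would drive one generated page per topic, and `toolbar[entry_id]` the list of topic links at the bottom of each entry.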

The algorithms have been reduced to black boxes, and there are tutorials for the black boxes; the difficult part is preparing the data. I've been wanting to do something like this for a while. I use Zim, a programmable desktop wiki. My problem is that my pages are full of markup, some of it generated programmatically to make the wiki easier for me to use, and all of it has to be removed before the data goes into the black box.
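For the Zim case, a regex pass can strip most markup before tokenizing. The patterns below approximate common Zim wiki syntax (page header lines, `=` headings, `[[links]]`, `{{embeds}}`, bold/italic/highlight/strike/verbatim, bullets and checkboxes, `@tags`); they are written from general knowledge of the format rather than from Zim's own parser, so treat them as an assumption to check against your own pages. Rule order matters: bare URLs are removed before `//italic//` so that the `//` in `http://` isn't mistaken for markup.

```python
import re

# Approximate Zim wiki markup rules, applied in order (an assumption, not Zim's own parser).
ZIM_RULES = [
    (re.compile(r"^(Content-Type|Wiki-Format|Creation-Date):.*$", re.M), ""),  # page header
    (re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.M), r"\1"),  # ==== Heading ====
    (re.compile(r"\[\[[^|\]]*\|([^\]]*)\]\]"), r"\1"),            # [[Page|label]] -> label
    (re.compile(r"\[\[([^\]]*)\]\]"), r"\1"),                     # [[Page]] -> Page
    (re.compile(r"\{\{[^}]*\}\}"), ""),                           # {{image.png}} embeds
    (re.compile(r"https?://\S+"), ""),                            # bare URLs
    (re.compile(r"^[ \t]*(\[[ *x>]\]|\*)[ \t]+", re.M), ""),      # bullets / checkboxes
    (re.compile(r"\*\*(.*?)\*\*"), r"\1"),                        # **bold**
    (re.compile(r"//(.*?)//"), r"\1"),                            # //italic//
    (re.compile(r"__(.*?)__"), r"\1"),                            # __highlight__
    (re.compile(r"~~(.*?)~~"), r"\1"),                            # ~~strike~~
    (re.compile(r"''(.*?)''"), r"\1"),                            # ''verbatim''
    (re.compile(r"(?<!\w)@\w+"), ""),                             # @tags (not emails)
]


def strip_zim(text: str) -> str:
    """Remove Zim-style markup, leaving plain text for a topic model."""
    for pattern, replacement in ZIM_RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Markup generated by your own scripts would need extra rules of the same shape appended to `ZIM_RULES`.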

Sorry, a quick question: does linear mean something like this:

  • Tuesday: I bought bananas and strawberries
  • Wednesday: Bananas are good

while non-linear means:

  • Bananas:
    • I bought them (Tuesday)
    • They are good (Wednesday)
  • Strawberries:
    • I bought them (Tuesday)


Yeah, linear means that you fill in the journal so that new entries always go at the end (new page). Entries are appended to a list of prior entries.

In the sense that you never "go back" and edit the "immutable" previous write-ups, right?
Zachary Robertson:
Well, the ordering refers to where an entry is placed, not whether it can change. It's possible to make edits after the fact; for instance, I correct typos whenever I see them. However, I don't 'delete' entries.
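The distinction in this exchange is about placement, not mutability. A small sketch, assuming each entry is stored as a hypothetical `(id, parent)` pair in creation order: a linear journal lists entries in creation order, while the non-linear one walks the tree depth-first so each entry sits under the entry that generated it. Editing an entry's text in place (such as fixing a typo) changes neither order.

```python
from typing import List, Optional, Tuple


def linear_order(entries: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Linear journal: entries appear in the order they were written."""
    return [eid for eid, _ in entries]


def tree_order(entries: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Non-linear journal: depth-first, each entry placed under its parent."""
    children = {eid: [] for eid, _ in entries}
    roots = []
    for eid, parent in entries:
        if parent is None:
            roots.append(eid)
        else:
            children[parent].append(eid)
    order, stack = [], list(reversed(roots))
    while stack:
        eid = stack.pop()
        order.append(eid)
        stack.extend(reversed(children[eid]))  # keep children in creation order
    return order
```

With the bananas/strawberries example, the linear order is Tuesday bananas, Tuesday strawberries, Wednesday bananas; the tree order moves Wednesday's banana entry up under Tuesday's.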