Is Building Good Note-Taking Software an AGI-Complete Problem?

by Thane Ruthenis
26th May 2025
8 min read

Tags: Note-Taking · Productivity · Practical · World Modeling · Rationality

13 comments, sorted by top scoring

gwern · 3mo

My earlier commentary on what I think note-taking tools tend to get wrong: https://gwern.net/blog/2024/tools-for-thought-failure

Mo Putera · 3mo

Have you by any chance gotten further along on your Nenex idea, or know of anyone online who's gone somewhat in that direction far enough to be interesting? To be fair, the Nenex features you listed are pretty extensive, so I doubt anyone's gone all that far, which is a bummer, since

What I want is to animate my dead corpus so it can learn & think & write.

is a seductive vision that feels like it should be a lot closer today than it actually is.

gwern · 3mo

I have not done any work directly on it. The LLMs have kept improving so rapidly since then, especially at coding, that it has not seemed like a good idea to work on it.

Instead, I've been thinking more about how to use LLMs for creative writing or personalization (cf. my Dwarkesh Patel interview, "You should write more online"). To review the past year or two of my writings:

  • So for example, my meta-learning LLM interviewing proposal is about how to teach a LLM to ask you useful questions about your psychology so it can better understand & personalize (based on my observations that LLMs can now plan interviews by thinking about possible responses and selecting interesting questions, as a variant of my earlier "creativity meta-prompt" idea/hierarchical longform training); "Quantifying Truesight With SAEs" is an offline version about distilling down 'authors' to allow examination and imitation. And my draft theory of mathematicians essay is about the meta-RL view of math research suggesting that 'taste' reduces down to relatively few parameters which are learned blackbox-style as a bilevel optimization problem, and that may be how we can create 'LLM creative communities' (eg. to extract out small sets of prompts/parameters which all run on a 'single' LLM, for feedback as personas or to guide deep search on a prompt).

  • My "Manual of Style" is an experiment in whether you can iteratively, by asking a LLM to read your writings, extract out an explicit manual of style about how to 'write like you'

    It includes a new denoising/backtranslation prompt-engineering trick I am currently calling "anti-examples", where you have the LLM make editing suggestions (which turn it into ChatGPTese) and then you reverse that to fix the chatbot prior* (a rough sketch of this trick follows the list).

    So given how gargantuan context windows have become, and the existence of prompt caching, I think one may be able to write a general writing prompt, which includes a full MoS, a lot of anti-examples for several domains, some sample Q&As (optimized for information gain), instructions for how to systematically generate ideas, and start getting a truly powerful chatbot assistant persona with the scaled-up base models like GPT-5, which should start landing this year.

  • "Virtual comments" is another stab at thinking about how 'LLM writing support' can work, as well as reinventing the idea of 'seriation', and better semantic search via tree-shaped embeddings for both LLM & human writers (and the failed experiment with E-positive).

  • "Towards better RSS feeds" is about an alternative to Nenex commands: can you reframe writing as a sequence of atomic snippets which the LLM rewrites at various levels of abstraction/detail, which enables reading at those same levels, rather than locking people into a single level of detail, which inevitably suits few?

  • "October The First Is Too Late", "Bell, Crow, Moon: 11 Poetic Variations", "Area Man Outraged AI Has Not Solved Everything Yet", "Human Cannibalism Alignment Chart"/"Hacking Pinball High Scores", "Parliament of Rag & Bone", "A Christmas Protestation", "Second Life Sentences", "On the Impossibility of Superintelligent Rubik’s Cube Solvers" were tests of how useful the LLMs are for iterative variation and selection using a 'brainstorm' generate-rank-select prompt and/or for hierarchical generation; they finally seem at the point where you can curate good stuff out of them and are genuinely starting to become useful for my nonfiction essays like "'you could have invented Transformers' tutorial"/"Cats As Horror Movie Villains"/typesetting HTML fractions/Rock-Paper-Scissors optimality (and demonstrate my views on acceptable use of generative media).

  • "Adding Bits Beats AI Slop" is about my observations about how this kind of intensive search + personalization seems critical to taking generative model outputs from mediocre slop to genuinely good.

  • "LLM Challenge: Write Non-Biblical Sentences" is an observation that for creativity, "big model smell" may be hard to beat, and you may just need large LLMs for high-end intellectual work, so one should beware false economies; similarly, "Towards Benchmarking LLM Diversity & Creativity" is about avoiding the LLMs getting ever worse for search purposes (mode-collapsed small models being a danger for Nenex uses - they are the ones that will be easy and tempting to run, but will hamstring you, and you have to go into it with eyes open).

  • "AI Cannibalism Can Be Good" is a quick explainer to try to overcome the intuition that there are no gains from 'feeding AI inputs back into AI' - if you don't understand how this can be a good thing or why it's not a perpetual motion machine, much of the foregoing will seem like nonsense or built on sand.

Obviously, I've also been doing a lot of regular writing, and working on the Gwern.net website infrastructure - adding the 'blog' feature has been particularly important, but just getting the small details right on things like "October The First" takes up plenty of time. But the overall through-line is, "how can we start getting meaningful creative work out of LLMs, rather than sleepwalking into the buzzsaw of superhuman coders creating Disneyland-without-children where all the esthetics is just RLHF'd AI slop?"

* This seems particularly useful for fiction. I'm working on a write-up of an example with a Robin Sloan microfic where the LLM suggestions get better if you negate them, and particularly if you order them to think about why the suggestions were bad and what that implies before they make any new suggestions - which suggests, in conjunction with the success of the 'brainstorm' prompt, that a major failing of LLMs right now is just that they tend to treat corrections/feedback/suggestions in a 'superficial' manner because the reasoning-mode doesn't kick in when it should. Interestingly, 'superficial' learning may be why dynamic-evaluation/finetuning seems to underperform (https://arxiv.org/abs/2505.01812, https://arxiv.org/abs/2505.00661#google): adding paraphrases or Q&A to the finetuning data improves performance even though it cannot add any new information - reminiscent of engrams/traces in human memory, where you can have memorized things, but not be able to recall them, if there aren't enough 'paths' to a memory.

TsviBT · 3mo
  1. All note-taking systems hitherto have failed for a simple reason: they do not ask thinking what it needs. (I think you appreciate this already, just restating.) https://www.lesswrong.com/posts/CoqFpaorNHsWxRzvz/what-comes-after-roam-s-renaissance?commentId=CNK44LqKyh2EQZpJm
  2. I agree with your point that we should be looking to the human as the starting point. In fact, I think this means we should be asking for MENTAL tools for thinking FIRST. Maybe possibly LATER we could use software to help thinking, if it asks for it.
  3. The mental tool we should be looking at is LEXICOGENESIS. I've written about this at length: "The possible shared Craft of deliberate Lexicogenesis" But to summarize for this context: if we create resources that improve our lexicogenetic abilities (such as a more productive morphemicon, augmented grammars, augmented notation, or skill with making words (thinking of metaphors, using the morphemicon, clarifying / factoring ideas to put words to, etc.)), then we will be better able to think at the edge in the language of thinking at the edge.

DirectedEvolution · 3mo

I've been using Zotero and Obsidian for a while.

Zotero has a few nice features:

  • Hard-links, where the same document can be placed into multiple folders.
  • Ease of syncing between devices
  • The "save to Zotero" extension in Chrome
  • Text and area annotations on the sidebar, which you can edit, comment, tag, and click to jump to the annotated section.
  • The ability to create new, editable notes by extracting annotations from one or more documents with the "create note from annotations" feature.
    • The major limitation here is that there's no way to selectively search and extract annotations based on tags.
  • Also, the difference between a "tag," "annotation," "note" and "document" feels messy. Other Zotero users have complained that there's no built-in way to delete tags (you can edit them manually in the sqlite3 database, but that's beyond most users). Airtable does a better job with tags, giving auto-suggestions and a place to edit or delete existing tags. The ability to define "tag sets" that apply across documents, which you can toggle between, combine, etc., would be much better.
  • I wish Zotero had the ability to alter the main view of documents, or offered some sort of hover preview of the document. Title/creator/year usually doesn't capture well why I care about that document or why I added it to a particular file. Zotero thinks of itself as a tool to track citations. I want it to be a tool to create custom views of my document collection. As it is, I get overwhelmed and confused by folders that contain a lot of documents.

Obsidian's ability to create links between documents and its canvas feature are nice, but I rarely use them in practice. I think this is for a couple reasons:

  • Obsidian doesn't allow me to add the same document to multiple folders, so I have to put a lot of thought into choosing which folder it will live in. Then if I want to find it later, I have to hunt for it. This makes me very conservative about creating new notes, which is exactly the opposite of what a link-friendly note system ought to do.
  • Often, I don't want to click through to another document to see a note. I want information all in one place. But Obsidian doesn't let me embed notes, only link to them.

Also, I straight up don't care at all about the graph view, I think it's a nonsense way to view your notebase.

What I get the most value from in Obsidian are:

  • Its markdown interface with seamless support for LaTeX and monospaced code blocks
  • Ease of syncing between devices
  • The traditional file structure (when Google Drive shifted to an AI-sorted "home" menu as its landing page, I hated it)
  • The "outline" view, which transforms different header levels into a clickable table of contents.
  • To some extent, the ability to collapse and expand bullet points - I'd like more of this.

I wish that Obsidian had code cells, like Jupyter or Colab notebooks. There's a community extension, but it is not seamless, and I haven't managed to get it to work yet. The ability to construct images and diagrams within the application would be very nice - perhaps support for embedded Mermaid charts?

Personally, I see the role of LLMs in this setting as primarily a search tool. "Find all the documents containing figures with single base pair resolution DNA methylation information" is a question I'd love to be able to pose to my notebase. I don't want the LLM to construct the note for me. The point of the notebase is to have the human gain an understanding of the notebase, and having the LLM construct notes isn't very helpful for that.

Based on these observations, it seems to me that text and areas of documents are the "raw material" of a notebase. The priority should be on tools that make it easier to define, select, morph the visibility of, and remix units of raw material into new documents. I think that Git has shown it's not particularly important to minimize redundancy in this type of setting - just create new copies of the material and let users build on them.

On a more abstract level, this is about minimizing the need to commit early to a particular ontology and making it much easier to rapidly define and edit lightweight ontologies. The note taking software should enable you to rapidly construct new views of the raw material in your "note lake." It should enable some sort of "lineage tracing" that lets you reconstruct where, originally, the material in a particular document came from - if you create views of views of views, you should be able to click back through until you reach bottom.
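
A rough sketch of that "note lake" with lineage tracing (all names here are invented for illustration, not any existing tool's API): raw material gets a stable id, copies record their parent, and a document is just a view over snippets.

    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class Snippet:
        text: str
        parent: "Snippet | None" = None  # where this copy came from
        id: str = field(default_factory=lambda: uuid.uuid4().hex)

        def derive(self, new_text: str) -> "Snippet":
            # Git-style: don't minimize redundancy; copy and build on it.
            return Snippet(text=new_text, parent=self)

        def lineage(self) -> list["Snippet"]:
            # Click back through views of views until you reach bottom.
            node, chain = self, []
            while node is not None:
                chain.append(node)
                node = node.parent
            return chain

    raw = Snippet("Figure: single base pair resolution DNA methylation")
    remix = raw.derive("Methylation notes, remixed for a new document")
    assert remix.lineage()[-1] is raw  # lineage bottoms out at the raw material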

Garrett Baker · 3mo

Obsidian doesn't allow me to add the same document to multiple folders, so I have to put a lot of thought into choosing which folder it will live in. Then if I want to find it later, I have to hunt for it. This makes me very conservative about creating new notes, which is exactly the opposite of what a link-friendly note system ought to do.

I haven’t had this problem, and usually use the image-wiki syntax for this, that is ![[file-name.pdf.png.html]]

[This comment is no longer endorsed by its author]

Viliam · 3mo

(I am not familiar with the recent software solutions, so anything I write here may already be obvious, or even outdated.)

Ideally, all notes should be in one database, as opposed to multiple databases, e.g. one for school, one for personal things, one for a diary, etc. That's because whatever classification you make up, you will later find examples in the gray area. And even if not, cross-references are often useful, e.g. if you mention something you learned at school in your diary; such links would be difficult to maintain across multiple databases.

If you intend to share your notes somehow, or collaborate with others, there still should be one big database, except that some nodes are flagged as "public" or "shared (with whom)". New nodes are private by default, to prevent forgetting to set up privacy; or maybe private and public nodes should have dramatically different design (e.g. different background color).

When the database gets large enough, it is impossible to rewrite large parts of it when your ontology changes. Some of it may never get rewritten, as there may be too much outdated context and more important things to do. It would probably be good to clearly mark some nodes as "outdated" or otherwise in serious need of a rewrite, so that you may decide to rewrite a specific page in the future, when you link to it from elsewhere.

The organization of content should be a mix of hierarchical and hyperlinked. Sometimes the hierarchy is ambiguous: is "mass transit in the Bay Area" a subnode of "mass transit" or a subnode of "Bay Area"? Sometimes the hierarchy is obvious: "notes from HP:MoR, Chapter 1" is obviously a subnode of "notes from HP:MoR" (even if it may also be linked from other places). So I imagine the ideal structure as a set of local hierarchies and individual nodes, connected by hyperlinks. There should be a simple way to see "what links here". (Categories could be implemented simply as pages which link everything that is in the category.) The hyperlinks should be implemented so that renaming the target page does not break them.

There should be a full-text search, smart enough to prioritize pages that contain the keyword in their title over pages that merely contain it somewhere in the text. E.g. looking for "less wrong" should return the node "Less Wrong" as the first result, even if there are a thousand pages mentioning LW somewhere in the text.
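
Both mechanisms above (rename-proof links and title-weighted search) are easy to prototype. A minimal sketch, with invented names and no claim to match any existing tool: links are stored by stable node id, a reverse index answers "what links here", and title hits outrank body hits.

    from collections import defaultdict

    nodes = {}                    # id -> {"title", "body", "links"}
    backlinks = defaultdict(set)  # id -> ids of nodes that link to it

    def add_node(node_id, title, body, links=()):
        nodes[node_id] = {"title": title, "body": body, "links": set(links)}
        for target in links:
            backlinks[target].add(node_id)   # "what links here" for free

    def rename(node_id, new_title):
        nodes[node_id]["title"] = new_title  # links point at ids, so none break

    def search(query):
        q = query.lower()
        scored = []
        for nid, n in nodes.items():
            score = 10 * (q in n["title"].lower()) + (q in n["body"].lower())
            if score:
                scored.append((score, nid))
        return [nid for _, nid in sorted(scored, reverse=True)]

    add_node("n1", "Less Wrong", "A community blog.")
    add_node("n2", "Diary, May 2025", "Read a less wrong post today.", links=("n1",))
    assert search("less wrong")[0] == "n1"   # title match outranks body match
    assert backlinks["n1"] == {"n2"}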

There should be some history information, in the sense of "this page had its last significant update in 2003". Not sure about the exact algorithm for that. Intuitively, if I fix a typo in 2025, the date should not change. Perhaps the year should be set when I create the page, and then updated whenever I explicitly click "this page is up-to-date as of now"? And maybe old pages are displayed in grey in the tree?

I do not feel bad about the fundamental problem of information getting outdated per se. I feel like any problems related to that are actually problems with the editing interface, as the outdated information gets in the way or is difficult to find and update. With a good interface, there should be no problem with adding any random stuff even if you will never need it again; it should not get in your way, but stay hidden somewhere under "my diary, May 2025".

(These are just my opinions. I do not know any software that would do all of the above, in the right way.)

Paulo Ferreira · 3mo

I've been playing around with a related concept in a less formal context (not scientific research).

Currently using Claude Code on Obsidian vaults, asking it some of the same questions you've mentioned, and reviewing changes using git.

There's a long way to go to fully avoid the pitfalls mentioned, and context management remains a big challenge, but I feel it's a workable first step.

Yaroslav Granowski · 3mo

This is pretty much what my research is about.

The main problem is the limitation of common languages. While thinking, you generate multiple new concepts for which you have no words or notations. All you can do is anchor them with phrases or expressions that only approximately capture the meaning. If you elaborated well enough, these expressions would evoke similar ideas in the mind of another person.

You could represent your ideas better if you were free to generate new identifiers, just as we do while programming. Predicate calculus could be a good fit. You can easily express a basic arithmetic model this way, but if you tried to express your social-level ideas, you would likely feel totally confused, because these ideas depend heavily on other ideas, most of which we don't have words for.

This is a very different paradigm of thinking and needs to be developed from scratch. So, I started my research in the ergonomics of logic programming. While developing a self-hosted platform for interactive logic programming, I hope to get fluent in expressing complicated theories and develop a good foundation to move to other fields of knowledge. Perhaps Quantum Physics will be next. And maybe one day Become a Superintelligence Yourself

Thane Ruthenis · 3mo

Paging @Raemon (this seems relevant to your interests) and @johnswentworth (if you have any thoughts regarding UI improvements or coping strategies).

Thane Ruthenis · 3mo

1. A major question here is LLMs. Scattershot thoughts:

  1. Any notebase-refactoring tool would be much more useful if it's able to understand semantics, instead of operating off of coarse measures like blind text-clustering algorithms. LLMs seem like the tool for the job.
  2. LLMs are reportedly pretty good at understanding research papers and at reading comprehension in general, and those skillsets have a lot of overlap with this. They may do a good job here.
  3. LLMs have pretty bad research taste, and research taste is tightly entangled with the whole "choice of correct abstractions/ontologies" thing we want to get right here. They may not do a good job here.
  4. Plenty of research ideas are fragile Butterfly Ideas, so the process of refactoring your thinking on a topic often has to be "intimate". Injecting the opinions of some external person-like entity into it may be lethal. LLM contributions, if there are any, would likely need to be heavily sanitized/de-personalized/abstracted over.
  5. Plenty of research ideas are dream ideas/empty ideas. Having an LLM with proper context correctly shoot holes in them may save you months of wasted time.
  6. LLMs are sycophantic and sometimes good at persuasion, so they may encourage your delusions about your dream/empty ideas. Or just inject subtle misunderstandings into your ontology in an attempt to avoid upsetting you.

Overall, I expect consideration (1) dominates here, and mitigating (2)-(6) would require a fair bit of tinkering.

An approach that might work here is to define a bunch of useful high-level functions for manipulating natural-language notes (such as "redefine(X, Y) means 'redefine X in terms of Y'"), describe those to an LLM, then build an interface where interactions with the LLM are conducted exclusively through this abstraction layer. As in, there's a literal "redefine" button, you select notes X and Y, press the button, and get the redefined notes, without having to talk to the LLM at all.

Ideally, the LLM is also fine-tuned to do the task correctly, minimizing how much personal flourish it injects and enhancing its attention to detail.

Some illustrative examples of such functions (a rough code sketch of this abstraction layer follows the list):

  • "Redefine the set of notes N in terms of X."
  • "Fetch the list of phenomena in the set of notes N that {contradict}/{are unexplained given} the model described in the notes M."
  • "Check whether X and Y are isomorphic in the context of N."
  • "Find all information in the set of notes N relevant to the thought X."
  • "Check whether assertion X contradicts/breaks anything in the set of notes N."
  • "Merge notes X and Y, distilling the information redundant between them into a new node and creating 0-2 new nodes recording information unique to each."

2. "Mundane" UI ideas:

  • I think a 2D representation along the lines of Obsidian's canvas (the canvas, not the graph view) is a good start. By contrast, 1D (a text document) or "1.5D" (set of text documents separated into folders, creating a "tree-like" format) are too impoverished to help with reasoning about any nontrivial structure at all, and 3D (let alone higher) is hard to parse because we process visual information in 2D.
  • Some sort of native multi-level hierarchical representation would be nice. "These two concepts often interact/appear together" (a "horizontal" connection) and "these low-level concepts are how this higher-level concept is implemented" (a "vertical" connection) are qualitatively different types of connections – yet a "flat" graph representation forces you to mix them up.

    I. e., we prospectively want a "2.5D" representation: a partially ordered set of arbitrary 2D graphs, corresponding to systems living on different levels of abstraction. Equipped with native functionality for flexibly editing it.
  • History-keeping would be useful. Preferably with a good visualizer showing how the notebase was transformed from A to B, with an easy ability to (partially) revert the changes.

None of those seem like high-enough-impact improvements, though.

Valdes · 3mo

I quite like some of your ideas about how to design note-taking software and I think I have a partial answer to the issues you point to:

  • Make the repository of notes more semantic than usual, halfway between pure text notes and a structured database. If the notes exist in a graph database then it becomes possible to write database queries (probably with something like SPARQL; see the sketch after this list).
  • Allow the user to store queries within notes so the query is run whenever the note is opened. Maybe allow for a graph visualization of the query. This way the semantic structure becomes a part of the normal way to interact with the "hub" notes.
  • The ontology used on a topic is defined by the structure of the links. If the user wants a new ontology they just create it on top of the old one and let the old one rot. No need to erase it and it does not distract from the new one (maybe there is some name reuse in the link/attribute names, but that's trivial to fix).
  • Use LLMs to correctly annotate notes with the semantic information that populates the database (links, tags, some attributes, ...) This way they don't need taste to know how to edit, they just follow the pattern that comes with a given ontology. Perhaps this requires the user to create templates.
  • Provide a good CLI (Command Line Interface) for the software. This way the user can write scripts to update their notes using any language. It becomes easy to apply the new ontology to the old notes, possibly fully automagically but most of the time there is probably a bit of manual work left afterward. That's fine.
  • (unrelated to your post) let the GUI be a VS Code extension so the user can use whatever other existing extensions they like; this way there is no need to also design a good text editor.
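
A minimal sketch of the graph-database idea using rdflib (the note ids and the tag/linksTo vocabulary below are invented for illustration; only the library and SPARQL itself are real):

    from rdflib import Graph, Literal, Namespace

    N = Namespace("http://example.org/notes/")
    g = Graph()
    # Triples an LLM-annotation pass might have produced:
    g.add((N.bayes_nets, N.tag, Literal("probability")))
    g.add((N.bayes_nets, N.linksTo, N.graph_theory))
    g.add((N.graph_theory, N.tag, Literal("math")))

    # The kind of stored query a "hub" note could re-run whenever opened:
    q = """
        SELECT ?note WHERE {
            ?note <http://example.org/notes/tag> "probability" .
        }
    """
    for (note,) in g.query(q):
        print(note)  # -> http://example.org/notes/bayes_nets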

I sent you a DM to talk more about this.

Trevor Hill-Hand · 3mo

It's quite barebones, more a silly sketch of a project than a serious project, but a lot of these ideas are what I'm thinking about within https://github.com/Jadael/Library-of-Aletheia - some way to systematically and reliably apply library-science-style ministerial operations, sometimes LLM-powered and sometimes traditional scripts, to a collection of information, all in an Obsidian-canvas-esque "web of knowledge" framing and a visual UI.

I stopped working on it because the LLM extension I was using in Godot wasn't being updated and couldn't go above a hardcoded context window of 512, which limited the ability to do anything interesting, but perhaps it's possible to update it to just use Ollama or something nowadays 🤔


In my experience, the most annoyingly unpleasant part of research[1] is reorganizing my notes during and (especially) after a productive research sprint. The "distillation" stage, in Neel Nanda's categorization. I end up with a large pile of variously important discoveries, promising threads, and connections, and the task is to then "refactor" that pile into something compact and well-organized, structured in the image of my newly improved model of the domain of study.

That task is of central importance:

  1. It's a vital part of the actual research process. If you're trying to discover the true simple laws/common principles underlying the domain, periodically refactoring your mental model of that domain in light of new information is precisely what you should be doing. Reorganizing your notes forces you to do just that: distilling a mess into elegant descriptions.
  2. It allows you to get a bird's-eye view on your results, what they imply and don't imply, what open questions are the most important to focus on next, what nagging doubts you have, what important research threads or contradictions might've ended up noted down but then forgotten, et cetera.
  3. It does most of the work of transforming your results into a format ready for consumption by other people.

 

A Toy Example

Suppose you're studying the properties of matter, and your initial ontology is that everything is some combination of Fire, Water, Air, and Earth. Your initial notes are structured accordingly: there are central notes for each element, branching off from them are notes about interactions between combinations of elements, case studies of specific experiments, attempts to synthesize and generalize experimental results, et cetera.

Suppose that you then discover that a "truer", simpler description of matter involves classifying it along two axes: "wet-dry" and "hot-cold". The Fire/Water/Air/Earth elements are still relevant, revealed to be extreme types of matter sitting in the corners of the wet-dry/hot-cold square. But they're no longer fundamental to how you model matter.

Now you need to refactor your entire mental ontology – and your entire notebase. You need to add new nodes for the wetness/temperature spectra, you need to wholly rewrite the notes about the elements to explicate their nature as extreme states of matter (rather than its basic building blocks), you need to do the same for all notes about elemental interactions, you need to re-interpret the experimental results, and you need to ensure you don't overlook any subtle evidence contradicting the new ontology, or stray thoughts that might lead to an insight regarding an even-more-correct ontology, or absent-minded ideas about high-impact applications and research avenues...

At a sufficiently abstract level, what you should do is: fetch all information relevant to the new ontology from your old-ontology notes, use that information to properly flesh the new ontology out, then redefine the old ontology in the new ontology's terms.
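
The same procedure as a code skeleton, with complete() standing in for whatever agent (you, or some hypothetical sufficiently capable LLM) does the semantic work at each step; the prompts are purely illustrative:

    def complete(prompt: str) -> str:
        raise NotImplementedError("stand-in for the semantic work at each step")

    def refactor(old_notes: list[str], new_ontology: str) -> list[str]:
        # 1. Fetch all information relevant to the new ontology.
        evidence = complete(
            "Quote verbatim every passage relevant to '" + new_ontology +
            "', including stray doubts and half-finished threads:\n\n" +
            "\n---\n".join(old_notes))
        # 2. Use that information to properly flesh the new ontology out.
        fleshed = complete(
            "Flesh out the ontology '" + new_ontology +
            "' using only this evidence:\n\n" + evidence)
        # 3. Redefine the old ontology in the new ontology's terms.
        redefined = complete(
            "Redefine Fire/Water/Air/Earth as special cases of the new "
            "ontology:\n\n" + fleshed)
        return [fleshed, redefined]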


This isn't only a frontier-researcher problem: something similar happens whenever I'm studying a domain that's already well-explored. I start with a flawed model centered around incorrect variables. Gradually, as I learn more, my thinking re-organizes around truer central elements. Once enough changes have accumulated, over the course of months or years, my mental representation ends up having little in common with my initial one.

But the state of the corresponding notebase usually drags behind.

Refactoring your mental ontology is relatively easy: it just requires thinking, and the interface for navigating and editing your world-model is very rich and flexible. Friction costs of mental actions are nonzero, but low.

The same is not true for note-taking. Tools for it, even e. g. Obsidian's canvas, do a fairly poor job of accommodating the above functionality. They impose a lot of additional friction, and their interfaces and editing features aren't optimized for such at-scale refactors.

My impression is that a lot of people run into similar issues when trying to use notebases as "second brains".[2]

 

Why Not Just Start From Scratch?

Arguably, the solution is to just periodically start from scratch. Instead of trying to edit, you spin up a new notebase, writing directly from your updated world-model; the old notebase you delete.

I think this is very suboptimal, in two ways.

First, those outdated notebases do still hold a lot of value:

  • Human memory, especially working memory, is painfully limited. Having a notebase ensures that you don't forget any phenomena and connections you don't commonly encounter[3]; that you're reminded to properly propagate the updates downstream of the new ontology to all corners of your world-model.
  • The excitement of discovering an apparently better ontology might blind you to its issues. When thinking in its terms, you would, by definition, only be able to think about the phenomena that could be easily described in its terms. If there are any broad swathes of phenomena it fails at, or any subtle-but-crucial issues with interpreting past data through the new lens, you may end up unable to perceive them. A stable, externalized record of all of this forces you to properly think through it all.

Basically, "rewrite the notebase from scratch" has a lot of the same issues as "rewrite the codebase from scratch".

Second, even if you're taking a sufficiently wise "rewrite it from scratch" approach, where you're constantly reviewing your previous notebase to ensure you're not missing anything... That is a lot, a lot of work.

Work that coincidentally forces you to do useful conceptual thinking, yes. But a significant fraction of it is just drudgery forced on you by UI shortcomings.

What would the ideal interface for this be? Something that slashes the above friction costs. Something that allows you to flexibly vary the representation of your knowledge – the concepts you describe it via – while ensuring that all information (including subtle, forgotten, yet crucially important doubts) is retained.

 

Generalized Representation-Flipping

In a way, what would be ideal here is a generalization of my idea about an "exploratory medium for mathematics":

A big part of highly theoretical research is flipping between different representations of the problem: viewing it in terms of information theory, in terms of Bayesian probability, in terms of linear algebra; jumping from algebraic expressions to the visualizations of functions or to the nodes-and-edges graphs of the interactions between variables; et cetera.

The key reason behind it is that research heuristics bind to representations. E. g., suppose you're staring at some graph-theory problem. Certain problems of this type are isomorphic to linear-algebra problems, and they may be trivial in linear-algebra terms. But unless you actually project the problem into the linear-algebra ontology, you're not necessarily going to see the trivial solution when staring at the graph-theory representation. (Perhaps the obvious solution is to find the eigenvectors of the adjacency matrix of the graph – but when you're staring at a bunch of nodes connected by edges, that idea isn't obvious in that representation at all.)

This is a bit of a simplified example – the graph theory/linear algebra connection is well-known, so experienced mathematicians may be able to translate between those representations instinctively – but I hope it's illustrative.
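
To make the flip concrete (a toy illustration of the above, not from the original post): project the graph into a matrix, read an answer off the spectrum, then translate back into graph terms.

    import numpy as np

    # The path graph 0-1-2-3, as an adjacency matrix:
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

    eigvals, eigvecs = np.linalg.eigh(A)      # symmetric matrix, so eigh
    leading = eigvecs[:, np.argmax(eigvals)]  # eigenvector of the top eigenvalue

    # Back in graph language: this eigenvector ranks nodes by centrality;
    # the two middle nodes of the path score highest.
    print(np.abs(leading).round(3))  # [0.372 0.602 0.602 0.372]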

As a different concrete example, consider John Wentworth's Bayes Net Algebra. This is essentially an interface for working with factorizations of joint probability distributions. The nodes-and-edges representation is more intuitive and easy to tinker with than the "formulas" representation, which means that having concrete rules for tinkering with graph representations without committing errors would significantly speed up how quickly you can reason through related math problems. Imagine if the derivation of such frameworks was automated: if you could set up a joint PD in terms of formulas, automatically project the setup into graph terms, start tinkering with it by dragging nodes and edges around, and get errors if and only if back-projecting the changed "graph" representation into the "formulas" representations results in a setup that's non-isomorphic to the initial one.
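
That "error iff non-isomorphic" check is actually implementable for Bayes nets: by the classical Verma-Pearl result, two DAGs encode the same family of factorizations iff they share a skeleton and the same v-structures. A from-scratch toy sketch of such a validator (based on that criterion, not on the Bayes Net Algebra framework itself):

    def skeleton(dag):
        # dag: dict mapping each node to the set of its parents
        return {frozenset((u, v)) for v, ps in dag.items() for u in ps}

    def v_structures(dag):
        skel, vs = skeleton(dag), set()
        for child, parents in dag.items():
            for a in parents:
                for b in parents:
                    if a < b and frozenset((a, b)) not in skel:
                        vs.add((a, child, b))  # collider with non-adjacent parents
        return vs

    def edit_is_valid(before, after):
        # Accept a drag-the-edges edit only if the result is Markov-equivalent,
        # i.e. back-projects to an isomorphic set of formulas.
        return (skeleton(before) == skeleton(after)
                and v_structures(before) == v_structures(after))

    chain    = {"X": set(), "Y": {"X"}, "Z": {"Y"}}       # X -> Y -> Z
    flipped  = {"X": {"Y"}, "Y": set(), "Z": {"Y"}}       # X <- Y -> Z
    collider = {"X": set(), "Y": {"X", "Z"}, "Z": set()}  # X -> Y <- Z
    assert edit_is_valid(chain, flipped)       # same independencies: OK
    assert not edit_is_valid(chain, collider)  # new v-structure: error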

(See also this video, and the article linked above.)

A related challenge is refactors. E. g., suppose you're staring at some complicated algebraic expression with an infinite sum. It may be the case that a certain no-loss-of-generality change of variables would easily collapse that expression into a Fourier series, or make some Obscure Theorem #418152/Weird Trick #3475 trivially applicable. But unless you happen to be looking at the problem through those lenses, you're not going to be able to spot it. (Especially if you don't know the Obscure Theorem #418152/Weird Trick #3475.)

It's plausible that the above two tasks are what 90% of math research consists of (the "normal-science" part of it), in terms of time expenditure: flipping between representations in search of a representation-chain where every step is trivial.

Basically: You have some abstract construct which is "anchored down" by your notes/math. For any abstract construct, there's an infinite number of valid ways to anchor it. Some of those ways are better than others from the practical point of view: shorter, simpler to work with. What a good note-taking tool would allow is freely varying the form of your anchors under the constraint of fully preserving the abstract construct.

 

The Fundamental Problem

Mind, this isn't just a problem with note-taking. This sort of surface-level messiness convergently appears in any situation where we have a system gradually learning/adapting to an unfamiliar domain. Some examples:

  • A codebase that's gradually added to over the years, which ends up as a tall spaghetti tower in dire need of a refactor.
  • A population of organisms being mutated by evolution, resulting in spaghetti-code DNA structures and apparently messy biological dynamics... which nevertheless turn out to be elegant and simple, if you can find the right lens to look at them from.
  • A neural network being incrementally updated by gradient descent. It transforms into a massive black box... which still can, in principle, be translated into a simple symbolic-program form.
  • Law systems or bureaucratic regulations, which are gradually adjusted in response to social changes and legal loopholes, until they turn into Kafkaesque nightmares.

In all cases, we start with the description of a system in some initial representation/language/ontology, gradually refine the system, and end up with something that's effectively implemented on a different, "truer" ontology. But that high-level ontology isn't visible by default – we don't get the "interpreter" for free – so what we end up seeing is an inefficient mess.

... Which, if we view it from that perspective, has depressing implications regarding any hope of building "good" note-taking software. None of the powerful processes above struggling with isomorphic problems (programmers, evolution, interpretability researchers, legislators, company managers) have managed to solve them. The only "solution" that ever works is to just have a competent human manually untangle the mess.

And indeed, if we think about what "lossless notebase refactors" would require, the answer is fully intelligent edits. Not even something LLMs can really do: they would lose track of those subtle-but-crucial tidbits/doubts/thoughts I keep talking about.

So: it seems that a fully competent notes-editing software is AGI-complete.

 

Can the Problem Be Ameliorated?

Okay, so a full solution is beyond the scope of a note-taking app. Can the situation still be improved?

Intuitively, yes. Recall that we're not actually asking for fully automatic notebase refactors, we're looking to make manual human-guided refactors easier on the humans.

So: any ideas regarding how?

  • What UI elements would be helpful and ultimately implementable with the current technology? Both regarding how the notes are displayed, and how they can be edited.
  • Do you have any note-taking/research-logging strategies that lessen/fix this problem? (It would be very nice if all of the above turned out to just be a skill issue on my part.)
  • Are there any lessons from the domains of programming/biology/interpretability/government reform/company management that could be directly imported to this domain?

I've separated out my own thoughts into this comment.

  1. ^

    Especially pre-paradigmatic research, such as in agent foundations.

  2. ^

    Source: Vague recollection of various discussions I've read, plus this brief attempt at a public-opinion review I just ran via o3.

  3. ^

    See the generalized correspondence principle: a new ontology must explain every real phenomenon the previous ontology was able to explain.

    (And as far as keeping notes goes, you should also ideally preserve the explanation regarding what features of the new ontology made it look like the old ontology. "How and why does quantum physics consistently create the impression of classicality?")