Is Building Good Note-Taking Software an AGI-Complete Problem?

[-]gwern5mo153

My earlier commentary on what I think note-taking tools tend to get wrong: https://gwern.net/blog/2024/tools-for-thought-failure

[-]Mo Putera5mo63

Have you by any chance gotten further along on your Nenex idea, or know of anyone online who's gone somewhat in that direction far enough to be interesting? To be fair the Nenex features you listed are pretty extensive so I doubt anyone's gone all that far, which is a bummer since

What I want is to animate my dead corpus so it can learn & think & write.

is a seductive vision that feels like it should be a lot closer today than it actually is.

[-]gwern5mo460

I have not done any work directly on it. The LLMs have kept improving so rapidly since then, especially at coding, that it has not seemed like a good idea to work on it.

Instead, I've been thinking more about how to use LLMs for creative writing or personalization (cf. my Dwarkesh Patel interview, "You should write more online"). To review the past year or two of my writings:

So for example, my meta-learning LLM interviewing proposal is about how to teach a LLM to ask you useful questions about your psychology so it can better understand & personalize (based on my observations that LLMs can now plan interviews by thinking about possible responses and selecting interesting questions, as a variant of my earlier "creativity meta-prompt" idea/hierarchical longform training); "Quantifying Truesight With SAEs" is an offline version about distilling down 'authors' to allow examination and imitation. And my draft theory of mathematicians essay is about the meta-RL view of math research suggesting that 'taste' reduces down to a relatively few parameters which are learned blackbox style as a bilevel optimization problem and that may be how we can create 'LLM creative communities' (eg. to extract out small sets of prompts/parameters which all run on a 'single' LLM for feedback as personas or to guide deep search on a prompt).
My "Manual of Style" is an experiment in whether you can iteratively, by asking a LLM to read your writings, extract out an explicit manual of style about how to 'write like you'

It includes a new denoising/backtranslation prompt-engineering trick I am currently calling "anti-examples" where you have the LLM make editing suggestions (which turn it into ChatGPTese) and then you reverse that to fix the chatbot prior*.

So given how gargantuan context windows have become, and the existence of prompt caching, I think one may be able to write a general writing prompt, which includes a full MoS, a lot of anti-examples for several domains, some sample Q&As (optimized for information gain), instructions for how to systematically generate ideas, and start getting a truly powerful chatbot assistant persona with the scaled-up base models like GPT-5 which should start landing this year.
"Virtual comments" is another stab at thinking about how 'LLM writing support' can work, as well as reinventing the idea of 'seriation', and better semantic search via tree-shaped embeddings for both LLM & human writers (and the failed experiment with E-positive).
"Towards better RSS feeds" is about an alternative to Nenex commands: can you reframe writing as a sequence of atomic snippets which the LLM rewrites at various levels of abstraction/detail, which enables reading at those same levels, rather than locking people into a single level of detail, which inevitably suits few?
"October The First Is Too Late", "Bell, Crow, Moon: 11 Poetic Variations", "Area Man Outraged AI Has Not Solved Everything Yet", "Human Cannibalism Alignment Chart"/"Hacking Pinball High Scores", "Parliament of Rag & Bone", "A Christmas Protestation", "Second Life Sentences", "On the Impossibility of Superintelligent Rubik’s Cube Solvers" were tests of how useful the LLMs are for iterative variation and selection using a 'brainstorm' generate-rank-select prompt and/or for hierarchical generation; they finally seem at the point where you can curate good stuff out of them and are genuinely starting to become useful for my nonfiction essays like "'you could have invented Transformers' tutorial"/"Cats As Horror Movie Villains"/typesetting HTML fractions/Rock-Paper-Scissors optimality (and demonstrate my views on acceptable use of generative media).
"Adding Bits Beats AI Slop" is about my observations about how this kind of intensive search + personalization seems critical to taking generative model outputs from mediocre slop to genuinely good.
"LLM Challenge: Write Non-Biblical Sentences" is an observation that for creativity, "big model smell" may be hard to beat, and you may just need large LLMs for high-end intellectual work, so one should beware false economies; similarly, "Towards Benchmarking LLM Diversity & Creativity" is about avoiding the LLMs getting ever worse for search purposes (mode-collapsed small models being a danger for Nenex uses - they are the ones that will be easy and tempting to run, but will hamstring you, and you have to go into it with eyes open).
"AI Cannibalism Can Be Good" is a quick explainer to try to overcome the intuition that there are no gains from 'feeding AI inputs back into AI' - if you don't understand how this can be a good thing or why it's not a perpetual motion machine, much of the foregoing will seem like nonsense or built on sand.

Obviously, I've also been doing a lot of regular writing, and working on the Gwern.net website infrastructure - adding the 'blog' feature has been particularly important, but just getting the small details right on things like "October The First" takes up plenty of time. But the overall through-line is, "how can we start getting meaningful creative work out of LLMs, rather than sleepwalking into the buzzsaw of superhuman coders creating Disneyland-without-children where all the esthetics is just RLHF'd AI slop?"

* This seems particularly useful for fiction. I'm working on a write up of an example with a Robin Sloan microfic where the LLM suggestions get better if you negate them, and particularly if you order them to think about why the suggestions were bad and what that implies before they make any new suggestions - which suggests, in conjunction with the success of the 'brainstorm' prompt, that a major failing of LLMs right now is just that they tend to treat corrections/feedback/suggestions in a 'superficial' manner because the reasoning-mode doesn't kick in when it should. Interestingly, 'superficial' learning may be why dynamic-evaluation/finetuning seems to underperform: https://arxiv.org/abs/2505.01812 https://arxiv.org/abs/2505.00661#google Because adding paraphrases or Q&A to the finetuning data, although it cannot add any new information, improves performance; reminiscent of engrams/traces in human memory - you can have memorized things, but not be able to recall them, if there aren't enough 'paths' to a memory.

[-]TsviBT5mo123

All note-taking systems hitherto have failed for a simple reason: they do not ask thinking what it needs. (I think you appreciate this already, just restating.) https://www.lesswrong.com/posts/CoqFpaorNHsWxRzvz/what-comes-after-roam-s-renaissance?commentId=CNK44LqKyh2EQZpJm
I agree with your point that we should be looking to the human as the starting point. In fact, I think this means we should be asking for MENTAL tools for thinking FIRST. Maybe possibly LATER we could use software to help thinking, if it asks for it.
The mental tool we should be looking at is LEXICOGENESIS. I've written about this at length: "The possible shared Craft of deliberate Lexicogenesis" But to summarize for this context: if we create resources that improve our lexicogenetic abilities (such as a more productive morphemicon, augmented grammars, augmented notation, or skill with making words (thinking of metaphors, using the morphemicon, clarifying / factoring ideas to put words to, etc.)), then we will be better able to think at the edge in the language of thinking at the edge.

[-]DirectedEvolution5mo50

I've been using Zotero and Obsidian for a while.

Zotero has a few nice features:

Hard-links, where the same document can be placed into multiple files.
Ease of syncing between devices
The "save to Zotero" extension in Chrome
Text and area annotations on the sidebar, which you can edit, comment, tag, and click to jump to the annotated section.
The ability to create new, editable notes by extracting annotations from one or more documents with the "create note from annotations" feature.
- The major limitation here is that there's no way to selectively search and extract annotations based on tags.
Also, the difference between a "tag," "annotation," "note" and "document" feels messy. Other Zotero users have complained that there's no built-in way to delete tags (you can edit them manually in the sqlite3 database, but that's beyond most users). Airtable does a better job with tags, giving auto-suggestions and a place to edit or delete existing tags. The ability to define "tag sets" that apply across documents that you can toggle between, combine, etc would be much better.
I wish Zotero had the ability to alter the main view of documents, or offered some sort of hover preview of the document. Title/creator/year usually doesn't capture well why I care about that document or why I added it to a particular file. Zotero thinks of itself as a tool to track citations. I want it to be a tool to create custom views of my document collection. As it is, I get overwhelmed and confused by folders that contain a lot of documents.

Obsidian's ability to create links between documents and its canvas feature are nice, but I rarely use them in practice. I think this is for a couple reasons:

Obsidian doesn't allow me to add the same document to multiple files, so I have to put a lot of thought into choosing which file it will live in. Then if I want to find it later, I have to hunt for it. This makes me very conservative about creating new notes, which is exactly the opposite of what a link-friendly note system ought to do.
Often, I don't want to click through to another document to see a note. I want information all in one place. But Obsidian doesn't let me embed notes, only link to them.

Also, I straight up don't care at all about the graph view, I think it's a nonsense way to view your notebase.

What I get the most value from in Obsidian are:

Its markdown interface with seamless support for LaTeX and monospaced code blocks
Ease of syncing between devices
The traditional file structure (when Google Drive shifted to an AI-sorted "home" menu as its landing page, I hated it)
The "outline" view, which transforms different header levels into a clickable table of contents.
To some extent, the ability to collapse and expand bullet points - I'd like more of this.

I wish that Obsidian had code cells, like Jupyter or Collab notebooks. There's a community extension but it is not seamless, and I haven't managed to get it to work yet. The ability to construct images and diagrams within the application would be very nice - perhaps support for embedded Mermaid charts?

Personally, I see the role of LLMs in this setting as primarily a search tool. "Find all the documents containing figures with single base pair resolution DNA methylation information" is a question I'd love to be able to pose to my notebase. I don't want the LLM to construct the note for me. The point of the notebase is to have the human gain an understanding of the notebase, and having the LLM construct notes isn't very helpful for that.

Based on these observations, it seems to me that text and areas of documents are the "raw material" of a notebase. The priority should be on tools that make it easier to define, select, morph the visibility of, and remix units of raw material into new documents. I think that Git has shown it's not particularly important to minimize redundancy in this type of setting - just create new copies of the material and let users build on them.

On a more abstract level, this is about minimizing the need to commit early to a particular ontology and making it much easier to rapidly define and edit lightweight ontologies. The note taking software should enable you to rapidly construct new views of the raw material in your "note lake." It should enable some sort of "lineage tracing" that lets you reconstruct where, originally, the material in a particular document came from - if you create views of views of views, you should be able to click back through until you reach bottom.

[-]Garrett Baker5mo20

Obsidian doesn't allow me to add the same document to multiple files, so I have to put a lot of thought into choosing which file it will live in. Then if I want to find it later, I have to hunt for it. This makes me very conservative about creating new notes, which is exactly the opposite of what a link-friendly note system ought to do.

I haven’t had this problem, and usually use the image-wiki syntax for this, that is ![[file-name.pdf.png.html]]

[This comment is no longer endorsed by its author]Reply

[-]Viliam5mo41

(I am not familiar with the recent software solutions, so anything I write here may already be obvious, or even outdated.)

Ideally, all notes should be in one database. As opposed to having multiple databases, e.g. one for school, one for personal things, one for a diary, etc. That's because whatever classification you will make up, later you will find examples in the gray area. And even if not, cross references are often useful, e.g. if you mention something you learned at school in your diary, and the links would be difficult to maintain in multiple databases.

If you intend to share your notes somehow, or collaborate with others, there still should be one big database, except that some nodes are flagged as "public" or "shared (with whom)". New nodes are private by default, to prevent forgetting to set up privacy; or maybe private and public nodes should have dramatically different design (e.g. different background color).

When the database gets large enough, it is impossible to rewrite large parts of it when your ontology changes. Some of it may never get rewritten, as there may be too much outdated context and more important things to do. It would probably be good to clearly mark some nodes as "outdated" or otherwise in a serious need of rewrite. So that maybe you may decide to rewrite a specific page in the future when you link it from elsewhere.

The organization of content should be a mix of hierarchical and hyperlinked. Sometimes the hierarchy is ambiguous: is "mass transit in Bay Area" a subnode of "mass transit" or a subnode of "Bay Area"? Sometimes the hierarchy is obvious: "notes from HP:MoR, Chapter 1" is obviously a subnode of "notes from HP:MoR" (even if it may be also linked from other places). So I imagine the ideal structure as a set of local hierarchies and individual nodes, connected by hyperlinks. There should be a simple way to see "what links here". (Categories could be implemented simply as pages which link everything that is in the category.) The hyperlinks should be implemented so that renaming the target page does not break them.

There should be a full-text search, smart enough to prioritize pages that contain the keyword in their title over pages than merely contain it somewhere in the text. E.g. looking for "less wrong" should return the node "Less Wrong" as a first result, even if there are thousand pages mentioning LW somewhere in the text.

There should be some history information, in the sense of "this page had last significant updates in 2003". Not sure about the exact algorithm for that. Intuitively, if I fix a typo in 2025, the date should not change. Perhaps the year should be set up when I create the page, and then updated whenever I explicitly click "this page is up-to-date as of now"? And maybe old pages are displayed in grey color in the tree?

I do not feel bad about the fundamental problem of information getting outdated per se. I feel like any problems related to that are actually problems with the editing interface, as the outdated information gets in the way or is difficult to find and update. With good interface, there should be no problem with adding any random stuff even if you will never need it again; it should not get in your way, but stay hidden somewhere under "my diary, May 2025".

(These are just my opinions. I do not know any software that would do all of the above, in the right way.)

[-]Paulo Ferreira5mo30

I've been playing around with a related concept in a less formal context (not scientific research).

Currently using Claude code on Obsidian vaults and asking it some if the same questions you've mentioned and reviewing changes using git.

A long way to go to fully avoid the pitfalls mentioned and context management remains a big challenge but I feel it's a workable first step.

[-]Yaroslav Granowski5mo30

This is pretty much what my research is about.

The main problem is in the limitation of common languages. While thinking, you generate multiple new concepts for which you have no words or notations. All you can do is to anchor them with some phrases or expressions that also kind of approximate the meaning. If you elaborated well enough, these expressions would evoke similar ideas in the mind of another person.

You could better represent your ideas if you were free to generate new identifiers. Just like what we are doing while programming. Predicate calculus could be a good fit. You easily can express basic arithmetic model this way, but if you tried to express your social-level ideas, you would likely feel totally confused because these ideas are very much depending on others, most of which we don’t have words for.

This is a very different paradigm of thinking and requires to be developed from scratch. So, I started my research in ergonomics of logic programming. While developing self-hosted platform for interactive logic programming, I hope to get fluent in expressing complicated theories and develop a good fundament to move to other fields of knowledge. Perhaps Quantum Physics will be the next. And maybe one day Become a Superintelligence Yourself

[-]Thane Ruthenis5mo30

Paging @Raemon (this seems relevant to your interests) and @johnswentworth (if you have any thoughts regarding UI improvements or coping strategies).

[-]Thane Ruthenis5mo30

1. A major question here is LLMs. Scattershot thoughts:

Any notebase-refactoring tool would be much more useful if it's able to understand semantics, instead of operating off of coarse measures like blind text-clustering algorithms. LLMs seem like the tool for the job.
LLMs are reportedly pretty good at understanding research papers and at reading comprehension in general, and those skillsets have a lot of overlap with this. They may do a good job here.
LLMs have pretty bad research taste, and research taste is tightly entangled with the whole "choice of correct abstractions/ontologies" thing we want to get right here. They may not do a good job here.
Plenty of research ideas are fragile Butterfly Ideas, so the process of refactoring your thinking on a topic often has to be "intimate". Injecting the opinions of some external person-like entity into it may be lethal. LLM contributions, if there are any, would likely need to be heavily sanitized/de-personalized/abstracted over.
Plenty of research ideas are dream ideas/empty ideas. Having an LLM with proper context correctly shoot holes in them may save you months of wasted time.
LLMs are sycophantic and sometimes good at persuasion, so they may encourage your delusions about your dream/empty ideas. Or just inject subtle misunderstandings into your ontology in an attempt to avoid upsetting you.

Overall, I expect consideration (1) dominates here, and mitigating (2)-(6) would require a fair bit of tinkering.

An approach that might work here is to define a bunch of useful high-level functions for manipulating natural-language notes (such as "redefine(X, Y) means 'redefine X in terms of Y'"), describe those to an LLM, then build an interface where interactions with the LLM are conducted exclusively through this abstraction layer. As in, there's a literal "redefine" button, you select notes X and Y, press the button, and get the redefined notes, without having to talk to the LLM at all.

Ideally, the LLM is also fine-tuned to do the task correctly, minimizing how much personal flourish it injects and enhancing its attention to detail.

Some illustrative examples of such functions:

"Redefine the set of notes N in terms of X."
"Fetch the list of phenomena in the set of notes N that {contradict}/{are unexplained given} the model described in the notes M."
"Check whether X and Y are isomorphic in the context of N."
"Find all information in the set of notes N relevant to the thought X."
"Check whether assertion X contradicts/breaks anything in the set of notes N."
"Merge notes X and Y, distilling the information redundant between them into a new node and creating 0-2 new nodes recording information unique to each."

2. "Mundane" UI ideas:

I think a 2D representation along the lines of Obsidian's canvas (the canvas, not the graph view) is a good start. By contrast, 1D (a text document) or "1.5D" (set of text documents separated into folders, creating a "tree-like" format) are too impoverished to help with reasoning about any nontrivial structure at all, and 3D (let alone higher) is hard to parse because we process visual information in 2D.
Some sort of native multi-level hierarchical representation would be nice. "These two concepts often interact/appear together" (a "horizontal" connection) and "these low-level concepts are how this higher-level concept is implemented" (a "vertical" connection) are qualitatively different types of connections – yet a "flat" graph representation forces you to mix them up.

I. e., we prospectively want a "2.5D" representation: a partially ordered set of arbitrary 2D graphs, corresponding to systems living on different levels of abstraction. Equipped with native functionality for flexibly editing it.
History-keeping would be useful. Preferably with a good visualizer regarding how the notebase was transformed from A to B, with an easy ability to (partially) revert the changes.

None of those seem like high-enough-impact improvements, though.

[-]Valdes5mo40

I quite like some of your ideas about how to design note-taking software and I think I have a partial answer to the issues you point to:

Make the repository of notes more semantic than usual, halfway between pure text notes and a structured database. If the notes exist in a graph database then it becomes possible to write database queries (probably with something like SPARQL).
Allow the user to store queries within notes so the query is run whenever the note is opened. Maybe allow for a graph visualization of the query. This way the semantic structure becomes a part of the normal way to interact with the "hub" notes.
The ontology used on a topic is defined by the structure of the links. If the user wants a new ontology they just create it on top of the old one and let the old one rot. No need to erase it and it does not distract from the new one (maybe there is some name reuse in the link/attribute names, but that's trivial to fix).
Use LLMs to correctly annotate notes with the semantic information that populates the database (links, tags, some attributes, ...) This way they don't need taste to know how to edit, they just follow the pattern that comes with a given ontology. Perhaps this requires the user to create templates.
Provide a good CLI (Command Line Interface) for the software. This way the user can write scripts to update their notes using any language. It becomes easy to apply the new ontology to the old notes, possibly fully automagically but most of the time there is probably a bit of manual work left afterward. That's fine.
(unrelated to your post) let the GUI be a VScode extension so the user can use whatever other existing extension they like, this way there is no need to also design a good text editor.

I sent you a DM to talk more about this.

[-]Trevor Hill-Hand5mo30

It's quite barebones, and just more of a silly sketch of a project rather than serious project, but a lot of these ideas are what I'm thinking about within https://github.com/Jadael/Library-of-Aletheia - some way to systematically and reliably apply library science style ministerial operations, sometimes LLM powered and sometimes traditional scripts, to a collection of information, all in an Obsidian-canvas-esque "web of knowledge" framing and a visual UI.

I stopped working it on it because the LLM extension I was using in Godot wasn't being updated, and couldn't go above a hardcoded context window of 512 which limited the ability to do anything interesting, but perhaps it's possible to update it t just use Ollama or something nowadays 🤔

^{^}

Especially pre-paradigmic research, such as in agent foundations.

^{^}

Source: Vague recollection of various discussions I've read, plus this brief attempt at a public-opinion review I just ran via o3.

^{^}

See the generalized correspondence principle: new ontology must explain every real phenomenon the previous ontology was able to explain.

(And as far as keeping notes goes, you should also ideally preserve the explanation regarding what features of the new ontology made it look like the old ontology. "How and why does quantum physics consistently create the impression of classicality?")

LESSWRONG
LW

LESSWRONG
LW

26

Is Building Good Note-Taking Software an AGI-Complete Problem?

26

26

A Toy Example

Why Not Just Start From Scratch?

Generalized Representation-Flipping

The Fundamental Problem

Can the Problem Be Ameliorated?