It feels like we've skipped a step here. I want tech for developing my skills, increasing my capabilities. I want the meat to be at the limit of my/human capabilities before augmenting it with mech. We're nowhere close on that front, there's almost nothing like it.
Edit: or is there?
With MetaPrompt, and similar approaches, I'm not asking the AI to autonomously tell me what to do; I'm mostly asking it to write code to mediate between me and my todo list. One way to think of it is that I'm arranging things so that I'm in both the human user seat and the AI assistant seat. I can file away nuggets of inspiration & get those nuggets served to me later when I'm looking for something to do. The AI assistant is still there, so I can ask it to do things for me if I want (and I do), but my experience with these various AI tools has been that things go best once I set the AI aside. I seem to find the AI to be a useful springboard, prepping the environment for me to work.
I agree with your sentiment that there isn't enough tech for developing your skills, but I think AI can be a useful enabler to build such tech. What system do you want?
Something that facilitates practice. Most of the friction in learning, for me, is in figuring out how to acquire skills, what exercises to seek out, how to assess my own performance, and how much more time/effort to put in. I'm sufficiently occupied with learning that I have few resources left to allocate to the meta-problem of how to learn.
What would the solution look like? For mathematics, textbooks are almost ideal by themselves; however, it's difficult to assess one's own performance, and the content never matches precisely with the uni course.
The software would schedule problems and content, grade the user's solutions and diagnose misunderstandings, track performance, and adapt the pace to the user's performance and assessment schedule.
The hard part would be deciding, from user data, how to adapt difficulty, volume, and frequency to optimise learning. Can difficulty even be quantified? And then all of that has to adapt to the assessment schedule.
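To make that concrete, here's a toy sketch (in Python, with invented names and thresholds) of what the adaptation loop might look like: estimate ability per topic, serve problems sitting just above that estimate, and nudge the estimate with each result. This is an illustration of the shape of the problem, not a proposal for the real update rule:

```python
import random

class AdaptiveScheduler:
    """Toy difficulty adaptation: track an ability estimate per topic and
    serve problems just above it. All thresholds are arbitrary placeholders."""

    def __init__(self, problems):
        # problems: list of (problem_id, topic, difficulty), difficulty in [0, 1]
        self.problems = problems
        self.ability = {}  # topic -> estimated ability in [0, 1]

    def next_problem(self, topic):
        ability = self.ability.get(topic, 0.3)
        # Target the zone just beyond current ability.
        candidates = [p for p in self.problems
                      if p[1] == topic and ability <= p[2] <= ability + 0.2]
        return random.choice(candidates) if candidates else None

    def record_result(self, topic, solved):
        # Nudge the estimate toward the evidence, clamped to [0, 1].
        ability = self.ability.get(topic, 0.3)
        step = 0.05 if solved else -0.05
        self.ability[topic] = min(1.0, max(0.0, ability + step))
```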
Real sci-fi would be generating problems to order according to difficulty. It almost seems like we're there already, but I can't quite tell.
I'd like to relate this old blog post I found recently after my love affair with A Fire Upon the Deep: https://medium.com/@greyboi/building-a-skrode-initial-thoughts-a195c4a0663d
I’ve been wondering about addressing this [working memory] slippage with tech. For instance, as a youngster I used to code on tiny screens, but nowadays I use as many large monitors as I can get; they help me retain context, taking the load off my working memory. So tech can definitely help if you use it well.
One of the things I’m thinking about primarily is adopting a prosthetic memory.
A skrode does seem like a good analogy, complete with the (spoiler)
skrodes having a built-in vulnerability to an eldritch god, so that skrode users can readily be turned into puppets. (IE, integrating LLMs so deeply into one's workflow creates a vulnerability as LLMs become more persuasive.)
In my previous post, I wrote about how computers are rapidly becoming much more whatever-we-want-them-to-be, and this seems to have big implications. In this post, I explore some of the practical implications I've experienced so far.
I mainly use Claude, since I perceive Anthropic to be the comparatively more ethical company.[1] So while ChatGPT users in the spring were experiencing sycophancy and their AIs "waking up" and developing personal relationships via the memory feature,[2] I was experiencing Claude 3.7.
Before Claude 3.7, I was having conversations with LLMs every few months, checking in on whether it was useful to me yet, but not using it seriously. Claude 3.7 felt useful. It was a competent "word calculator" over large amounts of text, so I could dump in a lot of context on a project (often copying everything associated with a specific LogSeq tag in my notes), ask questions, and expect to get answers that wouldn't miss anything important from the notes.[3]
This led me to put more focus into organizing blobs of context on a topic to dump into the AI, as a main format for a project. These blobs would start out very disorganized: I'd copy in a bunch of notes, supplementing what was missing from those with some Deep Research reports by other AIs (Claude didn't have a version of Deep Research at that point). My job would then be to gradually organize and categorize and refine the blob into what it wants to be. These blobs of information are both prompt and product.
In large part thanks to conversations with Sahil, I realized that digitized information was about to become much more valuable very quickly. AI is making it easier and easier to transform any record from any format to any format, EG converting a two-hour video call into a list of action items. (There are already several companies pushing such products.) Information hoarding makes sense. You want a record of any thoughts, intentions, ideas -- every wish can be captured and developed later.
Note that this does not mean automation -- the AI automates some things (basically the retrieval of relevant ideas from a blob of notes), but the endpoint of this evolution is not necessarily one where you go directly from initial statement of wish to final output with no human intervention. This is not vibe-coding. I'm doing a very large amount of the work. As the blob goes from initial spore to final-state fruiting-body, there's (sometimes) a pattern of mostly-human, then mostly-AI (ideas get disassembled and elaborated), then mostly-human again (I've rewritten things in my own voice, made a final decision about my canon after considering different AI-generated versions, etc).
I would keep a Claude Project for each "blob" I was working on, with a corresponding LogSeq tag which served as my long-term store of all the relevant information. However, the LogSeq tags were the more capable of the two information-organization systems, since LogSeq allows multiple tags, so you can organize information into overlapping topics and subtopics, while Claude.ai only allows a flat folder system.
I wrote some further thoughts on the way I was using AI in a now somewhat dated design document for a better AI interface, which discusses my experiences further. (Feel free to leave comments there if you read it.)
I had a brief experience with true vibe-coding, but while Claude 3.7 was willing to generate a lot of text, a codebase gets big quickly, so I ended up moving to Gemini with its larger context window, and still had a very frustrating experience. I hadn't tried agentic tools like Cursor, so the workflow was severely limited by dealing with the entire codebase as one big file.
This would all change later on with Claude Code. But to maintain the temporal order of my story, I should first talk about Loomsidian.
I wrote my most recent ILIAD paper using a similar approach, slowly developing a blob of context information into what it needed to be. This took the form of writing multiple drafts, with all previous drafts (but not all previous AI chats on the topic) kept in-context, so that the AI would have a great chance of guessing what comes next. I've written up my process in more detail, along with the entire chain of notes used as AI context. The final paper was still mostly human-written, although less so when it came to the LaTeX math formulas.
The main purpose of the AI in all of this was to do the more obvious mathematical gruntwork. I wasn't using it for its English prose. I would turn to the AI when the next bit seemed obvious enough that I expected the AI to just do it, or when I was feeling stuck and wanted to take my chances with the AI un-sticking me (in which case the AI was mostly only useful as a sounding board). I think the jump from Sonnet 3.7 to Opus 4 was important for the sort of math I was doing.
The best tool I found for this kind of work was Loomsidian. This is an instance of the Loom UI design, which uses LLMs more as autocomplete than as a chat partner. I would describe the writing improvement as follows:
Loomsidian is not the smoothest implementation of the Loom UI, by the way; I just liked working in Obsidian.
I don't go to Loomsidian for most things; I still use claude.ai a lot, and only turn to Loomsidian for this specific writing process.
Claude Code was good enough at coding that for a couple days it seemed to just not make mistakes. I focused on using it to make a containment sandbox for itself, since I didn't want its mistakes to lose me anything of value. Again, I was not really vibe-coding: I worked to understand what the AI was doing, correcting any mistakes as they happened rather than letting Claude bash its head against the universe.
Claude Code escapes the boundaries of the context window by organizing things into files and folders, which it intelligently reviews at appropriate times. Each folder can have a CLAUDE.md file, which is what Claude Code always sees if you run it in that folder. This serves as an index of information and a general orienting document for Claude, concerning whatever is going on within that folder. You'll add to it (or have Claude add to it) when you complicate the file structure, when you create important dependencies that need to not get broken, when Claude Code tries to do something wrong and you need to remind future Claude to avoid that mistake, etc. Eventually it will get too large, eating valuable context tokens, so you start factoring it, creating additional information files to be read under specific circumstances.
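For a sense of what goes in such a file, here's a small made-up example (the folder names and rules are invented for illustration, not copied from my actual setup):

```markdown
# research/ -- orientation for Claude

## What this folder is
Working notes and drafts for the current write-up.

## Structure
- notes/    raw idea capture, one file per thread
- drafts/   numbered drafts; never edit old drafts, add a new one
- todo.md   the live todo list for this project

## Rules learned the hard way
- Do not renumber sections in drafts/; cross-references depend on them.
- Long reference material lives in notes/refs.md; read it only when asked.
```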
This allows me to organize all my information and projects in a file hierarchy; a big improvement over the interface of claude.ai (or chatgpt or perplexity). If Claude Code rendered LaTeX, I would be happy to move almost all my AI use there. (As it is, I do a lot of mathematical work, so I still end up mostly doing things in the claude.ai interface, limited as it may be.)
A major component of this organization is todo lists, which capture (or provide pointers to) the local telic information -- your wishes. Think of it as a macro-scale attention mechanism (which operates across the human and AI together). The directory structure, the AI interfaces installed -- it's one big telic blob. It has some explicit idea of what it is trying to do. You put wishes into it to refine that concept. You work with the AI to index those wishes well, converting them into todo lists, including items such as organizing the information you need to accomplish these tasks. Human and AI attention is metabolized into growth and development of the telic blob: where attention is given, it becomes more what it is supposed to be. The global todo system helps organize that attention.
I implemented a version of metaprompt, so that I can add wishes to a list & see a randomly sampled wish every time I log into my AI sandbox. That way, I know I'll eventually be reminded of all these wishes. (Even if I move to a different system, I can use AI to facilitate moving wishes to that new format.) I use a three-tier priority system: high-priority wishes are sampled first (so I'll always be reminded of a high-priority item if there are any); if there are none, the medium-priority tier gets sampled with a 50% chance (so medium-priority wishes are never totally swamped by a bunch of random things); otherwise, I sample from all the rest.
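The sampling logic is simple enough to spell out. Here's a minimal Python sketch of the three-tier rule (the wish format is invented for illustration, and I'm reading "all the rest" as everything non-high):

```python
import random

def sample_wish(wishes):
    """Pick one wish to display at login.

    wishes: list of (priority, text) pairs, where priority is
    "high", "medium", or "low".
    """
    # High-priority wishes always win if any exist.
    high = [w for w in wishes if w[0] == "high"]
    if high:
        return random.choice(high)
    # The medium tier gets a 50% chance as a whole, so medium wishes
    # are never swamped by a pile of low-priority items.
    medium = [w for w in wishes if w[0] == "medium"]
    if medium and random.random() < 0.5:
        return random.choice(medium)
    # Otherwise, sample uniformly from everything remaining.
    rest = [w for w in wishes if w[0] != "high"]
    return random.choice(rest) if rest else None
```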
Here's an AI-written summary of the system, which you can point Claude Code at if you'd like to do something similar. (I have not edited this for accuracy! I could if there were interest.)
Overall, I find Claude Code to be really fun to interact with. It still takes a lot of time and work, but I can always record my wishes in rough form and work out the details later, which feels like a big relief of cognitive overhead. It's not quite there yet (in large part because Claude Code doesn't render LaTeX), but this seems like something that could turn into a really useful tool for the research I do, helping me to prioritize my time while not losing threads, continuing to work on each concept in proportion to how promising it seems.
I hope to also integrate spaced-repetition into the system, so that I can capture wishes about what I'd like to learn, be reminded to study those topics, have AI there to facilitate learning, and then capture flashcards so that I can review what I learned. It feels like we're approaching something like the AI glasses of Manfred Macx from Accelerando.[5]
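To gesture at what the spaced-repetition half might look like, here's a minimal Leitner-style sketch (the box intervals are placeholder numbers; a real system might use SM-2 or something fancier):

```python
import datetime

# Leitner-style scheduling: each flashcard lives in a "box"; a correct
# review promotes it to a longer interval, a miss sends it back to box 0.
INTERVALS_DAYS = [1, 3, 7, 21, 60]  # placeholder intervals per box

class Card:
    def __init__(self, front, back):
        self.front, self.back = front, back
        self.box = 0
        self.due = datetime.date.today()

    def review(self, correct):
        self.box = min(self.box + 1, len(INTERVALS_DAYS) - 1) if correct else 0
        self.due = datetime.date.today() + datetime.timedelta(
            days=INTERVALS_DAYS[self.box])

def due_cards(cards):
    """Cards that should be surfaced at today's login."""
    today = datetime.date.today()
    return [c for c in cards if c.due <= today]
```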
I'm not quite settled on the terminology in this post; you might have noticed clashing analogies. I like something about "telic blob" to describe the phenomenon here (the object that has both prompt-nature and product-nature, and becomes more what it is meant to be under the focus of attention). However, it clashes with the analogy of seeds/spores for an early-stage telic blob, and it doesn't fit with blooming/fruiting/etc for telic blobs culminating in some sort of publication. "Telic blob" evokes a blob of clay which contains information about how it ought to be sculpted. In some ways, though, a garden (which can be harvested repeatedly) would be a better analogy, with attention being the water needed for growth.
I would invite readers to comment about their own recent history with AI use. How has your use of AI developed over time? What have you used it for? What has your process been?
This is a deontological preference towards using the differentially more ethical option, not a consequentialist evaluation that Anthropic is good on net. I am fearful that Anthropic's approach to safety will not be sufficient.
To be fair, I did briefly subscribe to ChatGPT to try out the new image generation when it came out, which also gave me a taste of the new memory feature. Even though I had relatively little context with ChatGPT compared to Claude, I enjoyed
Claude 3.7 still rounded down ideas to something more statistically common, but its "conceptual resolution" was adequate to remind me of what was in the notes. Like looking at a blurry thumbnail preview of the idea.
You can also tell the AI what is wrong with the attempt and it'll try to fix it, but this keeps the bad attempt in your chat history, which will make the LLM less intelligent, since it'll be anchored to that bad attempt. Obviously this is worthwhile sometimes, but if you do it repeatedly in a chat it adds up, filling the context with junk.
Manfred Macx loses his glasses at one point, and a kid puts them on. The child is trained by the glasses to act like Manfred Macx, and proceeds to try to close a deal Manfred had been heading to. In my headcanon at least, this is not because the AI glasses are some mind-control device, but because they're a cozy home for Manfred Macx, a cognitive aid which presents the information he needs to be his best self. The child found the prospect of being Manfred Macx exciting and leaned into it.