NB: Last week I teased a follow-up that depended on posting an excerpt from Fundamental Uncertainty. Alas, I got wrapped up in revisions and didn’t get it done in time. So as not to leave you empty-handed, this week I instead offer some updates on my Claude workflows.
Claude Opus 4.5’s visual interpretation of “us vibing together” on a problem. Claude chose to title this “groundlessness as ground” all on its own.
Back in October I shared how I write code with Claude. A month later, how I write blog posts (though ironically not this one). But I wrote those before Opus 4.5 came out, and boy has that model changed things.
Opus 4.5 is much better than what came before in several ways. It can think longer without getting lost. It’s much better at following instructions (e.g. things I put in CLAUDE.md now get respected more than 90% of the time). And as a result, I can trust it to operate on its own for much longer.
I’m no longer really pair programming with Claude. I’m now more like a code reviewer for its work. The shift has been more subtle than that statement might imply, though. The reality is that Claude still isn’t great at making technical decisions. It’s still worse than random chance at picking the solutions I would pick. And so I still have to work with it quite closely to get it to do what I want.
But the big change is in how we work together. Before, I’d have a terminal open with claude in one tmux split and nvim in another, and we’d iterate in a tight loop, with claude serving as something like very advanced autocomplete. Now I use Claude in the desktop app and have it work concurrently on multiple branches using worktrees. I’ve also given it instructions for managing my Graphite stacks, so even for complex, multi-PR workflows I can usually interact through chat rather than opening the console and doing things myself.
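For anyone who hasn’t used them, worktrees are what make the concurrent-branch part possible: each task gets its own checkout, so parallel Claude sessions don’t clobber one another. A minimal sketch of the idea (branch and directory names here are made up for illustration):

```bash
# Give each task its own checkout so parallel sessions don't collide.
# Paths and branch names are hypothetical examples.
git worktree add -b feature-auth ../myrepo-feature-auth
git worktree add -b fix-logging ../myrepo-fix-logging

# The main repo directory stays free for local testing and the
# occasional manual operation.
git worktree list

# Clean up once a task's branch has landed.
git worktree remove ../myrepo-feature-auth
```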
Some tooling was needed to make this work. I had to update CLAUDE.md and write some skills so that Claude could better do what I wanted without intervention. I also had to start using worktrees; the main repo directory is now just where I check things out to test them (the local dev environment is a singleton) and where I do the occasional manual operations I can’t hand to Claude (like running terraform apply, since I can’t trust it not to destroy infrastructure by accident).
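To give a flavor of the kind of instructions involved (this is an invented excerpt in that spirit, not my actual file), the CLAUDE.md additions are mostly plain-language rules like:

```markdown
<!-- Illustrative excerpt only; the real rules are project-specific -->
## Git and Graphite

- Work only inside the worktree you were given; never touch the main checkout.
- Use Graphite (gt) to create, restack, and submit branches; don't rebase with raw git.
- Keep one logical change per branch so stacks stay reviewable.
- Never run terraform apply or anything else that modifies infrastructure.
```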
Still, this is not quite the workflow I want. Worktrees are annoying to use, and I’d prefer to run Claude in cloud sandboxes. But Anthropic’s offering there is rather limited in how it can interact with git, which makes it a non-starter for me because it can’t use Graphite effectively. Graphite has its own background agents, but they’re still in beta and not yet reliable enough to use (plus they come with restrictions, like one chat per branch, rather than a single chat that can manage an entire stack).
But as I hope this makes clear, I now use Claude Code in a more hands-off way. My interactions with the code are less “sit in an editor with Claude and work in a tight, pair-programming-like loop”, and more “hand tasks to Claude, go do other things, then come back and review its work via diffs”. I expect this trend to continue, and I also hope to see new tooling that makes this workflow easier later in the year.
That’s coding, but what about writing?

Well, Claude still isn’t fantastic here. It’s gotten much better at mimicking my style, but what it produces still has slop in its bones. It’s also gotten better at thinking things through on its own, but I still have to work to focus it and keep it on task. And it will miss things I want it to look at, same as a human would.
For example, I was editing a paragraph recently. I made a change to some wording that I worried might convey the right sense but be technically wrong. I handed it to Claude. Its response was along the lines of “yes, this looks great, you made this much more readable”. But when I pressed it on my factual concerns, it spotted the issue and agreed there was a problem, even more strongly than I had! These kinds of oversights mean I can’t trust Claude to help me write words the same way I trust it to help me write code.
So when I write with Claude, I’m still doing something that looks much more like pairing with an editor. This is good news in some sense, because it means my thinking is still needed to produce good writing, but bad news if you were hoping to automate more of that thinking with Claude.
This past week brought news of some novel mathematical breakthroughs achieved with LLMs. The technology is clearly making progress toward critical thinking in a wider set of domains. And yet writing remains a nebulous enough task that doing it well continues to evade Claude and the other models. That’s not to say they aren’t getting better at producing higher-quality slop, but they still aren’t up to completing a task like finishing revisions on my book the way I would want them done.
Where does this leave me feeling about LLMs right now?
We made a lot of progress on utility in the last 12 months. Last January I was still copy-pasting code into Claude to get its help and using Copilot for autocomplete. It was almost useless for writing tasks at that point, and I often found myself wasting time chatting with it trying to get things done when it would have been faster to sit and think and do it myself. That’s just no longer true.
As always, though, I don’t know where we are on the S-curve. In some ways it feels like progress has slowed down, but in others it feels like it’s sped up. The models aren’t getting smarter at the pace they were in 2024, but they’re becoming useful for a wider set of tasks at a rapid rate. Even if we don’t get LLMs that exceed what, say, a human in the 70th percentile at a task could do, that’s already good enough to continue to transform work.
2026 is going to be an interesting year.