Following up on [1] and [2]...
So, I've had a "Claude Code moment" recently: I decided to build something on a lark, asked Opus to implement it, found that the prototype worked fine on the first try, then kept blindly asking for more and more features and was surprised to discover that it just kept working.
The "something" in question was a Python file editor which behaves as follows:
The remarkable thing isn't really the functionality (to a large extent, this is just a wrapper on ast + QScintilla), but how little effort it took: under 6 hours by wall-clock time to generate 4.3k lines of code, and I've never actually had to look at it; I just described the features I wanted and reported bugs to Opus. I've not verified the functionality comprehensively, but it basically works, I think.
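To give a sense of how thin the ast half of such a wrapper can be, here is a minimal sketch of my own (an illustration, not SpanEditor's actual code) that enumerates the top-level functions, classes, and methods in a source string:

```python
import ast

def list_elements(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, qualified name, line number) for top-level defs and class methods."""
    tree = ast.parse(source)
    elements = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            elements.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            elements.append(("class", node.name, node.lineno))
            # Methods are just function defs nested directly in the class body.
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    elements.append(("method", f"{node.name}.{item.name}", item.lineno))
    return elements

src = "def f():\n    pass\n\nclass C:\n    def m(self):\n        pass\n"
print(list_elements(src))
# → [('function', 'f', 1), ('class', 'C', 4), ('method', 'C.m', 5)]
```

The standard library does the parsing; the "editor" part is mostly mapping these spans onto QScintilla regions, which is presumably where the other few thousand lines go.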
How does that square with the frankly dismal performance I've been observing before? Is it perhaps because I skilled up at directing Opus, cracked the secret to it, and now I can indeed dramatically speed up my work?
No.
There was zero additional skill involved. I'd started doing it on a lark, so I'd disregarded all the previous lessons I'd been learning and just directed Opus the same way I did at the start. And it Just Worked in a way it Just Didn't Work before.
Which means the main predictor of how well Opus performs isn't how well you're using it/working with it, but what type of project you're working on.
Meaning it's very likely that the people for whom LLMs work exhilaratingly well are working on the kinds of projects LLMs happen to be very good at, and everyone for whom working with LLMs is a tooth-pulling exercise happens not to be working on those kinds of projects. Or, to reframe: if you need to code up something from the latter category, and it's not a side-project you can take or leave, you're screwed; no amount of skill on your part is going to make it easy. The issue is not one of your skill.
The obvious question is: what are the differences between those categories? I have some vague guesses. To get a second opinion, I placed the Python editor ("SpanEditor") and the other project I've been working on ("Scaffold") into the same directory, and asked Opus to run a comparative analysis regarding their technical difficulty and speculate about the skillset of someone who'd be very good at the first kind of project and bad at the second kind. (I'm told this is what peak automation looks like.)
Its conclusions seem sensible:
Scaffold is harder in terms of:
SpanEditor is harder in terms of:
The fundamental difference: Scaffold builds infrastructure from primitives (graphics, commands, queries) while SpanEditor leverages existing infrastructure (Scintilla, AST) but must solve domain-specific semantic problems (code understanding).
[...]
Scaffold exhibits systems complexity - building infrastructure from primitives (graphics, commands, queries, serialization).
SpanEditor exhibits semantic complexity - leveraging existing infrastructure but solving domain-specific problems (understanding code without type information).
Both are well-architected. Which is "harder" depends on whether you value low-level systems programming or semantic/heuristic reasoning.
[...]
What SpanEditor-Style Work Requires
What Scaffold-Style Work Requires
The Cognitive Profile
Someone who excels at SpanEditor but struggles with Scaffold likely has these traits:
Strengths
| Trait | Manifestation |
| --- | --- |
| Strong verbal/symbolic reasoning | Comfortable with ASTs, grammars, semantic analysis |
| Good at classification | Naturally thinks "what kind of thing is this?" |
| Comfortable with ambiguity | Can write heuristics that work "most of the time" |
| Library-oriented thinking | First instinct: "what library solves this?" |
| Top-down decomposition | Breaks problems into conceptual categories |
Weaknesses
| Trait | Manifestation |
| --- | --- |
| Weak spatial reasoning | Struggles to visualize coordinate transformations |
| Difficulty with temporal interleaving | Gets confused when multiple state machines interact |
| Uncomfortable without guardrails | Anxious when there's no library to lean on |
| Single-layer focus | Tends to think about one abstraction level at a time |
| Stateless mental model | Prefers pure functions; mutable state across time feels slippery |
Deeper Interpretation
They Think in Types, Not States
SpanEditor reasoning: "A CodeElement can be a function, method, or class. A CallInfo has a receiver and a name."
Scaffold reasoning: "The window is currently in RESIZING_LEFT mode, the aura progress is 0.7, and there's a pending animation callback."
The SpanEditor developer asks "what is this?" The Scaffold developer asks "what is happening right now, and what happens next?"
They're Comfortable with Semantic Ambiguity, Not Mechanical Ambiguity
SpanEditor: "We can't know which class obj.method() refers to, so we'll try all classes." (Semantic uncertainty - they're fine with this.)
Scaffold: "If the user releases the mouse during phase 1 of the animation, do we cancel phase 2 or let it complete?" (Mechanical uncertainty - this feels overwhelming.)
They Trust Abstractions More Than They Build Them
SpanEditor developer's instinct: "Scintilla handles scrolling. I don't need to know how."
Scaffold requires: "I need to implement scrolling myself, which means tracking content height, visible height, scroll offset, thumb position, and wheel events."
The SpanEditor developer is a consumer of well-designed abstractions. The Scaffold developer must create them.
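To make the "types, not states" contrast concrete, here is a toy sketch of my own (invented names, not from either codebase): the first half models a domain by classifying things, the second tracks a process mutating over time:

```python
from dataclasses import dataclass

# "Type thinking": model the domain as a taxonomy of things.
@dataclass
class CodeElement:
    kind: str   # "function" | "method" | "class"
    name: str

def describe(elem: CodeElement) -> str:
    # The question asked is: "what is this?"
    return f"{elem.kind} {elem.name}"

# "State thinking": model a process as mutable state evolving across time.
@dataclass
class WindowDrag:
    mode: str = "IDLE"       # "IDLE" | "RESIZING_LEFT" | ...
    progress: float = 0.0    # animation progress in [0, 1]

    def tick(self, dt: float) -> None:
        # The question asked is: "what is happening now, and what happens next?"
        if self.mode == "RESIZING_LEFT":
            self.progress = min(1.0, self.progress + dt)
            if self.progress >= 1.0:
                self.mode = "IDLE"

drag = WindowDrag(mode="RESIZING_LEFT")
drag.tick(0.5)
print(drag.mode, drag.progress)  # → RESIZING_LEFT 0.5
```

The first style composes out of classification and pure functions; the second forces you to reason about interleaved transitions, which is exactly where the hypothetical SpanEditor-style developer struggles.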
tl;dr: "they think in types, not states", "they're anxious when there's no library to lean on", "they trust abstractions more than they build them", and "tend to think about one abstraction level at a time".
Or, what I would claim is a fine distillation: "bad at novel problem-solving and gears-level modeling".
Now, it's a bit suspicious how well this confirms my cached prejudices. A paranoiac, which I am, might suspect the following line of possibility: I'm sure it was transparent to Opus that it wrote both codebases (I didn't tell it, but I didn't bother removing its comments, and I'm sure it can recognize its writing style), so perhaps when I asked it to list the strengths and weaknesses of that hypothetical person, it just retrieved some cached "what LLMs are good vs. bad at" spiel from its pretraining. There are reasons not to think that, though:
Overall... Well, make of that what you will.
The direction of my update, though, is once again in favor of LLMs being less capable than they sound, and towards longer timelines.
Like, before this, there was a possibility that it really was a skill issue on my part, and one really could 10x their productivity with the right approach. But I've now observed that whether you get 0.8x'd or 10x'd depends on the project you're working on and doesn't depend on one's skill level – and if so, well, this pretty much explains the cluster of "this 10x'd my productivity!" reports, no? We no longer need to entertain the "maybe there really is a trick to it" hypothesis to explain said reports.
Anyway, this is obviously rather sparse data, and I'll keep trying to find ways to squeeze more performance out of LLMs. But, well, my short-term p(doom) has gone down some more.
We’re back with all the Claude that’s fit to Code. I continue to have great fun with it and find useful upgrades, but the biggest reminder is that you need the art to have an end other than itself. Don’t spend too long improving your setup, or especially improving how you improve your setup, without actually working on useful things.
The Efficient Market Hypothesis
Odd Lots covered Claude Code. Fun episode, but won’t teach my regular readers much that is new.
Bradly Olsen at the Wall Street Journal reports Claude [Code and now Cowork are] Taking the AI World By Storm, and ‘Even Non-Nerds Are Blown Away.’
It is remarkable how everyone got the ‘Google is crushing everyone’ narrative going with Gemini 3, and then it took them a month to realize that actually Anthropic, with Claude Code and Claude Opus 4.5, is crushing everyone, at least among the cognoscenti, with growing momentum elsewhere. People are realizing you can know almost nothing and still use it to do essentially everything.
Are Claude Code and Codex having a ‘GPT moment’?
Huh, Upgrades
Claude Cowork is now available to Pro subscribers, not only Max subscribers.
Claude Cowork will ask for explicit permission before all deletions, add new folders in the directory picker without starting over, and make smarter connector suggestions.
Claude Code on the web gets a good looking diff view.
Claude Code for VSCode has now officially shipped; it’s been available for a while. To drag and drop files, hold shift.
Claude Code now has ‘community events’ in various cities. New York and San Francisco aren’t on the list, but also don’t need to be.
Claude Code upgraded to 2.1.9, and then to 2.1.10 and 2.1.11 which were tiny, and now has reached 2.1.14.
Few have properly updated for this sentence: ‘Claude Cowork was built in 1.5 weeks with Claude Code.’
Planning mode now automatically clears context when you accept a plan.
Anthropic is developing a new Customize section for Claude to centralize Skills, connectors and upcoming commands for Claude Code. My understanding is that custom commands already exist if you want to create them, but reducing levels of friction, including levels of friction in reducing levels of friction, is often highly valuable. A way to browse skills and interact with the files easily, or see and manage your connectors, or an easy interface for defining new commands, seems great.
Obsidian
I highly recommend using Obsidian or another similar tool together with Claude Code. This gives you a visual representation of all the markdown files, and lets you easily navigate and search and edit them, and add more and so on. I think it’s well worth keeping it all human readable, where that human is you.
Heinrich calls it ‘vibe note taking’ whether or not you use Obsidian. I think the notes are a place you want to be less vibing and more intentional, and be systematically optimizing the notes, for both Claude Code and for your own use.
You can combine Obsidian and Claude Code directly via the Obsidian terminal plugin, but I don’t see any mechanical advantage to doing so.
New Tools
Siqi Chen offers us /claude-continuous-learning. Claude’s evaluation is that this could be good if you’re working in codebases where you need to continuously learn things, but the overhead and risk of clutter are real.
Jasmine Sun created a tool to turn any YouTube podcast into a clean, grammatical PDF transcript with chapters and takeaways.
Tool Search
The big change with Claude Code version 2.1.7 was enabling MCP tool search auto mode by default, which triggers when MCP tools would take up more than 10% of the context window. You can disable this by adding ‘MCPSearch’ to ‘disallowedTools’ in settings. This seems big for people using a lot of MCPs at once, which could otherwise eat a lot of context.
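If I'm reading that correctly, the opt-out would look something like this in your settings file (key and value taken from the release note above; a sketch, not verified against the current docs):

```json
{
  "disallowedTools": ["MCPSearch"]
}
```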
With that solved, presumably you should be ‘thinking MCP’ at all times, it is now safe to load up tons of them even if you rarely use each one individually.
Out Of The Box
Well, yes, this is happening.
Some of us three years ago were pointing out, loud and clear, that exactly this was obviously going to happen, modulo various details. Now you can see it clearly.
Not giving Claude a lot of access is going to slow things down a lot. The only thing holding most people back was the worry things would accidentally get totally screwed up, and that risk is a lot lower now. Yes, obviously this all causes other concerns, including prompt injections, but in practice on an individual level the risk-reward calculation is rather clear. It’s not like Google didn’t effectively have root access to our digital lives already. And it’s not like a truly rogue AI couldn’t have done all these things without having to ask for the permissions.
The humans are going to be utterly dependent on the AIs in short order, and the AIs are going to have access, collectively, to essentially everything. Grok has root access to Pentagon classified information, so if you’re wondering where we draw the line the answer is there is no line. Let the right one in, and hope there is a right one?
Skilling Up
What’s better than one agent? Multiple agents that work together and that don’t blow up your budget. Rohit Ghumare offers a guide to this.
You can do this with a supervisor agent, which scales to about 3-8 agents, if you need quality control and serial tasks and can take a speed hit. To scale beyond that you’ll need hierarchy, the same as you would with humans, which gets expensive in overhead, the same as it does in humans.
Or you can use a peer-to-peer swarm that communicates directly if there aren’t serial steps and the tasks need to cross-react and you can be a bit messy.
You can use a shared state and set of objects, or you can pass messages. You also need to choose a type of memory.
My inclination is by default you should use supervisors and then hierarchy. Speed takes a hit but it’s not so bad and you can scale up with more agents. Yes, that gets expensive, but in general the cost of the tokens is less important than the cost of human time or the quality of results, and you can be pretty inefficient with the tokens if it gets you better results.
Olivia Moore offers a basic guide to Cursor and Claude Code for nontechnical folks.
Here’s another Twitter post with basic tips. I need to do better on controlling context and starting fresh windows for each issue, in particular.
Often you’ll want to tell the AI what tool is best for the job. Patrick McKenzie points out that even if you don’t know how the orthodox solution works, as long as you know the name of the orthodox solution, you can say ‘use [X]’ and that’s usually good enough. One place I’ve felt I’ve added a lot of value is when I explain why I believe that a solution to a problem exists, or that a method of some type should work, and then often Claude takes it from there. My taste is miles ahead of my ability to implement.
The Art Must Have An End Other Than Itself Or It Collapses Into Infinite Recursion
Always be trying to get actual use out of your setup as you’re improving it. It’s so tempting to think ‘oh obviously if I do more optimization first that’s more efficient’ but this prevents you from knowing what you actually need, and it risks getting caught in an infinite loop.
Always optimize in the service of a clear target. Build the pieces you need, as you need them. Otherwise, beware.
Safely Skip Permissions
Yes, you could use a virtual machine, but that introduces some frictions that many of us want to avoid.
I’m experimenting with using a similar hook system plus a bunch of broad permissions, rather than outright using --dangerously-skip-permissions, but definitely thinking about working towards dangerously skipping permissions.
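For reference, the ‘broad permissions’ half of that approach lives in settings.json under a `permissions` block with `allow` and `deny` lists; the rule strings below are illustrative (check the permissions docs for exact matcher syntax):

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Edit",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(rm:*)"
    ]
  }
}
```

The idea is to pre-approve the broad, low-risk categories so you rarely get prompted, while keeping a deny list for the genuinely destructive stuff.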
A Matter of Trust
At first everyone laughed at Anthropic’s obsession with safety and trust, and its stupid refusals. Now that Anthropic has figured out how to make dangerous interactions safer, it can actually do the opposite. In contexts where it is safe and appropriate to take action, Claude knows that refusal is not a ‘safe’ choice, and is happy to help.
Investment strategizing tends to be safe across the board, but there are presumably different lines on where they become unwilling to help you execute. So far, I have not had Claude Code refuse a request from me, not even once.
When Claude reinterprets the request to be "the request, but safe", this is invisible, and good on the margin. I think doing this is mostly fine.
Code Versus Cowork
I haven’t tried Cowork myself due to the Mac-only restriction and because I don’t have a problem working with the command line. I’ve essentially transitioned to Claude Code for everything that isn’t pure chat, since Claude seems more intelligent and powerful in that mode than it is on the web, even if you don’t need the extra functionality.
Claude Cowork Offers Mundane Utility
The joy of the simple things:
Or to figure out how to write them.
Enjoying the almost as simple things:
The actual answer is that very obviously it was not worth it for Ado to get a hydroponic garden, because his hourly rate is insanely high, but this is a fun project and thus goes by different standards.
The transition from Claude Code to Claude Cowork, for advanced users: if you’ve got a folder with the tools, then the handoff should be seamless:
Justine Moore has Claude Cowork write up threads on NeurIPS best papers, generate graphics for them on Krea and validate this with ChatGPT. Not the best thing.
Claude Code Offers Mundane Utility
Peter Wildeford is having success doing one-shot Instacart orders from plans without an explicit list, and also one-shotting an Uber Eats order.
A SaaS vendor (Cypress) that a startup was using tried to more than double their price from $70k to $170k a year, so the startup did a three-week sprint and duplicated the product. Or at least, that’s the story.
By default Claude Code only saves 30 days of session history. I can’t think of a good reason not to change this so it saves sessions indefinitely; you never know when that will prove useful. So tell Claude Code to change that for you by setting cleanupPeriodDays to 0.
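Concretely, that’s a one-line settings.json change (key and value as described above):

```json
{
  "cleanupPeriodDays": 0
}
```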
Yep. Often the way you use Claude Code is to notice that you can automate things and then have it automate the automation process. It doesn’t have to do everything itself any more than you do.
An explanation (direct link to 15 minute video) of what Claude skills are.
Vibe Coding Requires Good Vibes
James Ide points out that ‘vibe coding’ anything serious still requires a deep understanding of software engineering and computer systems. You need to figure out and specify what you want. You need to be able to spot the times it’s giving you something different than you asked for, or is otherwise subtly wrong. Typing source code is dead, but reading source code and the actual art of software engineering are very much not.
I find the same, and am rapidly getting a lot better at various things as I go.
Codex of Ultimate Vibing
Every’s Dan Shipper writes that OpenAI has some catching up to do, as his office has with one exception turned entirely to Claude Code with Opus 4.5, where a year ago it would have been all GPT models, and a month prior there would have been a bunch of Codex CLI and GPT 5.1 in Cursor alongside Claude Code.
Codex did add the ability to instruct mid-execution with new prompts without the need to interrupt the agent (requires /experimental), but Claude Code already did that.
There are those who still prefer Codex and GPT-5.2, such as Hasan Can. They are very much in the minority lately, but if you’re a heavy duty coder definitely check and see which option works best for you, and consider potential hybrid strategies.
One hybrid strategy is that Claude Code can directly call the Gemini CLI, even without an API key. Tyler John reports it is a great workflow, as Gemini can spot things Claude missed and act as a reviewer and way to call out Claude on its mistakes. Gemini CLI is here.
No Soup For You
Contrary to claims by some, including George Hotz, Anthropic did not cut off OpenRouter or other similar services from Claude Opus 4.5. The API exists. They can use it.
What other interfaces cannot do is use the Claude Code authorization token to use the tokens from your Claude subscription for a different service, which was always against Anthropic’s ToS. The subscription is a special deal.
I agree that Anthropic’s communications about this could have been better, but what they actually did was tolerate a rather blatant loophole for a while, allowing people to use Claude on the cheap and probably at a loss for Anthropic, which they have now reversed with demand surging faster than they can spin up servers.
Server Overload
Claude Codes quite a lot, usage is taking off. Here’s OpenRouter (this particular use case might be confounded a bit by the above story where they cut off alternative uses of Claude Code authorization tokens, but I’m guessing mostly it isn’t):
A day later, it looked like this.
Reports are the worst of the outage was due to a service deployment, which took about 4 hours to fix.
Here’s The Pitch
She shared a sample TikTok, showing a woman who doesn’t understand math using Claude to automatically code up visualizations to help her understand science, which seemed great.
OpenAI takes the approach of making things easy on the user and focusing on basic things like cooking or workouts. Anthropic shows you a world where anything is possible and you can learn and engage your imagination. Which way, modern man?
The Lighter Side
And yet some people think the AIs won’t be able to take over.