I'm starting to think Claude is already superhuman at this part.
This is a claim that I find hard to evaluate.
On the one hand, Claude is better than me in a bunch of ways. It knows more without having to look it up. It works faster and without getting tired. It can even work in parallel in ways I can't. So in all those ways it's a better coder than me.
But if I look at the individual output, it's clear that Claude is at best as good as a P90 human. It's not really able to come up with clever, Carmack-or-Knuth-level solutions to problems. Heck, sometimes it can't even come up with as good a solution as I can! What it can do, though, is just keep applying above-median coding expertise persistently to a problem when directed to do so.
The main thing standing between Claude and superhuman status is its lack of taste. The main way my coding sessions go off the rails is that Claude gets hung up on an idea, runs with it, and doesn't have the judgment to realize it was a bad idea or to make itself step back and look for a better solution (because its notion of what's "better" is fairly limited).
Now, none of this is to dispute that what Claude can do is really, really cool and should dramatically increase productivity: it can do what you used to need a human for, and the human would have been slower even if you paid them more than you pay Claude. So in some limited sense that's "superhuman", but I mostly think we should reserve "superhuman" for when Claude can do something humans cannot, modulo time and cost.
I basically agree with this. When I say Claude is superhuman at coding, I mean that when Claude knows what needs to be done, it does it about as well as a human but much faster. When I say Claude isn't superhuman at software engineering in general, it's because sometimes it doesn't take the right approach when an expert software engineer would.
I consider Claude's "taste" to be pretty good, usually, but not at the P90 of humans with domain experience. I'd characterize its deficiencies more as a lack of ability to do long-term "steering" at a human level. This is likely related to its lack of long-term memory, and hence of the ability to do continual learning.
Claude Code is excellent these days and meets my bar for "AGI". It's capable of doing serious amounts of cognitive labor and doesn't get tired (though I did repeatedly hit my limits on the $20 plan and have to wait through the 5-hour cooldown).
I spent a good chunk of this weekend seeing if I could get Claude to write a good Wikipedia article if I told it to use the site's rules and guidelines and then let it iteratively critique and revise against those guidelines until the article fully met the standards. I wrote zero of the text myself, though I did paste some Q&A back and forth to NotebookLM to help with citations and had ChatGPT generate an additional flowchart visual to include.
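In sketch form, the loop looked something like this (the function names are hypothetical stand-ins for Claude prompts, not a real API):

```typescript
// Hypothetical wrappers around Claude prompts, stubbed for shape only.
declare function draftArticle(topic: string, guidelines: string): Promise<string>;
declare function critiqueAgainstGuidelines(draft: string, guidelines: string): Promise<string[]>;
declare function reviseArticle(draft: string, issues: string[]): Promise<string>;

// Draft, then critique against Wikipedia's guidelines and revise,
// until a critique pass comes back clean (or we give up).
async function writeArticle(topic: string, guidelines: string): Promise<string> {
  let draft = await draftArticle(topic, guidelines);
  for (let round = 0; round < 10; round++) {
    const issues = await critiqueAgainstGuidelines(draft, guidelines);
    if (issues.length === 0) break; // fully meets the standards
    draft = await reviseArticle(draft, issues);
  }
  return draft;
}
```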
After getting some second opinions from Gemini and ChatGPT, I will have Claude do a final round of revisions and then actually try to get it on Wikipedia. I will share the link here if it gets accepted--I don't really know how that works, but I bet Claude can help me figure it out.
When you do so, please respect Wikipedia's request for disclosure: https://en.wikipedia.org/wiki/Wikipedia:Large_language_models#Disclosure
Is this sufficient? I don't really know the best place to put a disclosure.
https://en.wikipedia.org/wiki/User_talk:Alexis0Olson/Multilayer_perceptron#LLM_Disclosure
I run multiple agents in parallel through the Claude Code web interface, so I actually managed to hit the limits on the Max plan. It was always within 10 minutes of the reset though.
I was also making Claude repeatedly read long investment documents for an unrelated project at the same time though.
Thanks for the write-up! Which Claude subscription did you use for this project, and what was your experience with rate limits?
I'm on the $100 Max plan ("5x more usage than Pro"), although rate limits were doubled for most of this period as a holiday thing[1]. I used Claude Code on the web to fire off concurrent workers pretty often, and I only hit rate limits twice: Once around 11:55 pm (reset at midnight) and once in the afternoon about 10 minutes before the rate limit reset (on a different project, where I was making Claude repeatedly read long financial documents). I used Opus 4.5 exclusively.
Basically the rate limits never really got in my way, and I wouldn't have hit them at all on the $200 plan (4x higher rate limits).
[1] I assume spare capacity, since people were off work.
In the last few weeks, I've been playing around with the newest version of Claude Code, which wrote me a read-it-later service including RSS, email newsletters and an Android app.
Software engineering experience was useful, since I planned out a lot of the high-level design and data model, and sometimes pushed for simpler designs. Overall though, I mostly felt like a product manager trying to specify features as quickly as possible. While software engineering is more than coding, I'm starting to think Claude is already superhuman at this part.
This was a major change from earlier this year (when coding agents were fun but not very useful) and even a few months ago (when they were good only if you held their hands constantly). Claude Opus 4.5 (and supposedly some of the other new models) generally writes reasonable code by default.
And while some features had pretty detailed designs, some of my prompts were very minimal.
After the first day of this, I mostly just merged PRs without looking at them and assumed they'd work right. I've had to back out or fix a small number since then, but even of those, most were fixed with a bug report prompt.
Selected Features
Android App
The most impressive thing Claude did was write an entire Android app from this prompt:
After that, almost all features in the Android app were implemented with the prompt, "Can you implement X in the Android app, like how it works in the web app?"
Narration
The most complicated feature I asked it to implement was article narration: using Readability.js to (optionally) clean up RSS feed content, (optionally) running the text through an LLM to make it more readable, converting it to speech with one of two pipelines, and then linking the spoken narration back to the correct original paragraph.
This was the buggiest part of the app for a while, but mostly because the high-level design was too fragile. Claude itself suggested a few of the improvements, and once we had a less fragile design it has been working consistently ever since.
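To make the pipeline shape concrete, here's a rough TypeScript sketch. Only the Readability.js and JSDom calls are real APIs; `rewriteForListening` and `synthesize` are hypothetical stand-ins for the LLM pass and the TTS pipelines, and the paragraph-to-timestamp mapping is simplified.

```typescript
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

// Hypothetical stages: an optional LLM pass that rewrites text for
// listening, and a TTS step returning audio plus a start time per paragraph.
declare function rewriteForListening(paragraphs: string[]): Promise<string[]>;
declare function synthesize(
  paragraphs: string[]
): Promise<{ audioUrl: string; startTimes: number[] }>;

async function narrate(feedHtml: string, url: string) {
  // Stage 1: optionally clean up the feed content with Readability.js.
  const dom = new JSDOM(feedHtml, { url });
  const article = new Readability(dom.window.document).parse();

  // Split the cleaned article into paragraphs we can link back to later.
  const cleanDom = new JSDOM(article?.content ?? feedHtml);
  const paragraphs = Array.from(cleanDom.window.document.querySelectorAll("p"))
    .map((p) => p.textContent?.trim() ?? "")
    .filter((t) => t.length > 0);

  // Stage 2: optionally make the text more listenable, then synthesize.
  const readable = await rewriteForListening(paragraphs);
  const { audioUrl, startTimes } = await synthesize(readable);

  // Stage 3: link the narration back to the original paragraphs, so
  // playback can highlight and seek to the right spot in the article.
  return paragraphs.map((text, i) => ({ text, audioUrl, startsAt: startTimes[i] }));
}
```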
Selected Problems
Not Invented Here Syndrome
Claude caused a few problems by writing regexes rather than using the obvious tools (JSDom, HTML parsers). Having software engineering experience was helpful for noticing these, and Claude fixed them easily when asked.
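As an illustration (not the project's actual code), here's the kind of swap involved: extracting links from feed HTML with JSDom instead of a regex.

```typescript
import { JSDOM } from "jsdom";

// A regex like /<a href="([^"]*)"/g breaks on single quotes, reordered
// attributes, and entities; a real HTML parser handles all of that.
function extractLinks(html: string): string[] {
  const { document } = new JSDOM(html).window;
  return Array.from(document.querySelectorAll("a[href]"))
    .map((a) => a.getAttribute("href") ?? "")
    .filter((href) => href.length > 0);
}
```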
Bugs in Dependencies
Claude's NIH syndrome was actually partially justified, since the most annoying bugs we ran into were in other people's code. For a bug in database migrations, I ended up suggesting NIH and had Claude write a basic database migration tool.
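A runner like that can be quite small. Here's a rough sketch of the idea (better-sqlite3 is my assumption for the database driver, not necessarily what the project uses):

```typescript
import Database from "better-sqlite3";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Apply every .sql file in migrationsDir, in name order, exactly once,
// recording what has already run in a bookkeeping table.
function migrate(dbPath: string, migrationsDir: string): void {
  const db = new Database(dbPath);
  db.exec("CREATE TABLE IF NOT EXISTS _migrations (name TEXT PRIMARY KEY)");

  const applied = new Set(
    db.prepare("SELECT name FROM _migrations").all().map((row: any) => row.name)
  );

  const files = readdirSync(migrationsDir)
    .filter((f) => f.endsWith(".sql"))
    .sort();

  for (const file of files) {
    if (applied.has(file)) continue;
    // Each migration applies atomically: either the SQL and the bookkeeping
    // row both land, or neither does.
    db.transaction(() => {
      db.exec(readFileSync(join(migrationsDir, file), "utf8"));
      db.prepare("INSERT INTO _migrations (name) VALUES (?)").run(file);
    })();
  }
}
```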
The other major (still unsolved) problem we're having is that the Android emulator doesn't shut down properly in CI. Sadly I think this may be too much for Claude to just replace, but it's also not a critical part of the pipeline like migrations were.
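For context, the usual best-effort mitigation is to ask adb to kill any running emulators during teardown; a minimal sketch (assuming adb is on the CI runner's PATH):

```typescript
import { execSync } from "node:child_process";

// Best-effort cleanup: list connected devices and send each emulator an
// `emu kill`. Errors are swallowed so CI teardown itself never fails.
for (const line of execSync("adb devices").toString().split("\n")) {
  const serial = line.split("\t")[0];
  if (serial.startsWith("emulator-")) {
    try {
      execSync(`adb -s ${serial} emu kill`);
    } catch {
      // The emulator may already be gone; ignore.
    }
  }
}
```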
Other Observations
Remaining Issues
The problems Claude still hasn't solved with minimal prompts are:
But that's it, and the biggest problem here is that I'm putting in basically no effort. I expect each of these would be solvable if I actually spent an hour on them.
This was an eye-opening experience for me, since AI coding agents went from kind-of-helpful to wildly-productive in just the last month. If you haven't tried them recently, you really should. And keep in mind that this is the worst they will ever be.