How I Handle Automated Programming

HunterJay

This is a write-up of my current process, as an independent software engineer, for using Claude Code to write and review all of my code. The specifics below will change as models get better. The overall ideas, I suspect, will hold for a little longer.

I’m trying to build software quickly.

When Sonnet 4.5 was released in September 2025, I found that I didn’t need to write code anymore.

When Opus 4.5 was released in November 2025, I found I didn’t need to review the code anymore.

After Opus 4.6 was released earlier this month, I finally set up a system to enable --dangerously-skip-permissions.^[1]

It’s very fast.^[2]

The motivation behind this sequence of handovers is basically just asking in a loop “Why am I the limiting factor here? What can I do to stop being the limiting factor?”

Most of the time, the answer is just that I don’t trust the models enough. My system just wasn’t yet set up to handle mistakes from them, so I had to play Quality Control to a frustrating degree.

But the models are very smart and capable, and it’s quick to iterate yourself further and further out of the loop. Here’s the process I’ve arrived at over the past year or so of iterating on Claude Code:

Dangerously Skip Permissions

First, I have set up a new user account on my computer for Claude. The user has no root access and can’t modify files outside its own (and shared) directories. It has its own SSH keys to the cloud servers my projects run on, its own GitHub token, and so on -- all with minimal permissions, and most read-only. It uses its own browser (Playwright Chromium), without any of my data or logins available.

This is far from perfect isolation, and if Claude were malicious he might be able to cause some trouble, but it is plenty to stop a bumbling intern from destroying anything important.

With that, I feel reasonably comfortable running Claude with --dangerously-skip-permissions. I’ve also set up a little script so that typing ‘claude’ automatically activates the folder’s .venv and applies this flag. Speedy!
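The script itself isn’t shown in the post. A minimal sketch of such a launcher, assuming a POSIX-style `.venv/bin` layout and the real `claude` CLI on `PATH`; `build_claude_invocation` and `main` are hypothetical names:

```python
import os
import sys
from pathlib import Path


def build_claude_invocation(
    cwd: Path, extra_args: list[str], env: dict[str, str]
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for launching Claude Code with the folder's .venv active."""
    new_env = dict(env)
    venv = cwd / ".venv"
    if venv.is_dir():
        # Roughly what `source .venv/bin/activate` does: set VIRTUAL_ENV,
        # put the venv's bin/ first on PATH, and drop any PYTHONHOME override.
        new_env["VIRTUAL_ENV"] = str(venv)
        new_env["PATH"] = str(venv / "bin") + os.pathsep + env.get("PATH", "")
        new_env.pop("PYTHONHOME", None)
    argv = ["claude", "--dangerously-skip-permissions", *extra_args]
    return argv, new_env


def main() -> None:
    argv, env = build_claude_invocation(Path.cwd(), sys.argv[1:], dict(os.environ))
    # Replace this process with Claude Code so the terminal and signals pass straight through.
    os.execvpe(argv[0], argv, env)
```

Saved as a script with the usual `if __name__ == "__main__": main()` guard, it can sit behind a shell alias such as `alias claude='python3 ~/bin/claude_launch.py'` (the path is hypothetical). Shell aliases aren’t visible to `execvpe`, so the script’s own lookup of `claude` still finds the real binary rather than calling itself.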

Claude.md and docs/

When Claude starts up in Claude Code, the Claude.md file is loaded into his context window. As such, you want to keep this short while also making sure it includes a complete overview of the project.

Claude mostly builds this file himself and handles it well by default. However, I did put some effort into making a docs/ folder with more detailed information about how each part of the software should work. The idea is that Claude can follow pointers down the chain to find relevant information quickly, and to do so with a good idea of the overall reasoning and structure:

Claude.md --> Detailed documentation in /docs --> Actual Code

Of course, all of the docs were also written by Claude.

Working

Usually, I start by running a /new_branch skill ^[3], which has Claude create a new git branch with a name combining the date and the folder we are in.
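The skill’s contents aren’t reproduced here. As a rough illustration of the naming it describes, assuming an ISO date plus a git-safe slug of the folder name (both assumptions, as are the function names):

```python
import re
import subprocess
from datetime import date
from pathlib import Path
from typing import Optional


def branch_name(today: date, folder: Path) -> str:
    """Combine the ISO date with a git-safe slug of the folder name, e.g. 2026-02-14-myapp-4."""
    slug = re.sub(r"[^a-z0-9]+", "-", folder.name.lower()).strip("-")
    return f"{today.isoformat()}-{slug}"


def create_branch(repo: Path, today: Optional[date] = None) -> str:
    """Create and check out the branch; git exits non-zero (raising here) if it already exists."""
    name = branch_name(today or date.today(), repo.resolve())
    subprocess.run(["git", "switch", "-c", name], cwd=repo, check=True)
    return name
```

Running it twice in the same folder on the same day will fail because the branch already exists; appending a counter would be one way around that if it matters.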

Once that’s done, I clear the chat and write or paste in the specific feature or problem to work on. I try to pick a chunk of work that can be finished within a 1M-token context window ^[4] (which, to be clear, is nearly everything).

I try to give Claude as much context as possible about the problem, as well as how certain I am about it. For example: “A user has emailed with the below report. Could you investigate if this is a real issue?” or: “I’d like to display to users when the page is actively generating content in a way that lets them see it even if they aren’t in the active tab. Is it possible to do this via the .ico somehow? Perhaps showing an animation or different colour while generations are active? Could you investigate the possibilities here and propose some ways we might be able to provide this info to users?”

We often have a bit of back and forth chatting about how to approach the problem and plan out the implementation, and then I read the plan Claude generates semi-carefully, and often suggest changes to it -- usually some missing piece or misunderstood requirement, or a new idea I hadn’t thought of.

After that, Claude goes ahead and does the work, which, with permissions bypassed, is usually completed in one shot. I avoid clearing the context from before the plan unless it is polluted, because I generally think it contains useful information about how to carry out the plan.

Occasionally there’s some back and forth after the initial work is done -- me asking why a choice was made, or whether an edge case I thought of is covered, or what happens in a particular scenario.

Now, we have a feature which we need to carefully review. This is where most of the work actually is.

Review Step #1 - What Did Claude Intend?

For complex features, or things with UI that Claude isn’t great at, we need to do a little bit of manual reviewing. But we don’t review the code! Oh no, we review the outputs. There are two ways to do this:

For complex flows, I run the /walkthrough skill ^[5] . This prompts Claude to create a script which calls the real code with a bunch of example inputs, processes the outputs of that code into clean, readable logs, and then presents it as a single file you can read top to bottom to see exactly what (apparently) caused what. Claude can also review this himself and fix things based on these outputs.

This is extremely useful for things where it’s not obvious exactly what the code is doing. For example, if you’re building a project that assembles prompts for an AI (e.g., a benchmark), you want to see exactly what those prompts look like across a bunch of examples. It’s very handy to have that ‘bunch of examples’ output to a text file!
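Since the skill has Claude write a bespoke script each time, there’s no fixed implementation. A minimal sketch of the shape such a script might take, where `build_prompt` is a hypothetical stand-in for the real code under inspection and the other function names are likewise illustrative:

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def build_prompt(question: str, context: str) -> str:
    # Hypothetical stand-in for the real prompt-assembly code being inspected.
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer concisely."


def render_walkthrough(fn: Callable[..., Any], examples: Iterable[Dict[str, Any]]) -> str:
    """Call `fn` on each example and render inputs and outputs as one top-to-bottom transcript."""
    sections = []
    for i, kwargs in enumerate(examples, start=1):
        lines = [f"=== Example {i} ==="]
        for key, value in kwargs.items():
            lines += [f"--- input: {key} ---", str(value)]
        try:
            lines += ["--- output ---", str(fn(**kwargs))]
        except Exception as exc:  # record the failure and keep going through the examples
            lines += [f"--- raised {type(exc).__name__} ---", str(exc)]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


def write_walkthrough(fn: Callable[..., Any], examples: Iterable[Dict[str, Any]], path: Path) -> Path:
    """Write the transcript to a single text file you can read top to bottom."""
    path.write_text(render_walkthrough(fn, examples), encoding="utf-8")
    return path
```

Catching exceptions per example keeps one bad input from hiding the rest of the transcript.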

The /walkthrough output should not be viewed as ‘this is definitely what the code is doing’, but you can reliably view it as ‘this is what Claude intended to build’, which is very useful on its own. I often catch little differences from what I intended and can easily correct them by prompting Claude, whereas without this I would be much more blind.

For UI changes, I can just open the thing up and take a look. I am a big fan of browser based GUIs for most projects, so usually I’ll just flip across to localhost and have a quick look at what we’ve got. This is especially easy, because that /new_branch script also spins up a local server on a port which is specific to this project and this folder, so I know if I’m talking to Claude number four on the second desktop window, then I can go to port 8240 to see what we’ve done.
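The post doesn’t say how ports are assigned beyond being specific to the project and folder. One memorable scheme, purely an assumption here: give each project a base port and number each checkout folder (`myapp-1`, `myapp-2`, ...), so the port can be worked out from which window you’re looking at. The base ports and names below are hypothetical:

```python
import re
from pathlib import Path
from typing import Dict

# Hypothetical per-project base ports; each numbered checkout adds its number on top.
PROJECT_BASE_PORTS: Dict[str, int] = {"myapp": 9100, "benchmark": 9200}

# "myapp-4" -> project "myapp", n "4"; "myapp" -> project "myapp", no number.
_FOLDER_RE = re.compile(r"^(?P<project>.+?)(?:-(?P<n>\d+))?$")


def dev_port(folder: Path, bases: Dict[str, int] = PROJECT_BASE_PORTS) -> int:
    """Map a checkout folder like 'myapp-4' to a memorable port (base + checkout number)."""
    match = _FOLDER_RE.match(folder.name)
    if match is None or match["project"] not in bases:
        raise ValueError(f"no base port configured for folder {folder.name!r}")
    return bases[match["project"]] + int(match["n"] or 0)
```

Hashing the folder path would also give a stable port, but you couldn’t work it out in your head from the window you’re looking at.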

In both of these cases, it’s simple to notice small things and pepper them back onto Claude to fix. Sometimes I explicitly say “I’m going to throw a bunch of small little requests at you -- just add them to your todo list and get to them once you finish up what you’re doing. Don’t let me interrupt you, I’m just saying things as I see them!” so I don’t need to wait for previous work to be done before sending the next little fix.

Review Step #2 - Are You Happy With Your Work?

Once the code is written and appears to output the correct thing, I run a /pre-pr skill ^[6] which prompts Claude to tidy up and test his work before making a PR to master.

Specifically, Claude is prompted to review the new code for quality and clarity, check that the old code supports the new code & its patterns, remove dead code, write and run tests, check for similar problems to the one we fixed throughout the codebase, plan out manual tests (e.g. browser based tests), run the manual browser tests using the Playwright MCP, and then repeat until no major changes are made.
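“Repeat until no major changes are made” needs some stopping rule. A minimal sketch of one, assuming “major” means a round’s diff exceeds a line-count threshold; the threshold, the round cap, and the `review_pass` hook are all hypothetical (in the skill itself, Claude makes this judgement):

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def parse_numstat(output: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Binary files report '-' for both counts and are skipped.
    """
    total = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def lines_changed(repo: Path, since: str = "HEAD") -> int:
    """Lines changed in tracked files relative to `since` (untracked files are not counted)."""
    result = subprocess.run(
        ["git", "diff", "--numstat", since],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_numstat(result.stdout)


def review_until_stable(
    review_pass: Callable[[int], int],
    major_threshold: int = 20,
    max_rounds: int = 5,
) -> List[int]:
    """Run rounds until one changes fewer than `major_threshold` lines, or `max_rounds` is hit.

    `review_pass(round_no)` is a hypothetical hook that tidies, tests and fixes
    the branch, then returns how many lines that round changed.
    """
    history: List[int] = []
    for round_no in range(1, max_rounds + 1):
        changed = review_pass(round_no)
        history.append(changed)
        if changed < major_threshold:
            break
    return history
```

The round cap matters as much as the threshold: without it, a model that keeps finding small rewrites to make could loop indefinitely.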

After that, the skill prompts Claude to update Claude.md and the /docs folder, merge the latest master into our branch, and create a PR. Then, it waits for the PR review and CI tests to complete, and handles those results.

Ideally this would all happen automatically, but often I need to reprompt Claude to check the PR info, or I just directly paste in test failures or PR comments for him.

Review Step #3 - Is The Other Claude Happy?

The PR comments are also automated! A different instance of Claude, running via the Claude GitHub plugin, reviews the diff and comments on the PR. It often has nitpicks, but occasionally catches genuine issues or inconsistencies which need to be resolved.

All of the normal automated unit and integration tests run at this stage too. I use testmon to try to limit runs to tests affected by the change, and I also parallelise the tests, but it can still take a long time for everything to run.
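Assuming the suite runs under pytest, the pytest-testmon and pytest-xdist plugins provide the `--testmon` and `-n` options respectively, so the invocation might be built like this (the wrapper functions are hypothetical):

```python
import subprocess
import sys
from typing import List, Optional


def pytest_command(affected_only: bool = True, workers: Optional[str] = "auto") -> List[str]:
    """Build a pytest invocation that selects affected tests and runs them in parallel."""
    cmd = [sys.executable, "-m", "pytest"]
    if affected_only:
        # pytest-testmon: run only tests whose recorded dependencies changed.
        cmd.append("--testmon")
    if workers:
        # pytest-xdist: spread tests across worker processes ("auto" picks a count from the CPUs).
        cmd += ["-n", workers]
    return cmd


def run_tests(affected_only: bool = True, workers: Optional[str] = "auto") -> int:
    """Run the suite and return pytest's exit code."""
    return subprocess.run(pytest_command(affected_only, workers)).returncode
```

testmon keeps its record of which code each test touches in a `.testmondata` file; on CI that file needs to be cached between runs, otherwise every run starts cold and executes the full suite.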

Then it’s merged to master! I merge somewhere on the order of 5-10 PRs a day using this system -- you don’t want big open branches, since everything will work much better if you can keep it all inside one context window and avoid conflicts with other branches as much as possible.

Review Step #4 - Does The Staging Server Work?

Once the code is merged to master, at some point, maybe every couple of days, I run the /PRstaging skill ^[7] to work on getting the code deployed. This skill basically repeats the /pre-pr checks, but across several recent changes and with a fresh instance of Claude. We also repeat the automated GitHub PR review and CI tests, and often catch a few more little things.

For any project that is used externally, I have a staging server which is (or should be) as similar to the production server as possible. I run a deploy script to pull the staging branch and deploy the code there first. Then I run the QA Bot.

The QA Bot is a little tool I built which spins up a team of Claude Haiku instances to manually test websites using a GUI-based browser. The intended pattern is to pass in the PR description and let the supervisor agent select which flows on the website to test (for example, logging in, signing up, etc.).

This is generally quite expensive to run, so you only want to do it once before merging to production. The tool is also not that good yet -- sometimes it finds many false positives, or the agents get stuck, or we run into other issues. Occasionally it notices real bugs though, so I find it useful to run on anything important.

It is very important to provide the QA Bot with a staging server and test credentials which can do little harm, since the agents might test, say, the ‘delete profile’ flow with the URL and credentials you give them. They should ask before doing anything irreversible, but it is a smaller model.
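The QA Bot’s code isn’t part of the post, so the following is only a structural sketch of the supervisor/worker fan-out it describes, with one added assumption: flows marked destructive are skipped unless explicitly allowed, rather than relying solely on the smaller model to ask first. `run_flow` stands in for whatever drives a browser agent through one flow; `Flow`, `FlowResult`, and `run_qa` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Flow:
    name: str
    destructive: bool = False  # e.g. "delete profile": irreversible on the test account


@dataclass
class FlowResult:
    flow: str
    status: str  # "passed", "failed", or "skipped"
    notes: str = ""


def run_qa(
    flows: List[Flow],
    run_flow: Callable[[Flow], FlowResult],
    allow_destructive: bool = False,
    max_workers: int = 4,
) -> List[FlowResult]:
    """Run the selected flows in parallel, holding back destructive ones unless allowed."""
    runnable = [f for f in flows if allow_destructive or not f.destructive]
    skipped = [
        FlowResult(f.name, "skipped", "destructive flow needs explicit approval")
        for f in flows
        if f.destructive and not allow_destructive
    ]
    # Each worker handles one flow; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_flow, runnable))
    return results + skipped
```

In the real tool the flow list would come from the supervisor reading the PR description; here it is simply passed in.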

Once staging looks good, I repeat the process with /PRprod.^[8]

Review Step #5 - What Do The Production Logs Say?

We finally have something in production! That’s great, but also a little nerve-wracking, since at no point in this process did we read the code or click through everything manually. Let’s hope that if there are any problems, we can catch them quickly.

I have a system that emails me serious errors immediately, but there’s a lot of noisier output that could still indicate a problem, stored in logs on the server and on all of the various platforms a project might interact with (DigitalOcean, Stripe, OpenRouter, Mailgun, etc.).
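The post doesn’t say how the error emails are sent. For a Python project, the standard library’s `logging.handlers.SMTPHandler` is one way to split “email me now” from “keep it in the server logs”. A minimal sketch, where the logger name, file path, addresses and mail host are placeholders:

```python
import logging
import logging.handlers
from typing import List, Optional, Tuple


def configure_logging(
    log_path: str,
    mailhost: Tuple[str, int],
    fromaddr: str,
    toaddrs: List[str],
    credentials: Optional[Tuple[str, str]] = None,
) -> logging.Logger:
    logger = logging.getLogger("app")
    if logger.handlers:  # already configured; avoid attaching duplicate handlers
        return logger
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # INFO and above: the noisier record, kept in a rotating file on the server.
    file_handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=10_000_000, backupCount=5
    )
    file_handler.setLevel(logging.INFO)
    file_handler.setFormatter(fmt)

    # ERROR and above: emailed immediately. secure=() requests STARTTLS,
    # which SMTPHandler only uses when credentials are supplied.
    mail_handler = logging.handlers.SMTPHandler(
        mailhost=mailhost,
        fromaddr=fromaddr,
        toaddrs=toaddrs,
        subject="[app] serious error",
        credentials=credentials,
        secure=() if credentials else None,
    )
    mail_handler.setLevel(logging.ERROR)
    mail_handler.setFormatter(fmt)

    logger.addHandler(file_handler)
    logger.addHandler(mail_handler)
    return logger
```

SMTPHandler sends synchronously inside the logging call, so a slow mail server delays the code that logged the error; for a high-traffic app, putting it behind a `QueueHandler` is worth considering.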

So, every day, I run /check-logs ^[9], which prompts Claude to access each of those in turn (using read-only credentials) and see if anything needs addressing. Claude is great at this: he can process huge lists of logs and notice patterns natively, then present any issues or observations (‘Memory utilisation was a bit high’, ‘We have more 404s than normal’) and dive into anything immediately by also looking at the codebase and recent PRs to production.

I’ve caught many an issue this way. For platforms I don’t pull logs from automatically, I can still occasionally download a .csv and paste it into Claude with a prompt like ‘This is a report from Google Search Console, can you analyse it and flag any improvements we could make?’, and then go ahead and make the improvements automatically using the above system.

One More Thing

I also occasionally run the /improve skill ^[10], especially when a new model comes out. This helps keep everything aligned and up to date across the codebase by spinning up an agent team to review each section of the project and check if improvements can be made to code quality, security, documentation, design, test coverage, and so on!

When I ran this for the first time with Opus 4.6, it caught 35 distinct issues, some of which were actually consequential.

The only other thing worth calling out is that this is a very parallel workflow -- I have six windows of Claude Code open per project, each working independently in its own repo. Often only three or four are actively running simultaneously, but I have occasionally hit six in parallel.

Conclusions

As the models get better -- which will happen quickly -- some of these steps will become unnecessary, but other parts should remain useful for a little longer. The general philosophy should be that if a procedure puts more control in the hands of the AI, or makes it clearer what we want developed, it will work better as the AIs get better.

You can think of it as analogous to working with a human engineer who would otherwise be idle, where asking them to check and explain their work in various ways can help get something good.

[1] I now allow Claude to run without any specific list of commands blocked.
[2] Obviously a big chunk of this is just that Claude is getting capable enough to do this himself. Still, for the foreseeable future, a system which catches issues (especially one that catches issues using Claude) will be beneficial even as Claude gets much more capable.
[3] /new_branch
[4] You can set this in Claude Code with /model. It charges you for ‘extra usage’ at API rates once you go above ~200k tokens.
[5] /walkthrough
[6] /pre-pr
[7] /PRstaging
[8] /PRprod
[9] /check-logs
[10] /improve
