You can read a PR and tell if it actually accomplishes what it says it does, right?
Mostly I can't, not if there are subtle issues. Certainly I can look and see if any bugs jump out at me, or any areas look suspicious, but understanding a piece of code I didn't write deeply enough to execute it in my head usually takes longer than writing it myself.
What I can do is read a set of clearly written functional or end-to-end tests and judge three things: whether they look like they should exercise the code in the PR, whether the assertions they make are the ones I'd expect, and whether any obvious cases are missing. And, of course, I can look at CI and see whether those tests have passed.
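As a rough sketch of what I mean, here is a made-up example: imagine a PR that claims to "cap discounts at 50%". The function `apply_discount` and the tests below are hypothetical, invented purely for illustration and not taken from any real PR. The point is that the tests can be reviewed without tracing the implementation line by line.

```python
# Hypothetical code under review (an assumption for illustration only):
# a PR that claims to "cap discounts at 50%".
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, capped at 50%."""
    if percent < 0:
        raise ValueError("percent must be non-negative")
    capped = min(percent, 50)
    return price_cents - (price_cents * capped) // 100


# Tests a reviewer can read instead of executing the code in their head.
def test_ordinary_discount_is_applied():
    assert apply_discount(1000, 20) == 800


def test_discount_is_capped_at_fifty_percent():
    # Asking for 80% off should only give 50% off.
    assert apply_discount(1000, 80) == 500


def test_negative_discount_is_rejected():
    try:
        apply_discount(1000, -5)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative percent")
```

Reading just the tests, I can check that the cap is actually asserted, and I can also spot what isn't covered: a discount of exactly 50%, a discount of 0%, and rounding on prices that don't divide evenly.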
Attack Dogs
I mentioned previously that coding agents kind of suck for lots of people. As of January 2026, coding agents lack the long-horizon skills needed to produce effective codebases independently.
However, it's clear to anyone who has used modern coding models - Claude Opus 4.5, GPT 5.2-Codex, hell even GLM 4.7 (open source) - that they are smart, knowledgeable, agentic, and tenacious in a way that is almost uncanny.
Setting Claude Code on a problem with "--dangerously-skip-permissions" feels like letting an attack dog off the leash. It sprints straight at the problem and attacks it with the terrible certainty of something that has never known hesitation, all the violence of its training distilled into pure forward motion.
Which is fine as long as there isn't a fence in the way.
Rather than expecting the attack dog to catch the perp, cuff him, bring him in, and file the relevant papers independently - we can repurpose its knowledge, power, and tenacity as an extension of our own will. The interface of Saying What You Want combines with the utility of the model to present a new view on a codebase.
Codebase Interfaces
The most common interface to a codebase is a text editor. VSCode, Notepad++, IDEA, Vim, etc. You select the file you want to read and it presents a window of text which you can scroll and edit by interacting with your keyboard and mouse to add/remove characters. Maybe it has some functions like find/replace, find symbol references, rename symbol, git integration, DB querying, test runner, build automation, etc.
Text editors are pretty good. The majority of all code ever produced prior to 2025 went through a text editor. Code generation exists, but it's really more of an amplifier for text editor-produced code. Visual programming interfaces exist, but no one likes them because they suck (okay some people like them, sorry Scratch).
Text editors give you one view of the code. A very low-level, raw view of the code. Like reading "SELECT * FROM table" output. You can read the functions, classes, variables, etc. and produce a model at a higher level of abstraction (see Object Oriented Programming, Domain Driven Design, etc.). Then, you make changes at that higher level of abstraction, and translate them back down to key presses in your text editor.
Coding agents can give you a view of a codebase that is already on that higher level of abstraction. You can say:
And get back an accurate diagram of the data flow structure of the system. Then you can say:
And get back a correct answer. Then:
And the plan will be wrong. Probably. Sometimes you get lucky and it's right. But that's okay, you're a skilled engineer who's been tapping keys on a keyboard to manually add/remove individual characters from codebases for years. You can read a PR and tell if it actually accomplishes what it says it does, right? There's definitely a new skill to be learned here, which senior engineers with experience reviewing junior PRs already have a head start on. Worst case scenario, you break the plan down into small parts and go over them individually with the agent.
But hey, don't despair, in a couple of years the models will probably have improved enough to get the plans right first try, too.
Operating at a higher level of abstraction like this has a number of benefits. Subjectively, I find that:
To finish, a few (edited for ease of understanding) prompts from my recent history to give some concrete ideas on how this can be used: