Automated real-time monitoring and orchestration of coding agents

kaivu; leni

This is a linkpost for https://fulcrumresearch.ai/2025/10/22/introducing-orchestra-quibbler.html

Fulcrum is excited to open-source two new tools:

Quibbler is a critic for your coding agent.
Orchestra is a multi-agent coding system: it uses a designer agent that spawns and coordinates your parallel coding agents.

Motivation

Human attention is a scarce resource.

In the best case, coding with agents lets you spend that attention on the “right” parts of your code: its functionality, its architecture, and its failure modes. In the worst case, your attention goes to the behavior of your agents: preventing them from taking unsafe actions, asking them not to reward hack, and working out whether their outputs are trustworthy.

Can we use agents to help us out?

Quibbler

Quibbler is a background agent that monitors and critiques your coding agent’s actions using hooks. Unlike most critics and guardrails, Quibbler is itself an agent: it can read and understand the context of an action to judge whether that action was a mistake.

We’ve found Quibbler useful in preventing agents from:

fabricating results without running commands
skipping tests
ignoring your coding style

In longer-running tasks, we found Quibbler useful for enforcing intent, which let us check in on our agent less often. You can configure your guardrails, and Quibbler learns from your usage. Quibbler currently supports Claude Code; we are adding support for other agents soon!
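To make the idea of a context-aware critic concrete, here is a minimal sketch. It is not Quibbler's implementation and does not use Claude Code's hook interface: the `Event` and `Critic` names, the event shape, and the string checks are all assumptions made for illustration. A simple rule also stands in for the model's judgment, which is the part that makes Quibbler an agent rather than a filter. What the sketch does show is the pattern of checking each new action against the history of earlier ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """One observed agent action (hypothetical shape, not Quibbler's schema)."""
    kind: str      # "command" for a shell command, "message" for agent text
    content: str


@dataclass
class Critic:
    """Keeps the event history so each critique can take context into account."""
    history: List[Event] = field(default_factory=list)

    def observe(self, event: Event) -> Optional[str]:
        """Record the event; return a critique, or None if nothing looks wrong."""
        critique = None
        # Toy rule standing in for an LLM judgment: a claim that tests pass
        # is only credible if a test command appears earlier in the history.
        if event.kind == "message" and "tests pass" in event.content.lower():
            ran_tests = any(
                e.kind == "command" and "pytest" in e.content
                for e in self.history
            )
            if not ran_tests:
                critique = "Claimed passing tests, but no test command was observed."
        self.history.append(event)
        return critique


if __name__ == "__main__":
    critic = Critic()
    critic.observe(Event("command", "git status"))
    print(critic.observe(Event("message", "All tests pass.")))
```

In a real setup the rule would be replaced by a call to a model that reads the same history, and the critique would be fed back to the coding agent rather than printed.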

Orchestra

Orchestra is a step towards true multi-agent coding: with parallel execution, active coordination, and full visibility of your coding agents. You plan with a designer agent, which spawns executors that work in isolated environments. When an executor needs help, it messages the designer agent (which gets your input if needed).
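The designer/executor flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Orchestra's code: the `Designer` and `Executor` classes, their method names, and the use of a temporary directory as a stand-in for an isolated environment are all hypothetical. The point is the escalation path: an executor asks the designer, and the designer only involves the human when it has no answer of its own.

```python
import tempfile
from typing import Callable, Dict, List


class Executor:
    """A worker assigned one task (hypothetical; not Orchestra's API)."""

    def __init__(self, task: str, designer: "Designer"):
        self.task = task
        self.designer = designer
        # Stand-in for an isolated environment: a private scratch directory.
        self.workdir = tempfile.mkdtemp(prefix="executor-")

    def ask(self, question: str) -> str:
        """Send a question up to the designer and return its answer."""
        return self.designer.handle_question(self, question)


class Designer:
    """Spawns executors and answers their questions, escalating when needed."""

    def __init__(self, ask_human: Callable[[str], str]):
        self.ask_human = ask_human
        self.known_answers: Dict[str, str] = {}
        self.executors: List[Executor] = []

    def spawn(self, task: str) -> Executor:
        executor = Executor(task, self)
        self.executors.append(executor)
        return executor

    def handle_question(self, executor: Executor, question: str) -> str:
        # Only interrupt the human for questions not already answered.
        if question not in self.known_answers:
            prompt = f"[{executor.task}] {question}"
            self.known_answers[question] = self.ask_human(prompt)
        return self.known_answers[question]


if __name__ == "__main__":
    designer = Designer(ask_human=lambda prompt: "yes")
    first = designer.spawn("add login endpoint")
    second = designer.spawn("add logout endpoint")
    print(first.ask("Should sessions use cookies?"))
    print(second.ask("Should sessions use cookies?"))
```

Caching answers in the designer means the second executor's identical question does not cost any further human attention, which is the resource the design is trying to conserve.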

We’ve used Orchestra to:

Iterate on the plan for a complex feature implementation and decompose it into bite-sized subtasks
Work on multiple independent features at the same time
Implement features in a best-of-n style, where the designer merges in the best result after reviewing the code with you
Conduct automated quality or security reviews of multiple vulnerabilities

Orchestra's oversight features, model-to-model orchestration and monitoring, are what make parallelization useful rather than destructive.

Looking Ahead

What are the properties of the interfaces that we will use to manage agents in real time? Here are some ideas:

Agentic validation: since human attention is the most expensive resource, we can spend agent compute liberally on falsifying or critiquing other agents’ results.
Tree-like structure: users see high-level context by default, but they should be able to “dig” into the work agents are doing to arbitrary depth.
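One way to picture the tree-like structure is as a data structure in which every node carries a summary and the viewer chooses how deep to expand. The sketch below is only an illustration of that idea; `WorkNode` and `render` are hypothetical names, and it does not describe an existing Orchestra or Quibbler interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkNode:
    """A unit of agent work: a summary plus finer-grained sub-steps."""
    summary: str
    children: List["WorkNode"] = field(default_factory=list)


def render(node: WorkNode, max_depth: int, depth: int = 0) -> List[str]:
    """Return indented lines, expanding children only down to max_depth."""
    lines = ["  " * depth + node.summary]
    if depth < max_depth:
        for child in node.children:
            lines.extend(render(child, max_depth, depth + 1))
    elif node.children:
        # Collapsed subtree: show that detail exists without displaying it.
        lines.append("  " * (depth + 1) + f"... ({len(node.children)} hidden)")
    return lines


if __name__ == "__main__":
    tree = WorkNode("Add login", [
        WorkNode("Write handler", [WorkNode("Ran tests")]),
        WorkNode("Update docs"),
    ])
    print("\n".join(render(tree, max_depth=0)))  # high-level view
    print("\n".join(render(tree, max_depth=2)))  # fully expanded view
```

The default view would use a small `max_depth`, and "digging" into a piece of work corresponds to re-rendering that subtree with a larger one.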

Real-time oversight of agent systems will be critical as agents scale beyond coding. Live monitoring of agents in real-world settings might also teach us lessons about AI oversight that can later be applied to x-risk mitigation.

If you’re interested, we’d love to hear from you. We're hiring!
