The Orchestration Layer

April 25th, 2026

The product surface of an agent-native company is not a dashboard. It is a series of moments: the notification that matters, the summary that compresses, the decision that surfaces while it is still reversible.


The previous post argued that the evaluation function is the most important decision in an agent-native company. That post was about what to measure. This one is about what to show.

A self-evolving agent that handles waste hauling dispatch, insurance claims, care coordination, or construction lending does not need a traditional software interface. It does not need a table of records. It does not need a sidebar of filters. It does not need a dashboard with twelve charts that nobody looks at after the first week. What it needs is a way to surface the right information to the right human at the right moment, and then get out of the way.

That surface is the orchestration layer. It is the product your customer actually touches. And almost nobody is designing it well, because almost everybody is designing it as if the human is still doing the work.

The old interface assumed the human was the operator

Every SaaS dashboard built in the last twenty years shares the same assumption: the human is the one performing the workflow, and the software is the environment in which they perform it. Columns, rows, status badges, action buttons, bulk operations, search, filter, sort. The interface is a workspace. The human sits inside it for hours.

When the agent does the work, that interface becomes overhead. The human is no longer operating the system. The human is supervising the agent that operates the system. The information needs are completely different.

An operator needs to see everything because they are deciding what to do next. A supervisor needs to see almost nothing, because the agent is deciding what to do next. The supervisor needs to see exactly three things: what just happened that requires attention, what is about to happen that requires approval, and what went wrong that requires correction.

Everything else is noise. And noise in a supervisory interface is worse than noise in an operational one, because it trains the human to stop looking.

Three surfaces, not one

The orchestration layer has three distinct surfaces. Each one answers a different question at a different cadence. Here is what one looks like in practice:

Decision required · Just now
Vendor X has not confirmed for Job #4721.
Delivery: tomorrow, 8:00 AM.

Two alternatives available:
  → Vendor Y — $40 more, confirmed
  → Delay to Thursday — 94% confirm rate

[Approve substitute]  [Delay]  [Call customer]

The interrupt. Something happened that the human needs to know about right now. The agent encountered a situation outside its delegated authority. A margin calculation came back negative. A vendor failed to confirm. A customer wrote in Spanish and the communication style prompt has not evolved bilingual handling yet. A regression was detected in the last evolution cycle.

The interrupt is not a notification. Notifications are the failure mode of every application that could not decide what mattered. The interrupt is the product's editorial judgment about what deserves to break the human's attention. Getting this wrong in either direction is catastrophic. Too many interrupts and the human ignores them, which is the same as having no orchestration at all. Too few and the human misses something consequential, which destroys trust.

The design pattern: an interrupt should carry the decision, not just the data. Not "Vendor X did not confirm for Job #4721." Instead: "Vendor X has not confirmed for Job #4721, delivery scheduled tomorrow at 8am. Two alternatives are available: Vendor Y at $40 more, available and confirmed, or delay to Thursday when Vendor X's confirmation rate is 94%. Approve substitute, delay, or call the customer?" This is the pattern Reeve builds for dispatch operations: the agent frames the decision, the human makes the call.

The human's job is to choose. The agent's job is to frame the choice.
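One way to make "carry the decision, not just the data" concrete is to shape the interrupt as a structure that cannot exist without options attached. The following is a minimal sketch, not any real API; the names (`Interrupt`, `Option`, `render`) and fields are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Option:
    """One alternative the agent has already evaluated."""
    label: str        # the action the human can take, e.g. "Approve substitute"
    consequence: str  # the framed cost/benefit, e.g. "Vendor Y, $40 more, confirmed"

@dataclass
class Interrupt:
    """An interrupt carries the decision, not just the data:
    what happened, why it matters now, and the pre-framed choices."""
    situation: str
    deadline: str
    options: list[Option] = field(default_factory=list)

    def render(self) -> str:
        lines = [self.situation, f"Deadline: {self.deadline}", ""]
        for opt in self.options:
            lines.append(f"  [{opt.label}] -> {opt.consequence}")
        return "\n".join(lines)

# Hypothetical example mirroring the card above.
interrupt = Interrupt(
    situation="Vendor X has not confirmed for Job #4721.",
    deadline="Delivery tomorrow, 8:00 AM",
    options=[
        Option("Approve substitute", "Vendor Y, $40 more, confirmed"),
        Option("Delay", "Thursday, 94% confirmation rate"),
        Option("Call customer", "escalate the choice"),
    ],
)
print(interrupt.render())
```

The point of the shape is the constraint: an agent that cannot populate `options` has not finished framing the choice, and the interrupt should not fire yet.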

The summary. A periodic compression of what the agent has been doing, designed for the human who checks in once a day or once a week. The summary is the trust-building surface. It is how the human develops confidence that the agent is handling things correctly without watching every task.

The evolution report I described earlier in the series is one version of this: here is how your agent changed this quarter, here are the business rules it discovered, here are the edge cases it learned to handle. But the operational summary is more frequent and more granular: "Today the agent processed 14 jobs, dispatched 12 without intervention, and escalated 2; one escalation was a new edge case, flagged for the next evolution cycle."

The design pattern: summaries should get shorter over time, not longer. As the agent handles more cases autonomously, there is less to report. A summary that grows longer every week is a sign the agent is not improving. A summary that shrinks to three lines is a sign the system is working. The product should treat that shrinkage as a feature, not a bug. The customer who opens a summary that says "nothing required your attention today" should feel relieved, not suspicious.
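One way to make shrinkage the default is to generate the summary from exceptions only, so a day with no escalations produces the three-line "nothing required your attention" message by construction. A sketch under that assumption; the function and its signature are hypothetical:

```python
def daily_summary(jobs_processed: int, autonomous: int, escalations: list[str]) -> str:
    """Build a summary that reports only deviations.

    A healthy agent handles more cases autonomously over time, so the
    escalation list shrinks and the summary shrinks with it.
    """
    lines = [f"{jobs_processed} jobs processed, {autonomous} handled without intervention."]
    if not escalations:
        lines.append("Nothing required your attention today.")
    for item in escalations:
        lines.append(f"Escalated: {item}")
    return "\n".join(lines)
```

Because length is a function of the escalation count alone, a summary that keeps growing is directly legible as "the agent is not improving," which is the signal the pattern is meant to expose.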

The audit. The complete, detailed, searchable record of everything the agent did, why it did it, what resources it used, and what version of each resource was active at the time. The audit surface is not for daily use. It is for the moment when something goes wrong, or when a regulator asks, or when the customer wants to understand why the agent made a specific decision three weeks ago.

The sovereignty post described the provenance infrastructure that makes this possible: version lineage, tenant isolation, evolution history. The audit surface is the product expression of that infrastructure. It exists so the interrupt and summary surfaces can be sparse. The human does not need to see everything in real time because they know they can reconstruct anything after the fact.

The design pattern: the audit should be navigable by question, not by timestamp. "Why did the agent select Vendor Y for this job?" should produce the specific decision trace, the resource versions active at that moment, the evaluation score, and the business rules that applied. Chronological logs are for engineers. Decision traces are for the humans who are accountable.
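"Navigable by question" can be implemented by keying traces on (entity, decision type) rather than timestamp, so "why did the agent select Vendor Y?" is a lookup, not a log scan. A minimal sketch with an in-memory store; all identifiers, versions, and scores here are invented for illustration:

```python
# Hypothetical trace store, keyed by (entity, decision type).
decision_traces = {
    ("job-4721", "vendor_selection"): {
        "chosen": "Vendor Y",
        "resource_versions": {"dispatch_prompt": "v14", "pricing_rules": "v9"},
        "evaluation_score": 0.91,
        "rules_applied": ["prefer confirmed vendors within $50 of baseline"],
    },
}

def why(entity: str, decision: str) -> str:
    """Answer 'why did the agent do X?' with the decision trace:
    the choice, the resource versions active at that moment,
    the evaluation score, and the business rules that applied."""
    trace = decision_traces.get((entity, decision))
    if trace is None:
        return f"No trace recorded for {decision} on {entity}."
    versions = ", ".join(f"{k}@{v}" for k, v in trace["resource_versions"].items())
    return (
        f"Chose {trace['chosen']} (score {trace['evaluation_score']}). "
        f"Active resources: {versions}. "
        f"Rules applied: {'; '.join(trace['rules_applied'])}"
    )
```

The chronological log still exists underneath; this index is the accountable human's view over it.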

Each verb has a different pattern

The Monday Morning post described four categories of work that become more valuable as agents absorb the rest: create, relate, steward, and rest. Each one needs a different orchestration pattern because the human's relationship to the agent is different in each.

Create: the conductor. The human directs. Agents execute sections. The interrupt: "I have three drafts, which direction?" The human must see and override before anything ships. The agent proposes. The human disposes.

Relate: the logistics layer. The agent clears the path so the human can be present. The interrupt: "dinner is Thursday, six confirmed, one allergic to shellfish." The orchestration layer should be invisible during the gathering itself. If the host is checking the app while the guests are talking, the product has failed.

Steward: the decision surface. This is the pattern WithAgency applies to AI agencies. Every screen is a decision with context, alternatives, and the cost of being wrong. The product must make consequences visible at the moment of choice, not afterward.

Rest: the absence. The product's job is to not be there. The interrupt threshold is nearly infinite. A "do not disturb" toggle that the user can override is not rest. A hard lock-out that the user agreed to in advance and cannot bypass is rest.

The interface disappears as the agent improves

There is a trajectory here that is worth naming. In the first week of deployment, the orchestration layer is busy. The interrupt surface fires frequently. The summary is long. The human is learning to trust the agent, and the agent is learning the customer's operations. The interface feels like a cockpit.

By month three, if the self-evolution loop is working, the interrupt surface fires rarely. The summary is short. The audit is deep but seldom needed. The interface feels like a pager in a doctor's pocket: present, available, almost never active.

By month twelve, the orchestration layer is the thinnest possible membrane between the human and the agent. The human checks in. The agent reports. The human approves the rare edge case. The agent evolves. The product's measure of success is how little attention it requires.

This is the opposite of every engagement metric that the software industry has optimized for over the last two decades. Time on screen, daily active users, sessions per week. The orchestration layer for an agent-native company should optimize for time away from screen. The best product is the one the customer barely touches because the agent is handling everything and the human trusts it to do so.

The measure of a great orchestration layer is not how much the human sees. It is how confidently the human looks away.

What to build first

If you are designing the orchestration layer for an agent-native product, start with the interrupt. It is the surface that determines whether the human trusts the system. Get the interrupt wrong and no amount of summary or audit quality will save you, because the human will either be overwhelmed or blindsided, and both destroy confidence.

The first version of the interrupt should be too conservative. Surface too much. Let the human tell you what they do not need to see. Then remove it. The evolution of the orchestration layer mirrors the evolution of the agent itself: it starts broad and narrows as trust compounds. The difference is that the agent narrows by learning. The orchestration layer narrows by listening to the human who is using it.
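"Let the human tell you what they do not need to see, then remove it" can be as simple as tracking dismissals per interrupt category and muting categories the human consistently ignores. A sketch of that loop, with the obvious caveat that consequential categories would be exempted; the class and threshold are assumptions, not a prescribed mechanism:

```python
from collections import defaultdict

class InterruptFilter:
    """Start too conservative: surface every category, then mute the ones
    the human dismisses repeatedly without ever acting on them."""

    def __init__(self, mute_after: int = 5):
        self.mute_after = mute_after          # consecutive dismissals before muting
        self.dismissals = defaultdict(int)    # category -> consecutive dismissal count

    def should_surface(self, category: str) -> bool:
        return self.dismissals[category] < self.mute_after

    def record(self, category: str, acted: bool) -> None:
        if acted:
            self.dismissals[category] = 0     # any action resets the counter
        else:
            self.dismissals[category] += 1
```

This is the "narrows by listening" half of the mirror: the agent narrows its escalations by learning, while the filter narrows the surface by observing the human.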

Build the summary second. Build the audit third. Ship all three before you ship the agent to a paying customer, because the customer will ask "what is it doing?" before they ask "is it doing it well?" The orchestration layer is how you answer the first question. The evaluation function is how you answer the second.