April 28, 2026 | AgentRQ Team

Supervisor and Workers: The Architecture Behind Self-Growing Agentic Systems

Here is the short answer: a supervisor that sees everything, and workers that know only their own domain. That separation — enforced at the MCP protocol level — is what allows an agentic system to scale, specialize, and improve without becoming brittle.

If you want to understand why that works, it starts with a question most teams skip: *what is the right unit of isolation for an AI agent?*

The answer shapes everything else. Get it wrong and your agents share too much context, interfere with each other's memory, or require centralized coordination that collapses under load. Get it right and each agent becomes a self-contained learning unit that compounds over time — and the overall system improves without you explicitly managing it.

AgentRQ two-tier supervisor and worker MCP architecture

The Two Tiers

AgentRQ exposes two levels of MCP server, each designed for a different kind of agent.

The supervisor connects at https://mcp.agentrq.com/mcp. It has account-wide visibility: all workspaces, all tasks, the status of every agent, all history. An agent using the supervisor MCP can delegate work to any workspace, monitor progress across all of them, move tasks between workers, approve outputs, and observe patterns that no individual worker ever sees. The supervisor is the system's organizational layer.

Workers connect at https://<workspace>.mcp.agentrq.com/mcp — one URL per workspace. Each worker is a persona: a scoped agent with a specific mission, its own memory, and a dedicated task inbox. A worker does not know about other workspaces. It does not have access to the supervisor's global view. It knows exactly one thing: its mission, its current tasks, and what it has learned from previous work in this workspace.

This is not a constraint. It is the architecture.
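As a concrete sketch, here is roughly how an MCP-compatible client could open both tiers using the official TypeScript MCP SDK over Streamable HTTP. Authentication is omitted, and "backend-review" is an invented workspace subdomain used purely for illustration; the tools each server exposes are whatever AgentRQ publishes, so the snippet simply lists them.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Open one MCP client per tier. The supervisor URL is account-wide; the worker
// URL is scoped to a single workspace. "backend-review" is a made-up workspace
// subdomain used only for illustration.
async function connect(url: string, name: string): Promise<Client> {
  const client = new Client({ name, version: "0.1.0" });
  await client.connect(new StreamableHTTPClientTransport(new URL(url)));
  return client;
}

async function main() {
  const supervisor = await connect("https://mcp.agentrq.com/mcp", "supervisor-agent");
  const worker = await connect("https://backend-review.mcp.agentrq.com/mcp", "worker-agent");

  // Each tier advertises its own tool surface: account-wide operations on the
  // supervisor, task- and memory-scoped operations on the single workspace.
  console.log((await supervisor.listTools()).tools.map((t) => t.name));
  console.log((await worker.listTools()).tools.map((t) => t.name));
}

main().catch(console.error);
```

The two clients never share state: the worker client cannot see other workspaces no matter what it asks for, because the scoping lives on the server side of the protocol, not in the prompt.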

Why Bounded Context Is the Feature

The moment you give an agent access to everything, it must decide what to pay attention to. That decision is both expensive and error-prone. A worker scoped to backend code review has a memory shaped entirely by backend code reviews — it learns the codebase's naming conventions, the recurring anti-patterns, the edge cases that keep appearing in PRs. None of that context is available to a general-purpose agent processing a mixed queue.

Bounded context creates two compounding effects.

The first is specialization: a worker that only does one kind of task gets better at that task faster than a generalist agent. Each completed task is a signal. Approved outputs reinforce patterns. Rejected outputs trigger memory updates. The domain-specific memory grows more precise.

The second is semantic cleanliness: when a workspace updates its memory — automatically from task outcomes or via explicit instruction from the supervisor — the update is scoped. There is no risk of a code review insight corrupting a content writing workspace. Each worker's mental model stays coherent.

Hidden inside this design is something that only becomes visible at scale: workers can be evaluated independently. The supervisor can observe which workspaces consistently produce high-quality outputs, which ones require frequent revision, and which ones have developed genuine expertise. That signal is invisible from within any individual worker — it only exists because the supervisor and workers are separate tiers.

Worker sandbox — what a worker can and cannot see

The Closed Loop at the Worker Level

Every workspace runs a closed loop automatically.

  1. A task arrives in the workspace inbox
  2. The agent executes against its mission and current memory
  3. The output surfaces for review — by a human or by the supervisor
  4. The review generates a signal: approved, rejected, modified, escalated
  5. That signal updates the workspace memory
  6. The next task is handled by an agent that knows a little more than it did before

This loop requires no external intervention. It runs inside the sandbox, driven entirely by the feedback on completed tasks. The compound effect accumulates task by task.
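To make the shape of that loop concrete, here is a minimal sketch of one cycle driven from a worker MCP client, using the same TypeScript SDK as the earlier snippet. The tool names `get_next_task` and `submit_output` are illustrative placeholders rather than AgentRQ's published API, and the `execute` callback stands in for whatever agent runtime actually does the work. Steps 4 through 6 (the review signal and the memory update) happen on the platform side once the output is submitted.

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// One turn of the worker loop. Tool names are hypothetical stand-ins for
// whatever the workspace MCP server actually exposes.
async function runOneCycle(
  worker: Client,
  execute: (task: unknown) => Promise<string>,
): Promise<void> {
  // 1. Pull the next task from the workspace inbox.
  const task = await worker.callTool({ name: "get_next_task", arguments: {} });

  // 2. Execute against the workspace mission and current memory.
  const output = await execute(task);

  // 3. Surface the output for review. The review signal (approved, rejected,
  //    modified, escalated) and the resulting memory update are handled by the
  //    platform, so the next cycle starts from a slightly smarter workspace.
  await worker.callTool({ name: "submit_output", arguments: { output } });
}
```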

The practical consequence is that month three looks different from day one — not because the underlying model changed, but because the workspace memory has been shaped by hundreds of real task outcomes. The agent has, in effect, been trained on your specific context, your standards, and your patterns. Without fine-tuning. Without a training pipeline.

The Closed Loop at the Supervisor Level

The supervisor runs a second loop at a higher level of abstraction.

It observes patterns across all workers simultaneously. Which workspaces are handling their tasks efficiently? Which ones have high revision rates? Which task categories don't have a dedicated worker yet? Where is a worker operating outside its original mission scope?

The supervisor can act on those observations directly: assign more tasks to high-performing workspaces to accelerate their learning, update the mission of a struggling workspace to narrow its scope, create a new workspace for an emerging category, or redirect tasks from a worker that is being misused.

This is coordination without micromanagement. The supervisor doesn't tell workers how to do their jobs — it manages the allocation of work across them and updates the conditions under which each worker operates.
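If the supervisor is scripted rather than prompted, that allocation loop can be written down directly. The sketch below assumes hypothetical tool names (`assign_task`, `update_mission`) and an invented `WorkspaceStats` shape; AgentRQ's real tool surface may differ. The division of labor is the point: the supervisor reads cross-workspace signals and adjusts routing and missions, nothing more.

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Invented summary shape, for illustration only; not AgentRQ's schema.
interface WorkspaceStats {
  id: string;
  approvalRate: number; // share of outputs approved without revision
  openTasks: number;    // tasks currently waiting in the inbox
}

// One pass of the supervisor loop: read cross-workspace signals, then act only
// at the allocation level. Tool names below are hypothetical.
async function superviseOnce(supervisor: Client, stats: WorkspaceStats[]): Promise<void> {
  for (const ws of stats) {
    if (ws.approvalRate > 0.9 && ws.openTasks === 0) {
      // High performer with spare capacity: route more work its way to
      // accelerate its learning.
      await supervisor.callTool({
        name: "assign_task",
        arguments: { workspaceId: ws.id, fromQueue: "unrouted" },
      });
    } else if (ws.approvalRate < 0.5) {
      // High revision rate: narrow the mission rather than micromanage outputs.
      await supervisor.callTool({
        name: "update_mission",
        arguments: { workspaceId: ws.id, note: "narrow scope to the best-performing task type" },
      });
    }
  }
}
```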

The Self-Growing System

The architecture becomes genuinely self-growing when the supervisor is itself an automated agent.

A supervisor agent — running against https://mcp.agentrq.com/mcp — can identify task categories that are growing in volume and create new workspaces to handle them. It can observe that a workspace's approval rate has reached a threshold and expand its mission scope. It can detect that two workspaces are handling similar tasks and merge their memory into a single specialized worker.
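As a sketch of that first move, creating a dedicated worker for a growing category could look like the following, again with a hypothetical `create_workspace` tool standing in for whatever the supervisor server actually exposes, and with the threshold chosen arbitrarily for illustration.

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// When an unowned task category crosses a volume threshold, spin up a dedicated
// worker for it. `create_workspace` is a hypothetical tool name.
async function growWorkforce(
  supervisor: Client,
  unownedCategoryCounts: Record<string, number>,
  threshold = 20,
): Promise<void> {
  for (const [category, count] of Object.entries(unownedCategoryCounts)) {
    if (count >= threshold) {
      await supervisor.callTool({
        name: "create_workspace",
        arguments: {
          name: category,
          mission: `Own and execute all incoming ${category} tasks for this account`,
        },
      });
    }
  }
}
```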

Workers don't need to know any of this is happening. They receive tasks, execute, receive feedback, and update. The supervisor handles the organizational evolution of the workforce.

The result is a system where the capability envelope expands over time without requiring step-by-step human direction. The human's role shifts from managing individual tasks to approving direction. The approval gates — built into the worker feedback loop — become the steering mechanism for a system that handles the execution itself.

Self-growing closed loop — supervisor and worker compound improvement

What Compounds

Specialization compounds. Memory compounds. Routing efficiency compounds.

A team running three workers — backend review, content, and data analysis — accumulates domain-specific memory in all three areas simultaneously. At month six, the backend reviewer knows the codebase's specific failure modes. The content writer has internalized the brand voice. The analyst has calibrated to the team's preferred output format.

None of that happened through explicit training. It happened because every approved task added a signal, and every signal shaped the next output. The system that runs at month six is operating on the same models as the system that ran on day one. What changed is the accumulated context — the mission-shaped, task-validated, feedback-pruned memory that lives in each workspace.

That accumulated context is the moat. It cannot be replicated by spinning up a fresh agent. It cannot be transferred to a competitor's platform. It is specific, earned, and persistent.

Getting Started

Connect a supervisor agent by pointing any MCP-compatible client at https://mcp.agentrq.com/mcp. Create workspaces through the AgentRQ dashboard, each with a scoped mission. Each workspace gets a unique worker URL (https://<workspace>.mcp.agentrq.com/mcp) for the agent that operates within that domain.

The rest — feedback accumulation, memory updates, cross-workspace visibility from the supervisor — is built into the protocol. You define the missions. The architecture handles the compounding.

The system you have at the end of the year will not look like the system you started with. That is the point.
