Loom Mills: From Agent Swarms to Software Production Lines

6 min read

Tags: lab, loom, loom-core, agents, orchestration, developer-tools, mcp, hud

The next Loom Core feature I'm most excited about started with the codename Hive.

That name was directionally useful. It pointed at many agents working together, division of labor, and a control plane that keeps the work moving while the laptop sleeps.

But it also started to feel like the wrong metaphor for Loom. Loom's language is about threads, patterns, shuttles, fabric, and production. The thing we are building is less "a swarm buzzing around" and more "a mill floor with specialized stations, visible quality gates, and a manager who can stop the line before bad work ships."

So the public name I want to move toward is Loom Mills.

The code may keep hive names for a while because stable CLI/API names matter. The brand and product story should not be trapped by the first internal codename.

TL;DR

  • Loom Mills is the planned always-on agent orchestration layer above Loom's existing daemon, HUD, weaver, spawn, MentatLab, devbox, and agent-context pieces.
  • The first implementation used the internal codename Hive: a cluster-resident operator, SQLite-backed canonical state, council planning, gated pipelines, and HUD visibility.
  • The v2 direction adds persistent domain teams, adversarial audit, cross-repo coordination, debate mode, bounded recursion, cost preview, adaptive policy proposals, and mobile visibility.
  • "Mills" fits the Loom metaphor better than "Hive": work moves through stations, quality gates, and production lanes, not just a swarm.
  • The practical goal is not magic autonomy. It is boring repeatability: plan, route, build, verify, review, merge, attribute, and learn.

Why Another Layer?

Loom Core already does a lot:

  • one loom proxy entrypoint for several MCP clients,
  • loomd for server lifecycle, routing, audit, cost, and health,
  • HUD views for agents, sessions, tasks, traces, servers, sandboxes, and workflow state,
  • mcp-agent-context for presence, sessions, tasks, memory, handoffs, and worktrees,
  • mcp-devbox for project-aware sandbox execution,
  • weaver and spawn machinery for headless agent work.

That is enough to make one agent productive.

It is not quite enough to make many agents useful without turning the operator into a traffic cop.

The missing layer is production coordination:

  1. Decide which work is worth doing.
  2. Split it into bounded slices.
  3. Route each slice to the right kind of worker.
  4. Run tests and policy gates before promotion.
  5. Open and merge changes without losing auditability.
  6. Attribute outcomes back to the planning loop.
  7. Keep the whole thing visible in the HUD.

That is the Loom Mills job.

The Shape: Council, Lines, Squads, Audit

The initial design has two big motions.

First, a Council reads roadmap intent, planning notes, recent merges, backlog state, alerts, and operational signals. It produces concrete backlog deltas and planning artifacts instead of vague "next steps."

Second, a gated Pipeline turns one backlog item into a merged change. The default line looks like this:

plan_slice -> research -> implement -> tests -> pr_self_review -> mr -> ci_watch -> merge -> cleanup

That is the production-line part of the metaphor. A backlog item moves through stations. Each station has inputs, outputs, and promotion criteria.
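Here is a minimal sketch of that station model, assuming each station is a function that takes a backlog item ID and reports whether its promotion criteria were met. The station names come from the default line above; `Outcome`, `RunRecord`, `run_line`, the stub stations, and the `BL-1` ID are hypothetical names for illustration, not Loom's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence

# Station order taken from the default line in this post.
DEFAULT_LINE = (
    "plan_slice", "research", "implement", "tests",
    "pr_self_review", "mr", "ci_watch", "merge", "cleanup",
)


@dataclass
class Outcome:
    """Result of one station: were promotion criteria met, and on what evidence."""
    passed: bool
    evidence: str = ""


@dataclass
class RunRecord:
    """Trail of station outcomes for one backlog item."""
    item_id: str
    history: list[tuple[str, Outcome]] = field(default_factory=list)
    stopped_at: Optional[str] = None


Station = Callable[[str], Outcome]


def run_line(item_id: str, stations: dict[str, Station],
             line: Sequence[str] = DEFAULT_LINE) -> RunRecord:
    """Run stations in order and stop the line at the first failed gate."""
    record = RunRecord(item_id)
    for name in line:
        outcome = stations[name](item_id)
        record.history.append((name, outcome))
        if not outcome.passed:
            record.stopped_at = name  # nothing downstream runs
            break
    return record


# Hypothetical stubs: every station passes except `tests`.
stubs: dict[str, Station] = {n: (lambda _id: Outcome(True, "ok")) for n in DEFAULT_LINE}
stubs["tests"] = lambda _id: Outcome(False, "2 unit tests failed")

record = run_line("BL-1", stubs)
print(record.stopped_at)  # prints: tests
```

The point of the sketch is the early exit: a failed station leaves a record of what ran and why it stopped, and nothing after it is promoted.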

The v2 direction adds a middle layer: Squads.

Squads are persistent domain-owning teams, starting with two practical examples:

  • hud-frontend
  • gitops

Each squad can own path patterns, test lanes, required gates, reviewer/editor model preferences, budgets, and working memory. A HUD-heavy item should not route like a Flux/Kustomize item. The system should learn that difference.
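To make the routing idea concrete, here is a rough sketch that picks a squad by counting how many changed paths match its owned patterns. The squad names are the two from this post; the glob patterns, gate names, and the `route` helper are invented for illustration, and a real router would also weigh budgets and recent outcomes.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional, Sequence


@dataclass(frozen=True)
class Squad:
    name: str
    path_patterns: tuple[str, ...]   # globs the squad owns (illustrative)
    required_gates: tuple[str, ...]  # gates an item must pass for this squad


# Squad names come from the post; patterns and gates are made-up examples.
SQUADS = (
    Squad("hud-frontend", ("hud/*", "*.tsx"), ("tests", "ui_snapshot")),
    Squad("gitops", ("clusters/*", "*kustomization.yaml"), ("tests", "manifest_lint")),
)


def route(changed_paths: Sequence[str],
          squads: Sequence[Squad] = SQUADS) -> Optional[str]:
    """Return the squad owning the most changed paths, or None if nobody owns any.

    Ties go to the squad listed first.
    """
    scores = {
        s.name: sum(any(fnmatch(p, pat) for pat in s.path_patterns)
                    for p in changed_paths)
        for s in squads
    }
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


print(route(["hud/src/App.tsx", "README.md"]))
print(route(["clusters/prod/kustomization.yaml"]))
```

Note that `fnmatch` lets `*` match across `/`, which keeps the patterns short here; stricter ownership rules would need a stricter matcher.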

Then there is the Audit lane.

Audit is intentionally lateral. It is not the same judge used by the production line. It scores council artifacts and merged pipeline diffs with a different pool and rubric. In v2.0 that audit is advisory: it opens follow-up issues, records survival scores, and makes risk visible. After enough low-noise runs, it can become blocking.
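One way the advisory-to-blocking switch could work, sketched under loud assumptions: "low-noise" is read here as "most findings were confirmed by a human reviewer," and the thresholds `min_runs` and `max_noise` are placeholders I made up, not a decided policy.

```python
from dataclasses import dataclass


@dataclass
class AuditRun:
    findings: int   # findings the audit lane raised on this run
    confirmed: int  # findings a human reviewer agreed with


def audit_mode(runs: list[AuditRun], min_runs: int = 20,
               max_noise: float = 0.2) -> str:
    """Return "blocking" once the last `min_runs` runs are low-noise, else "advisory".

    Noise is the share of raised findings that reviewers did not confirm.
    """
    recent = runs[-min_runs:]
    if len(recent) < min_runs:
        return "advisory"  # not enough history yet
    raised = sum(r.findings for r in recent)
    confirmed = sum(r.confirmed for r in recent)
    noise = 0.0 if raised == 0 else 1 - confirmed / raised
    return "blocking" if noise <= max_noise else "advisory"


print(audit_mode([AuditRun(5, 5)] * 19))  # prints: advisory
print(audit_mode([AuditRun(5, 5)] * 20))  # prints: blocking
```

Whatever the real rule ends up being, the useful property is that promotion to blocking is earned from recorded outcomes rather than flipped by hand.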

In Mills language:

  • Council decides what pattern to weave next.
  • Squads are specialized rooms on the floor.
  • Pipeline is the production line.
  • Gates are inspection stations.
  • Audit is quality control with its own clipboard.
  • HUD is the floor board.

Dry, but useful. My favorite kind of product architecture.

Why Cluster-Resident?

The important implementation choice is that this does not run as a long-lived process on the Mac.

The operator lives in k3s. The Mac-side loom CLI is a client.

That matters because agent orchestration can be slow, bursty, and stateful. Laptops sleep. Terminal sessions die. Local processes get restarted in the middle of a conversation. A production layer should not disappear because I closed the lid.

The planned operator keeps canonical state in SQLite on a Longhorn-backed persistent volume. GitLab issues and .loom/backlog/*.yaml become derived or synchronized views. That gives the system a source of truth even when GitLab, a model provider, or the local machine is unavailable.
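As a sketch of what "SQLite as canonical state, everything else derived" could look like, the snippet below keeps backlog items and station outcomes in two tables and exports a plain list that a sync job might push to issues or YAML files. The table names, columns, and helper functions are hypothetical; the real schema is not described in this post.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS backlog_item (
    id      TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    squad   TEXT,
    status  TEXT NOT NULL DEFAULT 'queued'
);
CREATE TABLE IF NOT EXISTS pipeline_run (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    item_id   TEXT NOT NULL REFERENCES backlog_item(id),
    station   TEXT NOT NULL,
    passed    INTEGER NOT NULL,
    cost_usd  REAL NOT NULL DEFAULT 0
);
"""


def open_state(path: str) -> sqlite3.Connection:
    """Open (or create) the state database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def export_backlog(conn: sqlite3.Connection) -> list[dict]:
    """Derived view: rows a sync job could mirror to issues or YAML."""
    cols = ("id", "title", "squad", "status")
    rows = conn.execute("SELECT id, title, squad, status FROM backlog_item ORDER BY id")
    return [dict(zip(cols, row)) for row in rows]


# In-memory for the example; the operator would use a file on its persistent volume.
conn = open_state(":memory:")
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO backlog_item (id, title, squad) VALUES (?, ?, ?)",
                 ("BL-1", "Add queue depth panel", "hud-frontend"))
    conn.execute("INSERT INTO pipeline_run (item_id, station, passed) VALUES (?, ?, ?)",
                 ("BL-1", "tests", 1))
print(export_backlog(conn))
```

The direction of data flow is the design choice: writes land in SQLite first, and the external views can be rebuilt from it after an outage.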

The early internal shape is intentionally simple:

Mac CLI / HUD
    |
    v
loom-mill operator in k3s
    |
    +-- SQLite canonical state
    +-- policy + budgets
    +-- council runs
    +-- pipeline runs
    +-- gate outcomes
    +-- eval scores
    +-- audit findings

The old codename shows up in branches, docs, commands, and metrics as hive. That is fine for now. Internal compatibility buys time. Public language should still improve.

What "Better Than Swarms" Means

I have never loved "agent swarm" as a production term.

It describes quantity, not control.

The hard part is not getting ten agents to do ten things. The hard part is making sure the ten things are:

  • scoped,
  • non-overlapping,
  • budgeted,
  • verified,
  • attributable,
  • resumable,
  • reviewable,
  • easy to stop.

That is where Mills is a better metaphor. A mill is coordinated throughput. It has stations, operators, inspection, downtime, logs, and maintenance. It can run continuously, but only because the process is constrained.

For agentic engineering, constraints are the product:

| Problem | Mills answer |
| --- | --- |
| Agents duplicate work | Route by squad, path, ownership, and recent outcomes. |
| Agents touch overlapping files | Allocate worktrees and surface file claims. |
| Plans rot | Feed downstream merge and regression outcomes back into the council brief. |
| Bad changes slip through | Run gates, self-review, CI, and adversarial audit. |
| Costs drift | Budget per council run, pipeline run, squad, and day. |
| Operators lose track | Put status, traces, queue depth, budgets, and audit findings in HUD. |

That is not "autonomy" as a vibe. It is autonomy as a production system.
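The "costs drift" constraint is the easiest one to sketch. Assuming spend is checked against several caps at once (run, squad, day), a charge should either apply to every scope or to none. The `BudgetLedger` class and the scope key format are hypothetical conventions for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetLedger:
    """Tracks spend against caps for several scopes at once.

    Scope keys such as "squad:gitops" are an illustrative convention.
    Scopes without a cap are tracked but never block a charge.
    """
    caps_usd: dict[str, float]
    spent_usd: dict[str, float] = field(default_factory=dict)

    def try_spend(self, scopes: list[str], amount_usd: float) -> bool:
        """Charge every scope, or charge none if any cap would be exceeded."""
        for scope in scopes:
            cap = self.caps_usd.get(scope)
            if cap is not None and self.spent_usd.get(scope, 0.0) + amount_usd > cap:
                return False
        for scope in scopes:
            self.spent_usd[scope] = self.spent_usd.get(scope, 0.0) + amount_usd
        return True


ledger = BudgetLedger({"pipeline_run:42": 2.0, "squad:gitops": 10.0, "day": 25.0})
scopes = ["pipeline_run:42", "squad:gitops", "day"]
first = ledger.try_spend(scopes, 1.5)   # fits under every cap
second = ledger.try_spend(scopes, 1.0)  # would push the run past its 2.0 cap
print(first, second)  # prints: True False
```

Checking all caps before charging any of them keeps the ledger consistent when the tightest cap is the last one examined.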

The Branding Change I Want

Here is the naming direction I'm going to use in product copy:

| Old/internal | New/public | Why |
| --- | --- | --- |
| Hive | Loom Mills | Fits Loom's textile/production metaphor. |
| Swarm | Mill floor / agent floor | Emphasizes coordinated work, not raw activity. |
| Squad | Workroom / squad | "Squad" is still useful for domain teams; "workroom" may fit UI later. |
| Pipeline | Production line | Same technical meaning, clearer metaphor. |
| Gate | Inspection station | Keeps the quality-control model obvious. |
| Council | Council | This one still works; it is planning, debate, and prioritization. |

I would not mass-rename code yet. The command loom hive status can stay for compatibility until the feature is stable enough to carry an alias like loom mills status or loom mill status.

The right near-term move is:

  • product/blog/docs say Loom Mills,
  • architecture notes say "formerly/internal codename Hive,"
  • code keeps hive where renaming would create churn,
  • new UX labels prefer Mills language when they are not part of a stable API.
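For a sense of how cheap the alias step can be, here is a stand-in sketch using Python's `argparse`. It is not how the loom CLI is implemented (this post does not say); it only shows one parser answering to the old name and the two candidate new names while normalizing to a single internal feature key.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser: `hive` stays canonical, `mills` and `mill` are aliases."""
    parser = argparse.ArgumentParser(prog="loom")
    groups = parser.add_subparsers(dest="group", required=True)
    hive = groups.add_parser("hive", aliases=["mills", "mill"])
    # Whatever spelling the user typed, downstream code sees feature="hive".
    hive.set_defaults(feature="hive")
    actions = hive.add_subparsers(dest="action", required=True)
    actions.add_parser("status")
    return parser


for argv in (["hive", "status"], ["mills", "status"], ["mill", "status"]):
    args = build_parser().parse_args(argv)
    print(argv[0], "->", args.feature, args.action)
```

Normalizing to one internal key means metrics, logs, and stored state keep the stable name even after the public spelling changes.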

What I'll Watch

The feature is only useful if it improves real delivery. The metrics I care about are practical:

  • time from backlog item to merged change,
  • cost per merged change,
  • gate pass rate by squad,
  • post-merge regression rate,
  • number of escalations that produced useful handoffs,
  • audit findings that survived human review,
  • percentage of plans that led to downstream shipped work.

Those measurements keep the system honest. A mill that produces lots of fabric nobody can use is just expensive noise.
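A few of those metrics fall out of one record per merged change. The sketch below assumes a hypothetical `MergedChange` record with timestamps, cost, gate counts, and a regression flag; the field names and sample values are invented, and `summarize` expects a non-empty list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class MergedChange:
    squad: str
    queued_at: datetime
    merged_at: datetime
    cost_usd: float
    gates_run: int
    gates_passed: int
    regressed: bool  # a post-merge regression was traced to this change


def summarize(changes: list[MergedChange]) -> dict[str, float]:
    """Aggregate delivery metrics over a non-empty list of merged changes."""
    n = len(changes)
    lead_hours = [(c.merged_at - c.queued_at) / timedelta(hours=1) for c in changes]
    return {
        "median_lead_time_hours": median(lead_hours),
        "cost_per_merged_change_usd": sum(c.cost_usd for c in changes) / n,
        "gate_pass_rate": sum(c.gates_passed for c in changes)
                          / sum(c.gates_run for c in changes),
        "post_merge_regression_rate": sum(c.regressed for c in changes) / n,
    }


t0 = datetime(2025, 1, 1, 9, 0)  # arbitrary sample timestamp
changes = [
    MergedChange("gitops", t0, t0 + timedelta(hours=2), 1.0, 4, 4, False),
    MergedChange("hud-frontend", t0, t0 + timedelta(hours=6), 3.0, 4, 3, True),
]
summary = summarize(changes)
print(summary)
```

Grouping the same records by `squad` before calling `summarize` would give the per-squad gate pass rate from the list above.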

Where This Goes

The next Loom Core arc is about turning the agent fleet from "visible" into "operable."

Visibility was the first step: sessions, presence, spawned agents, traces, costs, tasks, sandboxes, and server health all need to be observable.

Mills is the next step: the system should coordinate work, route it to specialized teams, gate it, audit it, and learn from outcomes.

The name matters because metaphors shape product decisions. "Hive" pushes toward swarm behavior. "Mills" pushes toward production discipline.

For Loom, that is the better future: not more agents doing more random work, but a visible floor where each station has a job, each promotion has evidence, and the operator can understand the whole run without reading every thread by hand.
