Multi-Agent Homelab Automation: What Worked and What Was Overkill

Why Multiple Agents Instead of One Script

Homelab automation usually starts as a pile of shell scripts and ends as a pile of shell scripts nobody trusts. When I started using LLM agents for lab work, the obvious move was one general assistant with access to everything. I went the other way: a registry of specialized agent definitions, coordinated by a Python orchestrator that executes workflows defined in YAML.

The reasoning was less about model capability and more about scoping. A task like "build a monitoring dashboard for Proxmox, TrueNAS, and UniFi" decomposes naturally: API design, frontend implementation, deployment configuration, and a final production-readiness review. Giving each phase to an agent with a narrow persona and a narrow prompt produced more focused output than one agent context-switching through all of it. It also made the workflow inspectable -- each task has an id, an agent, a prompt, and explicit depends_on relationships, so I can read the plan before anything runs.
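A minimal sketch of that task shape, not the orchestrator's actual code: the four fields (id, agent, prompt, depends_on) come from the workflow format described above, while the Task class, the execution_order helper, and the example agent names and prompts are hypothetical. It uses the standard library's graphlib to turn depends_on into a readable run order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    id: str
    agent: str
    prompt: str
    depends_on: list[str] = field(default_factory=list)


def execution_order(tasks: list[Task]) -> list[str]:
    """Return task ids in an order that respects every depends_on edge."""
    known = {t.id for t in tasks}
    for t in tasks:
        missing = set(t.depends_on) - known
        if missing:
            raise ValueError(f"task {t.id!r} depends on unknown tasks: {sorted(missing)}")
    graph = {t.id: set(t.depends_on) for t in tasks}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical workflow mirroring the dashboard example in the text.
dashboard = [
    Task("api", "backend-architect", "Design the monitoring API"),
    Task("ui", "frontend-developer", "Build the dashboard", depends_on=["api"]),
    Task("deploy", "devops-automator", "Write deployment config", depends_on=["api", "ui"]),
    Task("review", "reality-checker", "Review for production readiness", depends_on=["deploy"]),
]

if __name__ == "__main__":
    for task_id in execution_order(dashboard):
        print(task_id)
```

Because the plan is plain data, it can be validated and printed before any agent is invoked, which is what makes it inspectable.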

How Responsibilities Are Divided

The repo ships agent personality definitions organized by division -- engineering, design, testing, support, and so on (the catalog grew to 50+ definitions, imported from a project called The Agency). In practice the working set was four: a backend architect, a frontend developer, a DevOps automator, and a "reality checker" whose entire job is poking holes in what the others produced.

The orchestrator defines three execution modes: sequential, parallel, and conditional. Sequential runs tasks one after another and stops on the first failure. Parallel runs independent tasks concurrently up to a max_parallel cap while still respecting dependencies. Conditional branching never got finished -- the enum and the dispatch branch exist, but _execute_conditional is a stub that silently falls back to sequential. That is worse than a missing feature: a workflow can declare conditional mode and get sequential behavior with no warning. The two working modes covered every workflow I actually ran, which is its own data point about speculative features.
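One way to get capped, dependency-aware parallelism is sketched below with asyncio. It is an illustration under assumptions, not the orchestrator's implementation: Mode, run_parallel, and run_workflow are hypothetical names, the task graph is a plain dict of id to dependency ids, and the graph is assumed to be validated as acyclic beforehand. The sketch also raises on any mode it does not implement, rather than falling back silently.

```python
import asyncio
from enum import Enum


class Mode(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"
    CONDITIONAL = "conditional"


async def run_parallel(tasks, run_task, max_parallel=2):
    """Run tasks concurrently, never more than max_parallel at once.

    tasks: dict mapping task id -> list of task ids it depends on.
    run_task: async callable taking a task id (stand-in for an agent call).
    Returns task ids in the order they completed.
    """
    limit = asyncio.Semaphore(max_parallel)
    done = {task_id: asyncio.Event() for task_id in tasks}
    finished = []

    async def worker(task_id):
        # Wait for every dependency before competing for an execution slot.
        for dep in tasks[task_id]:
            await done[dep].wait()
        async with limit:
            await run_task(task_id)
        finished.append(task_id)
        done[task_id].set()

    await asyncio.gather(*(worker(task_id) for task_id in tasks))
    return finished


def run_workflow(mode, tasks, run_task, max_parallel=2):
    if mode is Mode.PARALLEL:
        return asyncio.run(run_parallel(tasks, run_task, max_parallel))
    # Fail loudly for anything this sketch does not implement.
    raise NotImplementedError(f"{mode.value} mode is not implemented in this sketch")
```

In this sketch a failing task propagates its exception out of gather, and its dependents never start because the completion event is never set. A real orchestrator would also want to record which tasks were skipped as a result.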

The reality-checker pattern is the one division of labor I'd keep in any future system. A separate agent whose prompt is adversarial ("check security, error handling, edge cases") catches things the building agent glosses over, because the building agent is invested in its own output looking done.

Guardrails Around Touching Infrastructure

Agents that can provision VMs are a different risk class than agents that write code. The guardrails were layered:

Approval gates by default. Workflows print the full execution plan and stop at a Proceed? (y/N) prompt. Autonomous execution requires explicitly passing --auto-approve (the docs call it YOLO mode, which is an honest name).
Dry runs. --dry-run shows the plan without executing anything.
Scoped credentials. Infrastructure access goes through dedicated connectors (Proxmox via API token, TrueNAS and UniFi planned) with credentials in .env, not through agents improvising shell commands. The Proxmox token is a dedicated automation token, not my admin login.
Per-task timeouts and full logging. Every task gets a timeout, and every run writes a results JSON plus per-task logs, so there is always an audit trail of what an agent actually did.
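The approval gate and dry run can be sketched in a few lines. The --dry-run and --auto-approve flags and the Proceed? (y/N) prompt come from the description above; the function names, the positional workflow argument, and the choice to let a dry run win over auto-approve are assumptions made for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run a workflow (sketch)")
    parser.add_argument("workflow", help="path to a workflow YAML file (hypothetical argument)")
    parser.add_argument("--dry-run", action="store_true", help="show the plan and exit")
    parser.add_argument("--auto-approve", action="store_true", help="skip the confirmation prompt")
    return parser


def should_execute(plan, dry_run=False, auto_approve=False, ask=input):
    """Print the plan, then decide whether execution may proceed."""
    print("Execution plan:")
    for step in plan:
        print(f"  - {step}")
    if dry_run:
        # Assumption: a dry run never executes, even combined with --auto-approve.
        return False
    if auto_approve:
        return True
    answer = ask("Proceed? (y/N) ")
    # Anything other than an explicit yes counts as No, matching the (y/N) default.
    return answer.strip().lower() in ("y", "yes")
```

Passing ask as a parameter keeps the gate testable without a terminal; in normal use it is just the built-in input.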

The structural choice underneath all of this: agents never talk to infrastructure directly. They talk to connector code I wrote, and the connector defines the verbs available. For Proxmox that surface is a dozen read operations (cluster status, nodes, VMs, containers, storage, network) plus exactly two writes, start VM and stop VM. There is no allowlist inside the connector beyond that -- the implemented methods are the allowlist, and the API token's permissions back them up. If I extended it, destructive verbs like delete would get an application-level confirmation gate on top of the token, not just a new method.
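A sketch of the "implemented methods are the allowlist" idea, under stated assumptions: ProxmoxConnector, call, and the injected client object (anything exposing get, post, and delete on a path) are hypothetical, and only a few verbs are shown. The delete_vm method is not part of the connector described above; it illustrates what an application-level confirmation gate on a destructive verb could look like if one were added.

```python
class ProxmoxConnector:
    """Hypothetical connector: its public methods are the entire verb surface."""

    def __init__(self, client, confirm=None):
        self._client = client  # stand-in HTTP client authenticated with the API token
        # Default confirmer denies everything, so destructive verbs are opt-in.
        self._confirm = confirm or (lambda message: False)

    # Read operations
    def list_nodes(self):
        return self._client.get("/nodes")

    def list_vms(self, node):
        return self._client.get(f"/nodes/{node}/qemu")

    # The two write operations
    def start_vm(self, node, vmid):
        return self._client.post(f"/nodes/{node}/qemu/{vmid}/status/start")

    def stop_vm(self, node, vmid):
        return self._client.post(f"/nodes/{node}/qemu/{vmid}/status/stop")

    # Hypothetical extension: a destructive verb behind a confirmation gate.
    def delete_vm(self, node, vmid):
        if not self._confirm(f"Delete VM {vmid} on {node}?"):
            raise PermissionError(f"delete of VM {vmid} was not confirmed")
        return self._client.delete(f"/nodes/{node}/qemu/{vmid}")


def call(connector, verb, *args):
    """Agent-facing entry point: only public connector methods can be invoked."""
    method = getattr(connector, verb, None)
    if verb.startswith("_") or not callable(method):
        raise PermissionError(f"verb {verb!r} is not available")
    return method(*args)
```

An agent that asks for a verb the connector does not implement gets a refusal from call, and a token scoped to the same operations means the API would refuse it as well.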

What Worked

The orchestration spine earned its keep: YAML workflows with dependencies, the approval gate, and the logging. Decomposing a build into agent-sized tasks with explicit handoffs is the part that transferred to everything I've built since -- CLAUDE.md in my workspace still calls this repo "the original architecture reference," and that is the part it references.

The Proxmox connector worked and got real use for VM inventory and lifecycle operations.

What Was Overkill

The 50+ agent catalog. Browsing nine divisions of personas, marketing and spatial computing among them, to automate a homelab is absurd in hindsight; four agents did all the real work, and the rest were inventory I maintained for no reason. A management GUI also got built (React frontend, FastAPI backend with WebSocket execution monitoring) -- impressive to demo, but for a single operator the CLI plus YAML files was always the actual interface. The GUI still sits in the repo today, Dockerfiles and compose files included, which says something: code you build for the demo rather than the workflow doesn't get deleted, it gets abandoned in place.

The deeper lesson: in a multi-agent system the durable value is the orchestration contract -- task definitions, dependencies, approval gates, logs. Agent personas are cheap and interchangeable. I over-invested in the cheap part and under-invested early in the part that mattered, which is the connectors and their permission boundaries.

What I'd Do Differently

Start with the connectors and the approval gate, add agents only when a workflow demands one, and treat every persona without a completed workflow behind it as deletable. I would also write the reality-checker's prompt first, before any builder agent exists -- defining what "done and safe" means up front is cheaper than retrofitting it after an agent has already provisioned something.

None of that diminishes what the project did for me. It was where the patterns got worked out, and the systems that came after inherited the good parts without the inventory.
