The Problem That Never Changed
C² Faction HQ is a command center for a Torn.com faction -- real-time ranked war tracking, organized crime coordination, member analytics, and payout accounting, all built on top of the game's API and its 100-requests-per-minute rate limit. The requirements have been stable for the entire life of the project. What changed three times was my opinion about how much architecture those requirements deserve.
Round One: Next.js and Five Containers
The first build was Next.js 16 with the App Router, PostgreSQL 17, Redis 7, and BullMQ workers, deployed as Docker Compose on a VPS: app, worker, postgres, redis, and a one-shot migration container. It started on better-sqlite3 and migrated to PostgreSQL in February 2026 when concurrent access became a real problem.
Version one got the hard parts right, and those parts have survived every rewrite since:
- Multi-tenant faction isolation. Every query scoped by faction ID, enforced in middleware rather than trusted to route handlers.
- API key handling. Members hand over personal Torn API keys, which grant access to sensitive in-game data. AES-256-GCM at rest, keys never returned to the frontend, decrypted only when a worker needs them.
- Rate limit discipline. A Redis token bucket per API key, with timestamp-based incremental fetching so a multi-hour war does not reprocess thousands of duplicate attack records.
- RBAC mapped to in-game positions. 17 granular permissions configured against faction ranks the leaders already understand, instead of an invented role hierarchy.
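
The rate-limit bullet is easier to reason about with the arithmetic written out. What follows is a minimal sketch of a per-key token bucket, not the production limiter: the real one keeps its state in Redis, while this one holds it in process memory and takes the clock as an argument so the behaviour is deterministic. The names `TokenBucket` and `KeyLimiter` are invented for illustration; the 100-per-minute figure is the Torn limit mentioned above.

```rust
use std::collections::HashMap;

/// Token bucket for one API key, in integer arithmetic to avoid float drift.
/// Credit is measured in "token-milliseconds": each request costs `window_ms`
/// credit, and each elapsed millisecond refills `capacity` credit.
pub struct TokenBucket {
    cost: u64,       // credit consumed per request
    max_credit: u64, // a full bucket: capacity * window_ms
    rate: u64,       // credit regained per millisecond
    credit: u64,
    last_ms: u64,
}

impl TokenBucket {
    pub fn new(capacity: u64, window_ms: u64, now_ms: u64) -> Self {
        let max_credit = capacity * window_ms;
        TokenBucket {
            cost: window_ms,
            max_credit,
            rate: capacity,
            credit: max_credit, // start full
            last_ms: now_ms,
        }
    }

    /// Refill for the time elapsed since the last call, then try to spend
    /// one request's worth of credit. Returns false when the key must wait.
    pub fn try_acquire(&mut self, now_ms: u64) -> bool {
        let elapsed = now_ms.saturating_sub(self.last_ms);
        self.credit = self
            .credit
            .saturating_add(elapsed.saturating_mul(self.rate))
            .min(self.max_credit);
        self.last_ms = self.last_ms.max(now_ms);
        if self.credit >= self.cost {
            self.credit -= self.cost;
            true
        } else {
            false
        }
    }
}

/// One bucket per API key, created lazily on first use.
/// (In-memory stand-in for the Redis-backed limiter described above.)
pub struct KeyLimiter {
    capacity: u64,
    window_ms: u64,
    buckets: HashMap<String, TokenBucket>,
}

impl KeyLimiter {
    pub fn new(capacity: u64, window_ms: u64) -> Self {
        KeyLimiter { capacity, window_ms, buckets: HashMap::new() }
    }

    pub fn try_acquire(&mut self, key_id: &str, now_ms: u64) -> bool {
        let (capacity, window_ms) = (self.capacity, self.window_ms);
        self.buckets
            .entry(key_id.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, window_ms, now_ms))
            .try_acquire(now_ms)
    }
}
```

With `KeyLimiter::new(100, 60_000)`, a key can burst 100 requests immediately and then regains one request every 600 ms, and one member's exhausted key never blocks another member's.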
The complaints were real but modest. Five containers is more operational surface than a homelab tool wants, the worker and the web app were both capped by Node's performance ceiling, and test coverage lagged behind feature pace. None of that justified what I did next.
Round Two: The "Do It Properly" Rewrite
Next-gen was the full rewrite: SolidJS frontend, Rust backend, and a microservice architecture -- an Axum REST gateway in front of 28 tonic/prost gRPC services, one container per feature module (war-tracker, ocs, profiles, properties, gym, armory, racing, stocks, and so on), plus PostgreSQL 17 with row-level security, Redis, nginx, and a CI/CD pipeline through GitHub Actions and GHCR.
I want to be fair to it, because it was not a failure in the usual sense. It shipped. It runs in production at c2fhq.com today, it has carried the faction through real wars, and its patterns -- RLS with SET LOCAL per transaction, the 16-byte-IV AES-256-GCM scheme kept wire-compatible with the original Node implementation, snake_case JSON contracts -- are battle-tested. The feature surface also grew enormously: dashboards with 18 widgets, a gym training simulator, education tracking, a userscript repository.
The architecture was still wrong, and I learned that the expensive way. Microservices solve an organizational problem: independent teams shipping independently deployable units. I am one person. The bill came due during the ranked war of 2026-05-19. Every member running the faction-communicator userscript held a live SSE connection back to the gateway, with Redis pub/sub bridging the gateway to the war and chain services behind it. With 30-plus users hammering call-target, call-rally, and push-orders for two hours, the gateway's single-threaded SSE fan-out pegged one core at 107-129% -- one core saturated, three sitting idle, on a 4 vCPU / 3.8 GiB box. The same design made restarts a thundering herd of reconnecting userscripts, so deploying during a war was simply forbidden. And every feature paid a distributed-systems tax -- a .proto definition, gRPC plumbing, cross-service calls for things like key decryption -- to move data between modules that always deploy together anyway.
Round Three: The Retreat
v2 is explicitly a port, not another rewrite -- the functionality already exists and is proven in next-gen, and v2's job is to move it, not reinvent it. Roughly 32 containers become three systemd units: the 28 gRPC services collapse into one Rust monolith (c2-api, Axum 0.8) whose internal modules call plain functions instead of RPC; background jobs become a single workers binary that uses Postgres advisory locks for leader election; and the SolidStart frontend ships as a Node SSR process. Docker survives only as a local-dev convenience for Postgres and Redis. The real-time hot path -- the fan-out that actually pegged the CPU -- moves off the VPS entirely, to Cloudflare Durable Objects at the edge: one WarDO per active war, one ChainDO per chain, speaking WebSockets instead of SSE. The DO code is complete; wiring the userscripts through the edge gateway is the part still in flight, and the war-day CPU claim stays unproven until it is. Work happens on the v2-port branch, live at dev.c2fhq.com.
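
Leader election through Postgres advisory locks is simple enough to sketch. The version below is an illustration under stated assumptions, not the workers binary itself: the trait `LockBackend`, the in-memory `FakeLocks`, the `Worker` type, and the job key `WAR_SYNC_JOB` are all hypothetical, and the fake stands in for a real database session so the example runs without Postgres. The two SQL strings use the real `pg_try_advisory_lock` and `pg_advisory_unlock` functions, which a real backend would run on one dedicated, long-lived connection.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Statements a Postgres-backed implementation would execute.
pub const TRY_LOCK_SQL: &str = "SELECT pg_try_advisory_lock($1)";
pub const UNLOCK_SQL: &str = "SELECT pg_advisory_unlock($1)";

/// Hypothetical lock key for one background job.
pub const WAR_SYNC_JOB: i64 = 1001;

/// Minimal view of session-level advisory locks.
pub trait LockBackend {
    /// Non-blocking: true if this session now holds (or already held) the key.
    fn try_lock(&self, key: i64, session: u32) -> bool;
    /// True if the key was held by this session and has been released.
    fn unlock(&self, key: i64, session: u32) -> bool;
}

/// In-memory stand-in for Postgres: maps each held key to its owning session.
#[derive(Default)]
pub struct FakeLocks {
    held: Mutex<HashMap<i64, u32>>,
}

impl LockBackend for FakeLocks {
    fn try_lock(&self, key: i64, session: u32) -> bool {
        let mut held = self.held.lock().unwrap();
        // Claim the key if it is free, then report whether we are the owner.
        let owner = *held.entry(key).or_insert(session);
        owner == session
    }

    fn unlock(&self, key: i64, session: u32) -> bool {
        let mut held = self.held.lock().unwrap();
        if held.get(&key) == Some(&session) {
            held.remove(&key);
            true
        } else {
            false
        }
    }
}

/// One workers process. Several may run; only the lock holder does the job.
pub struct Worker<B: LockBackend> {
    pub session: u32,
    pub locks: Arc<B>,
}

impl<B: LockBackend> Worker<B> {
    /// Runs `job` only if this process is (or becomes) the leader for `job_key`.
    pub fn run_if_leader<F: FnOnce()>(&self, job_key: i64, job: F) -> bool {
        if self.locks.try_lock(job_key, self.session) {
            job();
            true
        } else {
            false
        }
    }

    /// Explicit release on clean shutdown.
    pub fn shutdown(&self, job_key: i64) {
        self.locks.unlock(job_key, self.session);
    }
}
```

The appeal of this design is that failover needs no extra machinery: Postgres releases session-level advisory locks when the holding session ends, so a crashed leader frees the key and a standby's next `try_lock` succeeds.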
The project charter is three rules I wrote down after a scope-drift audit, because rewrites lie to themselves:
- Full feature parity before cutover. Every v1 capability ships in v2 on day one. "We'll port it later" is banned -- it is how rewrites quietly become regressions.
- The point is operational, not architectural. v2 only counts as delivered if peak war-time CPU is measurably lower than the microservice build at the same concurrency -- which means the Durable Object path has to be live and measured under a real war, not just written. Prometheus and Grafana are in the stack specifically to hold that claim to evidence.
- No cold start. Every record -- war history, audit logs, encrypted API keys, payouts -- migrates. Members do not re-enter keys; leadership does not reconfigure permissions.
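
The second rule can be written as a check instead of a feeling. This is a rough sketch, assuming two numbers per war: the peak concurrent users and the peak CPU percentage. The `WarSample`, `Verdict`, and `judge` names are hypothetical, and `min_drop_pct` is an assumed way to pin down "measurably lower"; the charter itself does not fix a threshold.

```rust
/// Peak figures observed during one ranked war (hypothetical shape).
pub struct WarSample {
    pub concurrent_users: u32,
    pub peak_cpu_pct: f64, // 100.0 means one fully saturated core
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Candidate carried at least the baseline load with lower peak CPU.
    Delivered,
    /// Load was comparable, but CPU did not drop by the required margin.
    NotDelivered,
    /// Candidate saw fewer users than the baseline, so the comparison is void.
    NotComparable,
}

/// Compare a candidate war against the baseline war.
/// `min_drop_pct` is the smallest drop in peak CPU (in percentage points)
/// that counts as "measurably lower"; the value is the caller's assumption.
pub fn judge(baseline: &WarSample, candidate: &WarSample, min_drop_pct: f64) -> Verdict {
    if candidate.concurrent_users < baseline.concurrent_users {
        return Verdict::NotComparable;
    }
    if candidate.peak_cpu_pct <= baseline.peak_cpu_pct - min_drop_pct {
        Verdict::Delivered
    } else {
        Verdict::NotDelivered
    }
}
```

Taking the 2026-05-19 war as the baseline (129% at the top of the observed range, with 30 as a stand-in for "30-plus" users), a quiet war with 12 users returns `NotComparable` no matter how low the CPU reads, which is the point of the "same concurrency" clause.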
Three Versions, Side by Side
| | v1 (c2-faction-hq) | Next-gen (c2-next-gen) | v2 (c2fhq-v2) |
|---|---|---|---|
| Frontend | Next.js 16 App Router | SolidJS / SolidStart | SolidStart (carried over) |
| Backend | Node, API routes + BullMQ workers | 28 Rust gRPC services + Axum gateway | One Rust Axum monolith + one workers binary |
| Real-time | 30-second polling, SSR refresh | SSE fan-out from the gateway (Redis pub/sub) | WebSockets via Cloudflare Durable Objects at the edge |
| Data | PostgreSQL 17 + Prisma, scoping in ORM middleware | PostgreSQL 17 + SeaORM, row-level security | Same RLS Postgres, self-hosted |
| Deploy | Docker Compose, 5 containers | Docker Compose, ~32 containers, GitHub Actions to GHCR | 3 systemd units, no Docker |
| Status | Superseded | In production at c2fhq.com | Active port on v2-port, live at dev.c2fhq.com |
What Each Version Taught the Next
The Next.js build proved the domain model: the entities, the rate-limit strategy, the encryption scheme, and the position-mapped RBAC all survive into v2 nearly unchanged. The Rust rewrite proved the technology -- Rust plus SolidJS is the right pair for this workload, and RLS is a better tenant boundary than ORM middleware -- while disproving the topology. v2 keeps the languages, keeps the patterns, and deletes the distribution.
The monolith was available the whole time. Choosing it in round three was not a compromise; it was the first time I sized the architecture from measured load instead of from ambition. A faction tool with a few dozen concurrent users during a war does not need independent deployability. It needs to be debuggable at 2 a.m. mid-war by one person, and one binary with one log stream wins that contest every time.
What I Would Do Differently
I would skip next-gen's topology entirely. The honest path from version one was: keep the boundaries as modules, rewrite the hot paths in Rust inside one process, and move the WebSocket fan-out to the edge -- which is exactly v2, reachable without building and operating 28 services first. The rewrite was partly resume-driven: I wanted to have built a Rust microservice system more than the problem wanted to be one.
I would also write the three charter rules at the start of any rewrite, not after a scope-drift audit. Parity, a measurable operational target, and a data migration plan are the difference between a rewrite and an expensive sibling. And I would distrust any architecture decision I cannot tie to a number. "107% CPU during a war" justified v2 in one sentence. Round two never had a sentence like that.