Round Runbook

Operational steps for deploying, rolling back, swapping models, and provisioning capacity for the round service.

Topology

  • Single Railway service beef-round, region us-east4-eqdc4a, 1 replica (apps/round/railway.json).
  • Builder: Dockerfile, watch path /apps/round/**.
  • Healthcheck: HTTP GET /health on GRPC_PORT (default 8080), timeout 120 s.
  • Restart policy: ON_FAILURE, max 10 retries.
  • Callers: sirloin (gRPC), brain (gRPC). No public ingress.

CPU-only; no GPU node provisioning today (see services/round-env).

Deploy

Round deploys via Railway from main on push, exactly like other services. There is no separate release workflow under .github/workflows/ for round.

Standard flow:

  1. Open a PR touching apps/round/** (or proto/round/v1/** if RPC shape changes).
  2. CI runs make lint and make run-tests (with -race).
  3. Merge to main. Railway picks up the change via watchPatterns and builds the Dockerfile.
  4. Build downloads model files from R2 / HuggingFace into the runtime image. Watch for non-zero exit on the model-download stage — the Dockerfile fails the build if a download produced a 0-byte or LFS-pointer stub.
  5. Healthcheck GET /health must return 200 within 120 s of the new container starting. Health flips to SERVING only after loadModels completes (apps/round/cmd/app/main.go), so failures inside loadModels will stall the deploy.
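The Dockerfile's own check from step 4 is not reproduced here. As a rough illustration of the same idea, the Go sketch below rejects a 0-byte download and a Git LFS pointer stub, relying on the fact that LFS pointer files start with a fixed `version https://git-lfs.github.com/spec/v1` line. The function and error names are hypothetical and do not come from the round codebase.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// lfsPointerPrefix is the first line of every Git LFS pointer file.
var lfsPointerPrefix = []byte("version https://git-lfs.github.com/spec/v1")

var (
	errEmpty      = errors.New("model file is empty")
	errLFSPointer = errors.New("model file is a Git LFS pointer stub")
)

// validateModelBytes rejects the two failure modes named in step 4:
// a 0-byte download and an LFS pointer saved in place of the real weights.
// Hypothetical helper; the real check lives in the Dockerfile.
func validateModelBytes(data []byte) error {
	if len(data) == 0 {
		return errEmpty
	}
	if bytes.HasPrefix(data, lfsPointerPrefix) {
		return errLFSPointer
	}
	return nil
}

func main() {
	stub := []byte("version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 123\n")
	fmt.Println(validateModelBytes(nil))                // empty download
	fmt.Println(validateModelBytes(stub))               // pointer stub
	fmt.Println(validateModelBytes([]byte{0x08, 0x07})) // arbitrary non-empty bytes pass
}
```

Only the head of the file is needed for the prefix test, so a real check can read the first few dozen bytes instead of the whole model.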

Verify a deploy:

# from the repo root
railway status --json | jq '.services[] | select(.name=="beef-round") | .latestDeployment'
# or via grpcurl against the internal hostname (reachable only from inside the private network)
grpcurl -plaintext round:8080 round.v1.RoundService/ListModels

Cross-check logs in axiom for "Models loaded successfully" and "gRPC+health listener accepting connections".

Rollback

Railway exposes a one-click rollback to the previous successful deployment. Use it when:

  • Healthcheck is failing post-deploy.
  • A model file was swapped to a bad URL and round is now producing INTERNAL for one model_id.
  • A proto change is incompatible with the live sirloin / brain clients.

Steps:

  1. Railway dashboard → beef-round → Deployments → previous green build → Redeploy.
  2. Confirm /health returns 200 and ListModels returns the expected set.
  3. Revert the offending commit on main so the next deploy does not re-introduce the bug.

If the bad deploy was a proto change, also redeploy any caller services that already shipped with the new client stubs (sirloin / brain) so their request shape matches the rolled-back round.

Model swap

Changing a model URL or version goes through the build, not a runtime config. The flow:

  1. Update the relevant default URL in apps/round/Dockerfile (RETINAFACE_MODEL_URL, LVFACE_MODEL_URL, MODELS_BASE_URL) or the embeddings URLs in code / Railway env (ROUND_EMBEDDINGS_MODEL_URL, ROUND_EMBEDDINGS_TOKENIZER_URL). See services/round-env for the full list.
  2. Bump any version metadata in services/round-models so the doc reflects what is actually serving.
  3. Open a PR; let CI build and Railway deploy.
  4. After the deploy is green, call ListModels and confirm version and model_id match the new spec.
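The check in step 4 can be mechanized. The sketch below compares an expected model spec against whatever ListModels returned. The `modelInfo` type carries only the two fields named above and is a hypothetical stand-in for the generated proto message. The model IDs and versions in `main` are placeholders, not real round values.

```go
package main

import "fmt"

// modelInfo mirrors only the two fields the runbook checks.
// Hypothetical type; the real message comes from proto/round/v1.
type modelInfo struct {
	ModelID string
	Version string
}

// diffModels returns one message per expected model that is missing
// from served or is served at a different version.
func diffModels(expected, served []modelInfo) []string {
	got := make(map[string]string, len(served))
	for _, m := range served {
		got[m.ModelID] = m.Version
	}
	var problems []string
	for _, want := range expected {
		v, ok := got[want.ModelID]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: missing", want.ModelID))
		case v != want.Version:
			problems = append(problems, fmt.Sprintf("%s: want version %s, got %s", want.ModelID, want.Version, v))
		}
	}
	return problems
}

func main() {
	// Placeholder IDs and versions for illustration only.
	expected := []modelInfo{{"model-a", "2"}, {"model-b", "1"}}
	served := []modelInfo{{"model-a", "1"}}
	for _, p := range diffModels(expected, served) {
		fmt.Println(p)
	}
}
```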

For one-off A/B testing without a code change:

  • Set the override env at the Railway service level (e.g. ROUND_LVFACE_URL=https://...).
  • Trigger a redeploy so the runtime fetches the new file into MODEL_CACHE_DIR.
  • Watch heap_alloc_mb in the resource snapshot logs for unexpected growth.

There is no hot-swap path. The registry is built once per process, so a model change always implies a restart.
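To make the "built once per process" point concrete, here is a minimal sketch of a registry that is populated at startup and exposes no mutating method. It is an illustration under assumed names (`registry`, `newRegistry`, `lookup`), not round's actual registry code.

```go
package main

import "fmt"

// registry maps model_id to a loaded model; a file path stands in
// for the loaded model here. There is deliberately no setter.
type registry struct {
	models map[string]string
}

// newRegistry copies the input so later changes to the caller's map
// cannot alter what the process serves.
func newRegistry(paths map[string]string) *registry {
	copied := make(map[string]string, len(paths))
	for id, p := range paths {
		copied[id] = p
	}
	return &registry{models: copied}
}

// lookup is the only access path: read-only.
func (r *registry) lookup(id string) (string, bool) {
	p, ok := r.models[id]
	return p, ok
}

func main() {
	cfg := map[string]string{"model-a": "/cache/model-a.onnx"} // placeholder path
	reg := newRegistry(cfg)

	// Changing configuration after startup has no effect on the registry.
	cfg["model-a"] = "/cache/model-a-v2.onnx"
	p, _ := reg.lookup("model-a")
	fmt.Println(p)
}
```

With a design like this, the only way to serve a different file is to start a new process, which is why a model change always goes through a deploy.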

flowchart LR
edit[Edit Dockerfile / env] --> pr[Open PR]
pr --> ci[CI lint + tests]
ci --> merge[Merge to main]
merge --> build[Railway Docker build<br/>downloads ONNX]
build --> serving[SERVING flag flipped]
serving --> healthcheck["/health 200"]
healthcheck --> done[Live traffic]

Capacity

There is one replica today and no horizontal autoscaling. To scale:

  1. Edit multiRegionConfig.us-east4-eqdc4a.numReplicas in apps/round/railway.json.
  2. Mirror the change in the Railway service config (it does not auto-follow the file — see operations/railway).
  3. After scaling, watch FD and goroutine counters in axiom. Each replica re-loads all ONNX models into RAM, so memory cost scales linearly.
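The linear memory cost is simple arithmetic, sketched below for planning purposes. It assumes every replica loads every model and that nothing is shared between replicas. The sizes in `main` are placeholders, not measurements of the real models.

```go
package main

import "fmt"

// fleetMemoryMB estimates resident model memory across all replicas,
// assuming each replica holds its own copy of every model.
func fleetMemoryMB(replicas int, modelMB []int) int {
	perReplica := 0
	for _, mb := range modelMB {
		perReplica += mb
	}
	return replicas * perReplica
}

func main() {
	// Placeholder sizes in MB; measure the real footprint from the
	// resource snapshot logs before relying on any number here.
	models := []int{2, 100}
	for r := 1; r <= 3; r++ {
		fmt.Printf("replicas=%d total_mb=%d\n", r, fleetMemoryMB(r, models))
	}
}
```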

If GPU is ever introduced, capacity provisioning becomes a separate decision tracked by an ADR; today the answer is “more CPU replicas”. TODO(@law): confirm the Railway plan tier sustains the memory ceiling needed for both face models (RetinaFace mv1_0.25 + LVFace-B Glint360K, see apps/round/Dockerfile) loaded simultaneously — the plan/tier is not declared in apps/round/railway.json.

Common operations

Restart the service

Railway dashboard → beef-round → Restart. Round handles SIGTERM gracefully:

  1. Health flips to NOT_SERVING immediately.
  2. gRPC server stops accepting new streams; in-flight RPCs are given up to 30 s (shutdownTimeout).
  3. HTTP server gets a 10 s grace on Shutdown.

Inspect live RPCs

Run grpcurl -plaintext round:8080 list to enumerate services (reflection is enabled). Then call Health/Check, ListModels, or Infer with a small payload.

Tail logs

Filter axiom for service:round. Useful queries:

  • service:round msg:"Inference failed" — surfaces INTERNAL errors with model_id.
  • service:round msg:"Resource usage snapshot" — periodic monitoring lines, every 30 s.
  • service:round level:warn msg:"High file descriptor usage detected" — FD pressure.

Disaster scenarios

| Scenario | First action | Escalation |
| --- | --- | --- |
| Deploy stuck: healthcheck never returns 200 | Check Railway build logs for model-download failures or "Failed to load models". | Roll back to previous deployment. |
| All Infer calls returning INTERNAL | Check axiom for "Inference failed" and ONNX init errors. Confirm model files are present in MODEL_CACHE_DIR. | Roll back; restart with verified model URLs. |
| OOMKilled loop | Inspect "Resource usage snapshot" for memory trend before the kill. | Bump Railway memory tier; consider unloading the optional face-embedding model. |
| Caller reports UNAVAILABLE storms | Confirm replica count and that MaxConcurrentStreams (100) is not the bottleneck under load. | Add a replica; coordinate keepalive defaults with sirloin / brain clients. |

See services/round-oncall for alert thresholds and paging routes.