Round Errors
Round returns errors via gRPC status codes only — never inside InferResponse.output. This page enumerates the codes, the conditions that produce them, and the recovery action expected from callers and on-call.
Status code map
| Code | When | Source | Caller action |
|---|---|---|---|
| INVALID_ARGUMENT | Malformed request: empty model_id, missing input, oversized text or image, invalid base64. | services/inference.go::Infer | Fix the client. Do not retry. |
| NOT_FOUND | Health Watch for an unknown service name. | services/health.go | Check the service name; this is almost always a developer error. |
| INTERNAL | Anything raised below the validator: registry lookup miss, ONNX session error, OOM, model output decode failure. | services/inference.go (catch-all "inference failed: %v") | Retry with backoff (idempotent). Page on-call if sustained. |
| UNAVAILABLE | Returned by the gRPC stack when the listener is draining or rejecting calls due to keepalive enforcement. | grpc-go server | Retry with backoff and jitter; respect Railway drain. |
| RESOURCE_EXHAUSTED | The gRPC server enforces MaxRecvMsgSize = 15 MiB. Oversized frames are rejected before the handler runs. | grpc-go server | Trim the payload before retrying. |
| DEADLINE_EXCEEDED | The caller's deadline elapsed mid-inference (round itself does not impose per-RPC deadlines today). | grpc-go server | Increase the deadline or shrink the input. |
The catch-all wrapping at apps/round/internal/app/services/inference.go is intentionally broad — adding finer-grained codes (NOT_FOUND for unknown model_id, RESOURCE_EXHAUSTED for OOM) is tracked under operational hardening but is not implemented yet. TODO(@law): coordinate with sirloin/brain caller owners before introducing finer-grained codes — today every non-validation failure surfaces as INTERNAL, so callers retry on it.
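As an illustration, the "Caller action" column can be folded into a small classifier that a client calls on every failed RPC. This is a minimal sketch: the Code type is a local stand-in for google.golang.org/grpc/codes.Code (so the example builds with the standard library alone), and Action, classify, and retryable are hypothetical names, not part of round or its callers.

```go
package main

// Code is a stand-in for google.golang.org/grpc/codes.Code. Real callers
// would switch on codes.InvalidArgument, codes.Internal, and so on.
type Code string

const (
	InvalidArgument   Code = "INVALID_ARGUMENT"
	NotFound          Code = "NOT_FOUND"
	Internal          Code = "INTERNAL"
	Unavailable       Code = "UNAVAILABLE"
	ResourceExhausted Code = "RESOURCE_EXHAUSTED"
	DeadlineExceeded  Code = "DEADLINE_EXCEEDED"
)

// Action is a hypothetical summary of the "Caller action" column.
type Action int

const (
	FixClient      Action = iota // INVALID_ARGUMENT, NOT_FOUND: do not retry
	RetryBackoff                 // INTERNAL, UNAVAILABLE: retry with backoff and jitter
	TrimPayload                  // RESOURCE_EXHAUSTED: shrink the request first
	AdjustDeadline               // DEADLINE_EXCEEDED: raise the deadline or shrink input
	Unknown                      // any code not listed in the status code map
)

// classify maps a status code returned by round to the documented caller action.
func classify(c Code) Action {
	switch c {
	case InvalidArgument, NotFound:
		return FixClient
	case Internal, Unavailable:
		return RetryBackoff
	case ResourceExhausted:
		return TrimPayload
	case DeadlineExceeded:
		return AdjustDeadline
	default:
		return Unknown
	}
}

// retryable reports whether the same request may be resent unchanged.
func retryable(c Code) bool { return classify(c) == RetryBackoff }
```

Because every non-validation failure currently surfaces as INTERNAL, retryable reports true even for a permanent miss such as an unknown model_id; finer-grained codes would change which failures a helper like this retries.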
Failure scenarios
Model not loaded
Symptom. The caller sends a model_id that is not in the registry. registry.Infer returns an error, and the handler wraps it as INTERNAL with the message inference failed: model not found: <id>.
Why. The registry is populated once at boot by cmd/app/main.go::loadModels. Models are not hot-loaded.
- The embeddings and face-detection models are required: failure to load them is fatal at boot (logger.Fatal).
- The face-embedding model (LVFace) is optional: if its ONNX file is missing, loadModels logs a warning and continues. Calls to model_id=face-embedding will then return INTERNAL until the file is present and the service is restarted.
Recovery. Verify that the file exists at MODEL_CACHE_DIR/lvface/LVFace-B_Glint360K.onnx (or that its overridden URL is reachable), then restart the deployment. See the rollback steps in services/round-runbook.
Out-of-memory / resource exhaustion
Symptom. Container restarts with OOMKilled. Health flips NOT_SERVING. Callers see UNAVAILABLE until Railway brings the replica back.
Causes.
- ONNX session memory + heap exceeding the Railway memory limit. Each model holds its weights in RAM continuously.
- File-descriptor exhaustion: the resource monitor (internal/pkg/monitoring) warns at 80% FD usage and at more than 1000 goroutines, both visible in axiom.
- Bursts of large image_base64 payloads, up to 15 MiB on the wire.
Recovery. Bump the Railway memory ceiling, then investigate using the resource snapshot logs: the "Resource usage snapshot" lines emitted by monitoring.logResourceUsage, which report heap_alloc_mb, goroutines, and fd_usage_percent. See services/round-oncall for thresholds.
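A rough sketch of the documented thresholds, assuming a snapshot struct whose fields mirror the heap_alloc_mb, goroutines, and fd_usage_percent log keys. The type and function names are hypothetical and this is not the implementation in internal/pkg/monitoring; the FD percentage is passed in because measuring open descriptors is OS-specific.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot mirrors the fields of the "Resource usage snapshot" log line.
// Field names come from this page; the struct itself is hypothetical.
type snapshot struct {
	HeapAllocMB    float64 // heap_alloc_mb
	Goroutines     int     // goroutines
	FDUsagePercent float64 // fd_usage_percent
}

// Warning thresholds documented for internal/pkg/monitoring.
const (
	fdWarnPercent      = 80.0
	goroutineWarnCount = 1000
)

// takeRuntimeSnapshot fills the heap and goroutine fields from the Go
// runtime. fdUsagePercent is supplied by the caller because counting open
// descriptors differs per OS (e.g. /proc/self/fd vs RLIMIT_NOFILE on Linux).
func takeRuntimeSnapshot(fdUsagePercent float64) snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{
		HeapAllocMB:    float64(m.HeapAlloc) / (1 << 20),
		Goroutines:     runtime.NumGoroutine(),
		FDUsagePercent: fdUsagePercent,
	}
}

// warnings returns one message per documented threshold the snapshot crosses.
func warnings(s snapshot) []string {
	var out []string
	if s.FDUsagePercent >= fdWarnPercent {
		out = append(out, fmt.Sprintf("fd usage %.1f%% >= %.0f%%", s.FDUsagePercent, fdWarnPercent))
	}
	if s.Goroutines > goroutineWarnCount {
		out = append(out, fmt.Sprintf("goroutines %d > %d", s.Goroutines, goroutineWarnCount))
	}
	return out
}
```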
Malformed input
Symptom. INVALID_ARGUMENT returned synchronously. Common subcases:
| Message | Trigger |
|---|---|
| model_id is required | Empty model_id. |
| either text or image_base64 input is required | Neither input field set. |
| text input exceeds maximum length of N bytes | len(text) > MAX_TEXT_LENGTH. |
| invalid base64 encoding: <err> | base64.StdEncoding.DecodeString failed: the caller sent URL-safe base64 or junk. |
| decoded image exceeds maximum size of N bytes | Decoded bytes > MAX_BINARY_SIZE. |
These errors come up constantly during integration work. Validate at the caller (sirloin / brain) when possible to keep the network and CPU cost off round.
GPU errors
Not applicable — round has no CUDA path today. Any future GPU-specific failures (CUDA_ERROR_OUT_OF_MEMORY, missing libcudart) will need to be added here when the GPU build lands. See services/round-env for the current CPU-only posture.
Panics inside handlers
The recoveryInterceptor (see internal/app/server/interceptors.go) converts panics into INTERNAL and logs the stack at error level. The process keeps serving, since panics do not bring the listener down, but a sustained spike in panics means a model is producing undecodable output and should be paged.
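The panic-to-error conversion can be shown with plain functions. This sketch is hypothetical: handler and withRecovery are stand-ins, and in grpc-go the same deferred recover would live in a grpc.UnaryServerInterceptor that returns status.Error(codes.Internal, ...) instead of a plain error.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// handler stands in for a unary gRPC handler (hypothetical signature).
type handler func(req string) (string, error)

// withRecovery converts a panic in h into an error and logs the stack,
// following the same idea as round's recoveryInterceptor.
func withRecovery(h handler) handler {
	return func(req string) (resp string, err error) {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("panic in handler: %v\n%s", r, debug.Stack())
				// Named results let the deferred func replace the return values.
				resp, err = "", fmt.Errorf("internal: panic: %v", r)
			}
		}()
		return h(req)
	}
}
```

Because the panic is recovered inside the call, the wrapped handler can be invoked again immediately, which matches the "process keeps serving" behavior described above.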
Logging
Inference logs are structured (zerolog) and include:
- model_id
- has_text, has_binary
- output_length on success
- err on failure
Search axiom with service:round level:error and pivot on model_id to scope an incident to a single model.
Caller checklist
When wiring sirloin or brain to round:
- Validate inputs locally before the RPC. INVALID_ARGUMENT should never escape your service to a user.
- Treat INTERNAL and UNAVAILABLE as retryable. Use exponential backoff with jitter; cap retries at 3 inside a request and shed load if all fail.
- Wrap Infer with a per-call deadline (1–5 s for embeddings, 5–10 s for face-detection).
- Log the round-side metadata.model_id and the model_id you sent on every error: it is the only correlation key when round logs are sampled.
Mapping to alerts
The on-call surface in services/round-oncall watches:
- error_rate of INTERNAL over a 5-minute window (model not loaded, ONNX errors).
- error_rate of UNAVAILABLE (drain, restart loops).

INVALID_ARGUMENT is intentionally not alerted on: it is caller hygiene.