Round Errors

Round returns errors via gRPC status codes only — never inside InferResponse.output. This page enumerates the codes, the conditions that produce them, and the recovery action expected from callers and on-call.

Status code map

| Code | When | Source | Caller action |
|---|---|---|---|
| INVALID_ARGUMENT | Malformed request: empty model_id, missing input, oversized text or image, invalid base64. | services/inference.go::Infer | Fix the client. Do not retry. |
| NOT_FOUND | Health Watch for an unknown service name. | services/health.go | Check the service name; this is usually a developer error. |
| INTERNAL | Anything raised below the validator: registry lookup miss, ONNX session error, OOM, model output decode failure. | services/inference.go (catch-all inference failed: %v) | Retry with backoff (the call is idempotent). Page on-call if sustained. |
| UNAVAILABLE | Returned by the gRPC stack when the listener is draining or rejecting connections under keepalive enforcement. | grpc-go server | Retry with backoff and jitter; respect Railway drain. |
| RESOURCE_EXHAUSTED | The gRPC server enforces MaxRecvMsgSize = 15 MiB. Oversized messages are rejected before the handler runs. | grpc-go server | Trim the payload before retrying. |
| DEADLINE_EXCEEDED | The caller's deadline elapsed mid-inference (round itself does not impose per-RPC deadlines today). | grpc-go server | Increase the deadline or shrink the input. |

The catch-all wrapping in apps/round/internal/app/services/inference.go is intentionally broad. Finer-grained codes (NOT_FOUND for an unknown model_id, RESOURCE_EXHAUSTED for OOM) are tracked under operational hardening but are not implemented yet. TODO(@law): coordinate with the sirloin/brain caller owners before introducing finer-grained codes. Today every non-validation failure surfaces as INTERNAL, so callers retry on it.

Failure scenarios

Model not loaded

Symptom. The caller sends a model_id that is not in the registry. registry.Infer returns an error, and the handler wraps it as INTERNAL with the message inference failed: model not found: <id>.

Why. The registry is populated once at boot by cmd/app/main.go::loadModels. Models are not hot-loaded.

  • The embeddings and face-detection models are required: failure to load them is fatal at boot (logger.Fatal).
  • The face-embedding model (LVFace) is optional: if its ONNX file is missing, loadModels logs a warning and continues. Calls to model_id=face-embedding will then return INTERNAL until the file is present and the service is restarted.

Recovery. Verify the file is at MODEL_CACHE_DIR/lvface/LVFace-B_Glint360K.onnx (or its overridden URL is reachable), then restart the deployment. See the rollback steps in services/round-runbook.

Out-of-memory / resource exhaustion

Symptom. Container restarts with OOMKilled. Health flips NOT_SERVING. Callers see UNAVAILABLE until Railway brings the replica back.

Causes.

  1. Combined ONNX session memory and Go heap exceeding the Railway memory limit. Each loaded model keeps its weights resident in RAM for the life of the process.
  2. File-descriptor exhaustion or runaway goroutines. The resource monitor (internal/pkg/monitoring) warns at 80% FD usage and at more than 1000 goroutines; both warnings are visible in axiom.
  3. Bursts of large image_base64 payloads up to 15 MiB on the wire.

Recovery. Raise the Railway memory limit, then investigate using the Resource usage snapshot log lines emitted by monitoring.logResourceUsage, which report heap_alloc_mb, goroutines, and fd_usage_percent. See services/round-oncall for thresholds.
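The snapshot fields and warning thresholds can be approximated with the Go runtime and standard library. This sketch assumes Linux for the FD count (entries in /proc/self/fd against the RLIMIT_NOFILE soft limit); the snapshot type, takeSnapshot and warnings are hypothetical and only approximate what internal/pkg/monitoring does.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// snapshot holds the fields named in the "Resource usage snapshot" log lines.
type snapshot struct {
	HeapAllocMB    float64
	Goroutines     int
	FDUsagePercent float64 // -1 when it cannot be measured
}

// takeSnapshot samples the current process. FD usage is only measured where
// /proc/self/fd exists (Linux); the count includes the descriptor used to
// read the directory, so it is approximate.
func takeSnapshot() snapshot {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	s := snapshot{
		HeapAllocMB:    float64(ms.HeapAlloc) / (1 << 20),
		Goroutines:     runtime.NumGoroutine(),
		FDUsagePercent: -1,
	}
	var rl syscall.Rlimit
	entries, err := os.ReadDir("/proc/self/fd")
	if err == nil && syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl) == nil && rl.Cur > 0 {
		s.FDUsagePercent = 100 * float64(len(entries)) / float64(rl.Cur)
	}
	return s
}

// warnings applies the thresholds described above: FD usage at or above 80%
// and more than 1000 goroutines.
func warnings(s snapshot) []string {
	var out []string
	if s.FDUsagePercent >= 80 {
		out = append(out, fmt.Sprintf("fd_usage_percent=%.1f", s.FDUsagePercent))
	}
	if s.Goroutines > 1000 {
		out = append(out, fmt.Sprintf("goroutines=%d", s.Goroutines))
	}
	return out
}
```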

Malformed input

Symptom. INVALID_ARGUMENT returned synchronously. Common subcases:

| Message | Trigger |
|---|---|
| model_id is required | Empty model_id. |
| either text or image_base64 input is required | Neither input field is set. |
| text input exceeds maximum length of N bytes | len(text) > MAX_TEXT_LENGTH. |
| invalid base64 encoding: <err> | base64.StdEncoding.DecodeString failed; the caller sent URL-safe base64 or junk. |
| decoded image exceeds maximum size of N bytes | Decoded bytes > MAX_BINARY_SIZE. |

These errors come up often during integration work. Validate at the caller (sirloin / brain) when possible to keep the network and CPU cost off round.
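A caller-side pre-flight check that mirrors the messages in the table might look like this. It is a sketch: validateInfer and limits are hypothetical names, the limit values must be copied from round's MAX_TEXT_LENGTH and MAX_BINARY_SIZE configuration, and the order of checks is an assumption rather than a copy of the server's validator.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// limits mirrors round's MAX_TEXT_LENGTH and MAX_BINARY_SIZE settings.
type limits struct {
	MaxTextLength int // bytes
	MaxBinarySize int // decoded bytes
}

// validateInfer rejects locally the requests round would answer with
// INVALID_ARGUMENT, reusing the same message text.
func validateInfer(modelID, text, imageBase64 string, lim limits) error {
	if modelID == "" {
		return errors.New("model_id is required")
	}
	if text == "" && imageBase64 == "" {
		return errors.New("either text or image_base64 input is required")
	}
	if len(text) > lim.MaxTextLength {
		return fmt.Errorf("text input exceeds maximum length of %d bytes", lim.MaxTextLength)
	}
	if imageBase64 != "" {
		// Standard alphabet with padding, so URL-safe input ('-', '_') fails here.
		img, err := base64.StdEncoding.DecodeString(imageBase64)
		if err != nil {
			return fmt.Errorf("invalid base64 encoding: %v", err)
		}
		if len(img) > lim.MaxBinarySize {
			return fmt.Errorf("decoded image exceeds maximum size of %d bytes", lim.MaxBinarySize)
		}
	}
	return nil
}
```

Decoding on the caller costs the same CPU as on round, but it fails before any bytes cross the network.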

GPU errors

Not applicable — round has no CUDA path today. Any future GPU-specific failures (CUDA_ERROR_OUT_OF_MEMORY, missing libcudart) will need to be added here when the GPU build lands. See services/round-env for the current CPU-only posture.

Panics inside handlers

The recoveryInterceptor (see internal/app/server/interceptors.go) converts panics into INTERNAL and logs the stack at error level. The process keeps serving — panics do not bring the listener down — but a sustained spike means a model is producing un-decodable output and should be paged.

Logging

Inference logs are structured (zerolog) and include:

  • model_id
  • has_text, has_binary
  • output_length on success
  • err on failure

Search axiom with service:round level:error and pivot on model_id to scope an incident to a single model.

Caller checklist

When wiring sirloin or brain to round:

  1. Validate inputs locally before the RPC. INVALID_ARGUMENT should never escape your service to a user.
  2. Treat INTERNAL and UNAVAILABLE as retryable. Use exponential backoff with jitter; cap retries at 3 inside a request and shed load if all fail.
  3. Wrap Infer with a per-call deadline (1–5 s for embeddings, 5–10 s for face-detection).
  4. On every error, log the model_id you sent together with the round-side metadata.model_id. It is the only correlation key when round logs are sampled.

Mapping to alerts

The on-call surface in services/round-oncall watches:

  • error_rate of INTERNAL over a 5-minute window (model not loaded, ONNX errors).
  • error_rate of UNAVAILABLE (drain, restart loops).
  • INVALID_ARGUMENT is intentionally not alerted — it is caller hygiene.