Round

Responsibility

Round is the internal Go gRPC inference service for ONNX-backed model serving. Brain calls Round for text embeddings and image analysis through one model-serving API.

Runtime

Round runs as a Go service with native ONNX Runtime and tokenizer dependencies, so CGO_ENABLED=1 is required. It uses ONNX Runtime through github.com/yalue/onnxruntime_go; the embeddings model also uses the Rust tokenizer bindings from github.com/daulet/tokenizers.

The Docker image installs ONNX Runtime at the version set by the ONNX_VERSION build argument (currently 1.20.1), along with the tokenizers shared libraries. Model files live under MODEL_CACHE_DIR and are downloaded either during the Docker build or on first use, depending on the model. Use persistent storage for MODEL_CACHE_DIR when downloaded models must survive container restarts.

Models

Round currently registers these model IDs when their dependencies are available:

  • embeddings: text embeddings from BAAI/bge-small-en-v1.5, returned as a 384-dimensional vector.
  • face-detection: RetinaFace MobileNet V1 0.25 face detection for JPEG or PNG images.
  • face-embedding: LVFace-B_Glint360K face embedding extraction, using face detection and alignment before embedding.

Face detection and face embedding are optional at startup. If a model cannot be downloaded or loaded, Round logs the failure and continues with the models that did load.

Configuration

Variable          Default       Purpose
GRPC_PORT         8080          gRPC server port.
HOST              0.0.0.0       Server bind address.
MODEL_CACHE_DIR   /opt/models   Directory for model files.
MAX_TEXT_LENGTH   10240         Maximum text input size in bytes.
MAX_BINARY_SIZE   10485760      Maximum image input size in bytes.
LOG_LEVEL         info          debug, info, warn, or error.

Model download URLs can be overridden with ROUND_EMBEDDINGS_MODEL_URL, ROUND_EMBEDDINGS_TOKENIZER_URL, ROUND_RETINAFACE_URL, and ROUND_LVFACE_URL.

Primary Source Paths

  • apps/round/cmd/app/
  • apps/round/internal/app/services/
  • apps/round/internal/pkg/config/
  • apps/round/internal/pkg/models/
  • proto/round/v1/

Contracts And Generated References

Round exposes round.v1.RoundService with:

  • Infer: runs inference for a selected model_id with text or image input.
  • ListModels: returns metadata for currently registered models.

The service also registers gRPC health checking and server reflection.

Operational Notes

Round validates request shape and input size before inference, including missing model IDs, missing inputs, oversized payloads, and invalid base64 image payloads. The Infer path currently wraps registry and model inference failures as internal gRPC errors; more specific status codes only apply when lower layers return unmapped domain errors through the server interceptors.

The server emits structured logs through zerolog, updates its gRPC health status during startup and shutdown, and shuts down gracefully on SIGINT and SIGTERM.

Durable Round decisions are recorded under docs/src/content/docs/decisions/.

Operations

Local model setup may require the steps in apps/round/LOCAL-DEV.md and the setup-local-dev.sh script.

Local Commands

  • cd apps/round && make build
  • cd apps/round && make run-tests
  • cd apps/round && make lint