Round

Responsibility

Round is the internal Go gRPC inference service for ONNX-backed model serving. Brain calls Round for text embeddings and image analysis through one model-serving API.

Runtime

Round runs as a Go service with native ONNX Runtime and tokenizer dependencies, so CGO_ENABLED=1 is required. It uses ONNX Runtime through github.com/yalue/onnxruntime_go; the embeddings model also uses the Rust tokenizer bindings from github.com/daulet/tokenizers.

The Docker image installs the tokenizers shared libraries and ONNX Runtime at the version set by the ONNX_VERSION build argument (currently 1.20.1). Model files live under MODEL_CACHE_DIR and are downloaded either during the Docker build or on first use, depending on the model. Use persistent storage for MODEL_CACHE_DIR when downloaded models must survive container restarts.

Models

Round currently registers these model IDs when their dependencies are available:

embeddings: text embeddings from BAAI/bge-small-en-v1.5, returned as a 384-dimensional vector.
face-detection: RetinaFace MobileNet V1 0.25 face detection for JPEG or PNG images.
face-embedding: LVFace-B_Glint360K face embedding extraction, using face detection and alignment before embedding.

Face detection and face embedding are optional at startup. If a model cannot be downloaded or loaded, Round logs the failure and continues with the models that did load.
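The log-and-continue startup behavior can be sketched as follows. This is a minimal illustration, not Round's actual registry: the Model interface, loader type, Registry struct, and the loaded/failed helpers are all hypothetical names, and the sketch treats every model as optional.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sort"
)

// Model is a hypothetical stand-in for a loaded ONNX-backed model.
type Model interface {
	ID() string
}

// loader builds a model or reports why it could not be loaded.
type loader func() (Model, error)

type stubModel struct{ id string }

func (m stubModel) ID() string { return m.id }

// loaded and failed are demo helpers that simulate the two startup outcomes.
func loaded(id string) loader {
	return func() (Model, error) { return stubModel{id}, nil }
}

func failed(reason string) loader {
	return func() (Model, error) { return nil, errors.New(reason) }
}

// Registry maps model IDs to the models that loaded successfully.
type Registry struct {
	models map[string]Model
}

// NewRegistry runs every loader, logs failures, and keeps whatever loaded.
func NewRegistry(loaders map[string]loader) *Registry {
	r := &Registry{models: make(map[string]Model)}
	for id, load := range loaders {
		m, err := load()
		if err != nil {
			log.Printf("model %q unavailable, continuing without it: %v", id, err)
			continue
		}
		r.models[id] = m
	}
	return r
}

// IDs returns the registered model IDs in sorted order.
func (r *Registry) IDs() []string {
	ids := make([]string, 0, len(r.models))
	for id := range r.models {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	r := NewRegistry(map[string]loader{
		"embeddings":     loaded("embeddings"),
		"face-detection": failed("download failed"),
	})
	fmt.Println(r.IDs())
}
```

With this shape, a listing of registered models naturally reflects only what loaded, which matches the "currently registered" wording used for ListModels.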

Configuration

Variable	Default	Purpose
`GRPC_PORT`	`8080`	gRPC server port.
`HOST`	`0.0.0.0`	Server bind address.
`MODEL_CACHE_DIR`	`/opt/models`	Directory for model files.
`MAX_TEXT_LENGTH`	`10240`	Maximum text input size in bytes.
`MAX_BINARY_SIZE`	`10485760`	Maximum image input size in bytes.
`LOG_LEVEL`	`info`	`debug`, `info`, `warn`, or `error`.

Model download URLs can be overridden with ROUND_EMBEDDINGS_MODEL_URL, ROUND_EMBEDDINGS_TOKENIZER_URL, ROUND_RETINAFACE_URL, and ROUND_LVFACE_URL.
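As an illustration of how the defaults in the table might be applied, here is a small sketch of an environment loader. The Config struct, the lookup type, and the helper names are assumptions, not the contents of apps/round/internal/pkg/config/. Treating an empty value as unset and rejecting unknown LOG_LEVEL values are also assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the documented variables; field names are illustrative.
type Config struct {
	GRPCPort      int
	Host          string
	ModelCacheDir string
	MaxTextLength int
	MaxBinarySize int
	LogLevel      string
}

// lookup abstracts os.LookupEnv so the loader can be exercised in isolation.
type lookup func(key string) (string, bool)

// str returns the variable's value, or def when it is unset or empty.
func str(get lookup, key, def string) string {
	if v, ok := get(key); ok && v != "" {
		return v
	}
	return def
}

// integer parses the variable as an int, or returns def when it is unset or empty.
func integer(get lookup, key string, def int) (int, error) {
	v, ok := get(key)
	if !ok || v == "" {
		return def, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return n, nil
}

// Load builds a Config from the environment, falling back to the documented defaults.
func Load(get lookup) (Config, error) {
	cfg := Config{
		Host:          str(get, "HOST", "0.0.0.0"),
		ModelCacheDir: str(get, "MODEL_CACHE_DIR", "/opt/models"),
		LogLevel:      str(get, "LOG_LEVEL", "info"),
	}
	ints := []struct {
		dst *int
		key string
		def int
	}{
		{&cfg.GRPCPort, "GRPC_PORT", 8080},
		{&cfg.MaxTextLength, "MAX_TEXT_LENGTH", 10240},
		{&cfg.MaxBinarySize, "MAX_BINARY_SIZE", 10485760},
	}
	for _, f := range ints {
		n, err := integer(get, f.key, f.def)
		if err != nil {
			return Config{}, err
		}
		*f.dst = n
	}
	switch cfg.LogLevel {
	case "debug", "info", "warn", "error":
	default:
		return Config{}, fmt.Errorf("LOG_LEVEL: unsupported value %q", cfg.LogLevel)
	}
	return cfg, nil
}

func main() {
	cfg, err := Load(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the defaults testable without mutating the process environment.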

Primary Source Paths

apps/round/cmd/app/
apps/round/internal/app/services/
apps/round/internal/pkg/config/
apps/round/internal/pkg/models/
proto/round/v1/

Contracts And Generated References

Round exposes round.v1.RoundService with:

Infer: runs inference for a selected model_id with text or image input.
ListModels: returns metadata for currently registered models.

The service also registers gRPC health checking and server reflection.

Operational Notes

Round validates request shape and input size before inference, including missing model IDs, missing inputs, oversized payloads, and invalid base64 image payloads. The Infer path currently wraps registry and model inference failures as internal gRPC errors; more specific status codes only apply when lower layers return unmapped domain errors through the server interceptors.
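The pre-inference checks described above might look roughly like the sketch below. The Request and Limits types, the field names, and the sentinel errors are hypothetical and do not reflect the generated round.v1 messages. Applying MAX_BINARY_SIZE to the decoded image bytes is also an assumption.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// Request is a hypothetical, simplified view of an inference request.
type Request struct {
	ModelID     string
	Text        string
	ImageBase64 string
}

// Limits carries the size caps (MAX_TEXT_LENGTH and MAX_BINARY_SIZE in the configuration).
type Limits struct {
	MaxTextLength int
	MaxBinarySize int
}

var (
	ErrMissingModelID = errors.New("model_id is required")
	ErrMissingInput   = errors.New("text or image input is required")
	ErrTooLarge       = errors.New("input exceeds size limit")
	ErrBadBase64      = errors.New("image is not valid base64")
)

// Validate checks request shape and input size before any model is invoked.
func Validate(req Request, lim Limits) error {
	if req.ModelID == "" {
		return ErrMissingModelID
	}
	if req.Text == "" && req.ImageBase64 == "" {
		return ErrMissingInput
	}
	// len on a string counts bytes, matching a byte-based text limit.
	if len(req.Text) > lim.MaxTextLength {
		return ErrTooLarge
	}
	if req.ImageBase64 != "" {
		raw, err := base64.StdEncoding.DecodeString(req.ImageBase64)
		if err != nil {
			return ErrBadBase64
		}
		if len(raw) > lim.MaxBinarySize {
			return ErrTooLarge
		}
	}
	return nil
}

func main() {
	lim := Limits{MaxTextLength: 10240, MaxBinarySize: 10485760}
	img := base64.StdEncoding.EncodeToString([]byte("fake image bytes"))

	fmt.Println(Validate(Request{ModelID: "embeddings", Text: "hello"}, lim))
	fmt.Println(Validate(Request{ModelID: "face-detection", ImageBase64: img}, lim))
	fmt.Println(Validate(Request{Text: "hello"}, lim))
	fmt.Println(Validate(Request{ModelID: "face-detection", ImageBase64: "not base64!"}, lim))
}
```

In a gRPC handler, errors like these would typically be translated to a client-facing status before the request reaches the model, while failures raised later by the registry or the model itself follow the internal-error path described above.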

The server logs with structured zerolog output, updates its gRPC health status during startup and shutdown, and shuts down gracefully on SIGINT and SIGTERM.
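The ordering of health status changes around a graceful shutdown can be sketched with the standard library alone. The Health type and Run function are hypothetical stand-ins for the gRPC health server and the service's main loop, and the ordering shown (mark unhealthy, then drain) is an assumption about a common pattern, not a description of Round's code.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// Health is a hypothetical stand-in for a gRPC health server's serving flag.
type Health struct{ serving atomic.Bool }

func (h *Health) Set(ok bool)   { h.serving.Store(ok) }
func (h *Health) Serving() bool { return h.serving.Load() }

// Run marks the service healthy, blocks until done is closed, then marks it
// unhealthy before invoking stop, so new traffic is turned away before draining.
func Run(done <-chan struct{}, h *Health, stop func()) {
	h.Set(true)
	<-done
	h.Set(false)
	stop()
}

func main() {
	// The context is cancelled when the process receives SIGINT or SIGTERM.
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	// Demo only: cancel after a short delay so the example exits without a real signal.
	time.AfterFunc(100*time.Millisecond, cancel)

	h := &Health{}
	Run(ctx.Done(), h, func() {
		fmt.Println("serving:", h.Serving(), "- draining in-flight requests")
	})
}
```

In a real gRPC server, the stop callback is where a call such as the server's graceful stop would go.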

Deployment Environment

Round decisions are recorded under docs/src/content/docs/decisions/ when durable.

Operations

Local model setup may require following apps/round/LOCAL-DEV.md and running setup-local-dev.sh.

Local Commands

cd apps/round && make build
cd apps/round && make run-tests
cd apps/round && make lint
