gRPC Mesh Integration Playbook
gRPC is the primary inter-service protocol across the beef monorepo. REST is exposed only at the edges (the public webhook endpoints for Chargebee and Primer, and the public REST gateway on sirloin), and plain HTTP is used for exactly one internal call (sirloin → brain control plane). This page is the playbook for the internal mesh: who calls whom, how it is secured, how protos are versioned and generated, and what to watch for when it breaks.
1. Overview
- Transport: HTTP/2 over plaintext (h2c). See Section 4.
- Schema source of truth: `proto/` at the repo root, organised as `proto/<service>/<version>/*.proto` (proto-files-list).
- Codegen tool: `buf` v2 (`buf.yaml`, `buf.gen.yaml`) with local plugins (`protoc-gen-go`, `protoc-gen-go-grpc`, `ts-proto`, `@bufbuild/protoc-gen-es`). See Section 7.
- Generated stubs are checked in under each service's `internal/pkg/pb/` (Go), `src/generated/` (brain), or `pkg/` (brisket, flank).
- Service auth on the wire: minimal. Network-trust + per-edge mechanisms (API key for sirloin → brain; Clerk JWT propagated as user_id in proto payloads for user-bound calls). See Section 5.
- Total surface area: 105 RPCs across six `.proto` files. See the full index in `/generated/proto-api`.
The single biggest correctness invariant: every gRPC hop is plaintext, internal-only. No call is meant to traverse the public internet. If you find yourself wanting to expose a gRPC port externally, stop and route through sirloin’s REST gateway instead.
2. Topology
2.1 Service mesh

```mermaid
flowchart LR
  subgraph Edge[Public Edge]
    Browser[User browser]
    Admin[Admin browser]
    CB[Chargebee]
    PR[Primer]
  end

  subgraph Mesh["Internal mesh (plaintext h2c)"]
    BR[brisket Next.js]
    FE[fennec admin]
    ST[strip SSR]
    FL[flank]
    SI[(sirloin)]
    BN[(brain)]
    RD[(round ML)]
  end

  Browser -->|HTTPS REST tRPC| BR
  Admin -->|HTTPS| ST
  Browser -->|HTTPS Clerk| FE
  Browser -->|HTTPS Clerk| FL
  CB -->|HTTPS REST + Basic auth| SI
  PR -->|HTTPS REST + HMAC| SI

  BR -->|gRPC ConnectRPC| SI
  ST -->|gRPC| SI
  FL -->|gRPC FlankStorageService| SI
  SI -->|gRPC FlankExecutionService| FL
  SI -->|HTTP + API key| BN
  SI -->|gRPC insecure| RD
  BN -->|gRPC insecure| RD
```

2.2 Direction notes
- brisket → sirloin is gRPC over Connect (not REST, despite the wording in the root `CLAUDE.md`). Two clients are constructed in `apps/brisket/server/api/sirloin-api.ts` (`SirloinService`, `BillingService`) using `createGrpcTransport({ baseUrl: env.SIRLOIN_URL, peerMaxConcurrentStreams: 500 })`. The transport is refreshed every 30 minutes to mitigate stale H2 connections.
- strip → sirloin is plain Go gRPC (`apps/strip/cmd/app/main.go:165`, `grpc.NewClient`). `user_id` is passed inside proto request bodies, not via header metadata.
- sirloin → brain is HTTP (not gRPC) using a shared API key (`SIRLOIN_BRAIN_API_KEY`); brain validates it via `apps/brain/src/modules/application/auth/strategies/api-key.strategy.ts`. This is the one cross-service call where the codegen pipeline is bypassed. TODO(@law): if/when brain exposes gRPC, the topology should converge.
- sirloin → round is gRPC, plaintext (`apps/sirloin/cmd/app/main.go:237,263`, `insecure.NewCredentials()`).
- brain → round is gRPC, plaintext (`apps/brain/src/modules/application/round/providers/round-grpc-client.provider.ts:19`, `credentials.createInsecure()`).
- sirloin ↔ flank is bidirectional gRPC: sirloin triggers `FlankExecutionService` on flank's gRPC server (`apps/flank/server/grpc-server.ts`); flank reads workflows/adapters/secrets from `FlankStorageService` on sirloin via Connect-es (`apps/flank/app/lib/grpc-client.ts`).
- fennec is a SPA; its gRPC needs are funneled through sirloin's REST gateway. Confirmed: no `@bufbuild`/`@connectrpc` import or `grpc` dial in `apps/fennec/src/`.
- chuck (Strapi) is HTTP only; not part of the mesh.
3. Services + RPCs index
Per .proto file and owning service:
| Proto file | Owner service (server) | Common callers (client) |
|---|---|---|
| `proto/round/v1/round.proto` | round | sirloin, brain |
| `proto/sirloin/v5/sirloin.proto` | sirloin | brisket, flank, strip |
| `proto/sirloin/v5/billing.proto` | sirloin | brisket |
| `proto/sirloin/v5/strip.proto` | sirloin | strip |
| `proto/sirloin/v5/flank.proto` | sirloin (FlankStorageService) | flank |
| `proto/flank/v1/flank.proto` | flank (FlankExecutionService) | sirloin |
The full RPC list (105 RPCs, request/response shapes, field-level docs) is
auto-generated from these protos; see /generated/proto-api,
regenerated via bash scripts/gen-proto-docs.sh.
4. Transport security
Today: none on the wire. All inter-service gRPC uses h2c (HTTP/2 plaintext). Evidence:
| Site | Code | Evidence |
|---|---|---|
| sirloin gRPC server | apps/sirloin/cmd/app/main.go:539 | grpc.NewServer(...) — no grpc.Creds(...) option |
| sirloin → round client | apps/sirloin/cmd/app/main.go:237,263 | grpc.WithTransportCredentials(insecure.NewCredentials()) |
| brain → round client | apps/brain/.../round-grpc-client.provider.ts:19 | credentials.createInsecure() |
| round gRPC server | apps/round/internal/app/server/server.go:63 | grpc.NewServer(...) — no creds |
| strip → sirloin client | apps/strip/cmd/app/main.go:165 | grpc.NewClient(...) (insecure) |
| flank gRPC server | apps/flank/server/grpc-server.ts | connectNodeAdapter(...) over http2.createServer (h2c) |
| sirloin → flank client | apps/sirloin/internal/app/flankmcp/grpc.go:56,98 | insecure.NewCredentials() |
Confidentiality / integrity therefore relies on the network boundary:
- Local dev: shared Docker network created by `docker-compose.dev.yml`.
- Production: Railway private network. TODO(@law): confirm the network guarantees (private VPC, no cross-tenant egress) against Railway docs and current platform config; cross-link to /standards/security-model/#in-transit.

```mermaid
flowchart LR
  subgraph Internet
    User[User]
  end
  subgraph Railway["Railway private network (plaintext within)"]
    direction LR
    SI[sirloin] -.h2c.-> RD[round]
    SI -.h2c.-> FL[flank]
    BN[brain] -.h2c.-> RD
    BR[brisket] -.h2c.-> SI
  end
  User ==HTTPS==> BR
  User ==HTTPS==> SI
  classDef plain fill:#fee,stroke:#c33;
  class SI,RD,FL,BN,BR plain;
```

Finding: flank's gRPC FlankExecutionService (`apps/flank/server/grpc-server.ts`) does not apply the Clerk auth gate that protects flank's HTTP server functions (`apps/flank/app/lib/auth.ts:requireAuth`). The gRPC entry point trusts that sirloin is the only caller. Mitigation today is network-level only; there is no per-call token check. TODO(@law) tracked in /standards/auth-model/#service-to-service-auth.
5. Service-to-service auth
Cross-link: see /standards/auth-model/ for the
full table.
| Caller → Callee | Mechanism | Evidence |
|---|---|---|
| sirloin → brain (HTTP) | API key header SIRLOIN_BRAIN_API_KEY | .env.example; apps/brain/.../api-key.strategy.ts |
| sirloin → round (gRPC) | None (network-trust) | apps/sirloin/cmd/app/main.go:237,263 |
| brain → round (gRPC) | None (network-trust) | round-grpc-client.provider.ts:19 |
| sirloin → flank (gRPC) | None — Clerk gate not applied to gRPC | apps/flank/server/grpc-server.ts |
| flank → sirloin (gRPC) | None at gRPC layer; service token in env (FLANK_SERVICE_TOKEN) | apps/flank/app/lib/grpc-client.ts |
| brisket → sirloin (gRPC) | Clerk JWT forwarded in headers via grpc-headers.ts; sirloin verifies user context | apps/brisket/server/api/sirloin-api.ts |
| strip → sirloin (gRPC) | user_id passed inside proto request body; verified by sirloin against Clerk session | apps/strip/cmd/app/main.go:165 |

```mermaid
flowchart TB
  subgraph Auth[Auth boundaries]
    direction TB
    User -->|Clerk JWT| BR[brisket]
    User -->|Clerk JWT| ST[strip]
    BR -->|Clerk JWT in metadata| SI[sirloin]
    ST -->|user_id in proto| SI
    SI -->|API key header| BN[brain]
    SI -->|no auth| RD[round]
    SI -->|no auth| FL[flank]
    FL -->|no auth| SI
    BN -->|no auth| RD
  end
```

Direction: there is no mTLS, no SPIFFE, no per-call signed identity. The backstop assumption is the private network. Anything that erodes that boundary (a public listener, a misconfigured ingress) becomes a critical finding immediately. TODO(@law): plan for mTLS or signed service tokens.
6. Versioning
Proto versions in use today:
| Proto path | Version | Notes |
|---|---|---|
| `proto/round/v1/` | v1 | Stable; embedding/face-detection RPCs |
| `proto/sirloin/v5/` | v5 | Current; previous v1–v4 not retained in repo |
| `proto/flank/v1/` | v1 | Workflow execution |
Compatibility rules (declared in `buf.yaml`):

```yaml
breaking:
  use:
    - FILE
```

FILE-level breaking-change detection means each `.proto` is checked against its previous state for backwards-incompatible changes (renamed fields, deleted services, type changes). The rule is declared but not yet enforced: `buf breaking` is not wired into CI today (no `buf breaking` invocation in `.github/workflows/`). TODO(@law): add `buf breaking --against '.git#branch=main'` to the proto pipeline.
What is compatible (free to add):
- New RPCs on existing services.
- New fields with new tag numbers.
- New optional messages.
What requires a major bump (e.g. v5 → v6):
- Removing or renaming an RPC.
- Renumbering or repurposing a field tag.
- Changing field type, cardinality (singular ↔ repeated), or wrapper.
- Renaming an enum value visible on the wire.
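
The field-level rules above can be illustrated with a small checker. This is a toy sketch, not a substitute for `buf breaking`: the `field` struct and `needsMajorBump` function are hypothetical, and the model covers only tag, name, type, and cardinality for a single message.

```go
package main

import "fmt"

// field is a simplified view of one proto field; illustrative only.
type field struct {
	Tag      int
	Name     string
	Type     string
	Repeated bool
}

// needsMajorBump compares two versions of a message, matching fields by tag
// number, and reports whether any change falls under the "major bump" rules.
// Fields that exist only in next (new tag numbers) are treated as compatible.
func needsMajorBump(prev, next []field) (bool, []string) {
	nextByTag := make(map[int]field, len(next))
	for _, f := range next {
		nextByTag[f.Tag] = f
	}
	var reasons []string
	for _, p := range prev {
		n, ok := nextByTag[p.Tag]
		switch {
		case !ok:
			reasons = append(reasons, fmt.Sprintf("tag %d (%s): removed", p.Tag, p.Name))
		case n.Name != p.Name:
			reasons = append(reasons, fmt.Sprintf("tag %d: renamed or repurposed (%s -> %s)", p.Tag, p.Name, n.Name))
		case n.Type != p.Type:
			reasons = append(reasons, fmt.Sprintf("tag %d (%s): type changed (%s -> %s)", p.Tag, p.Name, p.Type, n.Type))
		case n.Repeated != p.Repeated:
			reasons = append(reasons, fmt.Sprintf("tag %d (%s): cardinality changed", p.Tag, p.Name))
		}
	}
	return len(reasons) > 0, reasons
}

func main() {
	v5 := []field{{1, "id", "string", false}, {2, "tags", "string", true}}

	// Adding a field under a new tag number stays within v5.
	added := append(append([]field{}, v5...), field{3, "note", "string", false})
	bump, _ := needsMajorBump(v5, added)
	fmt.Println("add field needs bump:", bump)

	// Changing the type of an existing tag requires v6.
	retyped := []field{{1, "id", "int64", false}, v5[1]}
	bump, reasons := needsMajorBump(v5, retyped)
	fmt.Println("retype needs bump:", bump, reasons)
}
```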

```mermaid
flowchart LR
  P1[proto/sirloin/v5] -->|add field| P1
  P1 -->|add RPC| P1
  P1 -->|remove field| P2[proto/sirloin/v6]
  P1 -.compat.-> P1
  P2 -.parallel until v5 cleanup.-> P1
```

Coexistence: introducing a new major means generating both versions side-by-side until callers migrate. There is no shadowing/aliasing helper today; each service must serve both interfaces during the cutover. TODO(@law): write a migration runbook (none exists yet).
7. Codegen workflow
Source of truth: buf.yaml at repo root plus the scoped generation templates
under proto/buf.gen.*.yaml, driven by make generate-proto. The root
buf.gen.yaml is docs-only so plain buf generate does not write app stubs
outside their intended consumers.

```bash
# Generate stubs for every service from /proto
make generate-proto

# Lint proto definitions
make lint-proto
```

`buf.gen.yaml` invokes local plugins (not BSR-hosted) to dodge rate limits. Output mapping:
| Plugin | Output dir | Consumer |
|---|---|---|
| `protoc-gen-go` + `protoc-gen-go-grpc` | apps/round/internal/pkg/pb/ | round |
| `protoc-gen-go` + `protoc-gen-go-grpc` | apps/sirloin/internal/pkg/pb/ | sirloin |
| `protoc-gen-go` + `protoc-gen-go-grpc` | apps/strip/internal/pkg/pb/ | strip |
| `protoc-gen-ts_proto` (`outputServices=grpc-js`) | apps/brain/src/generated/ | brain |
| `@bufbuild/protoc-gen-es` | apps/brisket/pkg/ | brisket |
| `@bufbuild/protoc-gen-es` | apps/flank/pkg/ | flank |
The Makefile scopes each buf generate invocation with --path, so
cross-service outputs are not generated into unrelated app directories.

```mermaid
flowchart LR
  P[/proto/*.proto/]
  P --> BUF[buf generate]
  BUF --> GO_R[round/internal/pkg/pb]
  BUF --> GO_S[sirloin/internal/pkg/pb]
  BUF --> GO_ST[strip/internal/pkg/pb]
  BUF --> TS_B[brain/src/generated]
  BUF --> ES_BR[brisket/pkg]
  BUF --> ES_FL[flank/pkg]
  BUF --> CLEAN[Makefile cleanup]
```

How services consume the output:

- Go services import the package directly (`import pb "github.com/.../internal/pkg/pb/sirloin/v5"`).
- brain (NestJS) imports from `src/generated/...`; gRPC clients are wired via providers (e.g. `round-grpc-client.provider.ts`).
- brisket constructs Connect transports against `apps/brisket/pkg/sirloin/v5/...` types.

CI drift guard: confirmed missing — no generate-proto invocation in
.github/workflows/. TODO(@law): add a job that runs make generate-proto
and fails on git diff.
8. Failure modes
| Failure | Detection | Mitigation |
|---|---|---|
| Proto drift: generator not run after a `.proto` edit | Compile error in dependent service; type mismatch on field rename | Run `make generate-proto` and commit; CI guard (TODO) to fail on diff |
| Schema-breaking change merged: field renumber, type swap | Old client deserialises garbage; production incidents | `buf breaking` in CI; bump major version (v5 → v6) and parallel-run |
| Unauthenticated cross-service call: caller skips API-key header | Brain returns 401; flank gRPC accepts (no gate) → silent privilege escalation | Network-level isolation; add interceptor (TODO) on every server enforcing peer identity |
| Missing deadline: client calls issued without a per-call context deadline | Slow/stuck upstream pins client goroutine; cascading queue depth | Mandate per-call `context.WithTimeout`; add deadline-required interceptor (TODO) |
| Retry storm: client retries transient failures (UNAVAILABLE, HTTP 5xx) without backoff | Spike in grpc.code=UNAVAILABLE followed by sirloin saturation | Use bounded retry policy (none today, TODO(@law)); circuit breaker on hot RPCs |
| Stale H2 connection: long-lived stream goes half-closed | INTERNAL errors from brisket → sirloin after ~hours | brisket already mitigates with 30-min transport refresh (`apps/brisket/server/api/sirloin-api.ts`); replicate elsewhere |
| Insecure listener publicly exposed: gRPC port reachable from internet | Network scan; unexpected requests on :8080/:50051 | Audit Railway service expose settings; never expose the PORT of a gRPC service publicly |
9. Observability
| Service | gRPC instrumentation | Evidence |
|---|---|---|
| sirloin (server) | OpenTelemetry stats handler + chained unary interceptors (rate-limit, auth, recovery) | apps/sirloin/cmd/app/main.go:539-542 |
| sirloin (client → round/flank) | otelgrpc.NewClientHandler() | apps/sirloin/cmd/app/main.go:238,264 |
| round (server) | grpc.ChainUnaryInterceptor(recovery, logging, monitoring, validation, errorMapping) — no OTel handler | apps/round/internal/app/server/server.go:63-90 |
| brain (client → round) | NestJS @opentelemetry/sdk-node auto-instruments gRPC | brain bootstrap |
| flank (server) | Connect handlers log via the engine logger; tracing uses @opentelemetry/api only, without an OTel SDK exporter, so spans terminate locally and never reach Axiom | apps/flank/server/engine/logger.ts; apps/flank/server/engine/tracing.ts:13 |
| brisket (client → sirloin) | OTel via @opentelemetry/sdk-node; W3C traceparent propagated | apps/brisket/server/api/grpc-headers.ts |
Trace propagation: traceparent and tracestate headers cross every
gRPC boundary that has an interceptor. The chain currently breaks at
round (no OTel imports observed) and flank (Connect adapter, no
auto-instrumentation wired). Cross-link:
/standards/observability/#trace-propagation (TODO(@law): confirm anchor exists in the rendered page).
Logging: structured logs at every service. Correlation ID falls back
to x-correlation-id if traceparent is absent.
10. Local dev
- Run a service against a local upstream: gRPC clients read their upstream address from an env var, typically `*_GRPC_URL` (`SIRLOIN_GRPC_URL`, `ROUND_GRPC_URL`, etc.); brisket reads `SIRLOIN_URL`. Point it at `localhost:<port>` for a locally-running upstream, or use the docker-compose service name from inside the network (`make dev-up-d`).
- Mock a peer: round is the most commonly mocked. Use the generated Go interface (`apps/sirloin/internal/pkg/pb/round/v1/round_grpc.pb.go`) with `mockery` (config in `apps/sirloin/.mockery.yml`) to generate a mock, or stand up `apps/round` itself; model loading dominates start-up time, but its gRPC surface is small.
- Bypass auth in flank dev: set `FLANK_AUTH_BYPASS_UUID` to a fixed UUID; this is honoured by `requireAuth()` in `apps/flank/app/lib/auth.ts`. Do not ship this var to prod.
- Inspect traffic: run `grpcurl -plaintext localhost:<port> list` against any service that exposes reflection. Confirmed enabled on sirloin (`apps/sirloin/cmd/app/main.go:572`) and round (`apps/round/cmd/app/main.go:80`); flank uses Connect handlers, which expose schema via the Connect protocol rather than gRPC reflection.
11. Runbook hooks
Per-service deep dives:
- services/sirloin — server config, RPC list, rate limits.
- services/brain-clerk-flow — sirloin ↔ brain HTTP edge.
- services/round — TODO(@law): confirm page exists.
- services/flank — TODO(@law): confirm page exists.
- generated/proto-api — every RPC, every field.
When a gRPC incident fires, check in this order: (1) is the upstream
healthy? (/services/<name>/runbook); (2) has a proto change shipped
recently? (git log --since=24h -- proto/); (3) is the network path
intact? (Railway private network status); (4) has the codegen pipeline
drifted? (make generate-proto && git diff).