
gRPC Mesh Integration Playbook

gRPC is the primary inter-service protocol across the beef monorepo. REST is exposed only at edges (public API to chargebee/primer webhooks, public REST gateway on sirloin) and HTTP is used for one specific internal call (sirloin → brain control plane, not gRPC). This page is the playbook for the internal mesh: who calls whom, how it’s secured, how protos are versioned and generated, what to watch for when it breaks.

1. Overview

  • Transport: HTTP/2 over plaintext (h2c) — see Section 4.
  • Schema source of truth: proto/ at the repo root, organised as proto/<service>/<version>/*.proto (proto-files-list).
  • Codegen tool: buf v2 (buf.yaml, buf.gen.yaml) with local plugins (protoc-gen-go, protoc-gen-go-grpc, ts-proto, @bufbuild/protoc-gen-es). See Section 7.
  • Generated stubs are checked in under each service’s internal/pkg/pb/ (Go), src/generated/ (brain), or pkg/ (brisket, flank).
  • Service auth on the wire: minimal. Network-trust + per-edge mechanisms (API key for sirloin → brain; Clerk JWT propagated as user_id in proto payloads for user-bound calls). See Section 5.
  • Total surface area: 105 RPCs across six .proto files. See the full index in /generated/proto-api.

The single biggest correctness invariant: every gRPC hop is plaintext, internal-only. No call is meant to traverse the public internet. If you find yourself wanting to expose a gRPC port externally, stop and route through sirloin’s REST gateway instead.

2. Topology

2.1 Service mesh

flowchart LR
subgraph Edge[Public Edge]
Browser[User browser]
Admin[Admin browser]
CB[Chargebee]
PR[Primer]
end
subgraph Mesh[Internal mesh — plaintext h2c]
BR[brisket Next.js]
FE[fennec admin]
ST[strip SSR]
FL[flank]
SI[(sirloin)]
BN[(brain)]
RD[(round ML)]
end
Browser -->|HTTPS REST tRPC| BR
Admin -->|HTTPS| ST
Browser -->|HTTPS Clerk| FE
Browser -->|HTTPS Clerk| FL
CB -->|HTTPS REST + Basic auth| SI
PR -->|HTTPS REST + HMAC| SI
BR -->|gRPC ConnectRPC| SI
ST -->|gRPC| SI
FL -->|gRPC FlankStorageService| SI
SI -->|gRPC FlankExecutionService| FL
SI -->|HTTP + API key| BN
SI -->|gRPC insecure| RD
BN -->|gRPC insecure| RD

2.2 Direction notes

  • brisket → sirloin is gRPC over Connect (not REST, despite the wording in the root CLAUDE.md). Two clients are constructed in apps/brisket/server/api/sirloin-api.ts (SirloinService, BillingService) using createGrpcTransport({ baseUrl: env.SIRLOIN_URL, peerMaxConcurrentStreams: 500 }). The transport is refreshed every 30 minutes to mitigate stale H2 connections.
  • strip → sirloin is plain Go gRPC (apps/strip/cmd/app/main.go:165, grpc.NewClient). user_id is passed inside proto request bodies, not via header metadata.
  • sirloin → brain is HTTP (not gRPC) using a shared API key (SIRLOIN_BRAIN_API_KEY); brain validates it via apps/brain/src/modules/application/auth/strategies/api-key.strategy.ts. This is the one cross-service call where the codegen pipeline is bypassed. TODO(@law): if/when brain exposes gRPC, the topology should converge.
  • sirloin → round is gRPC, plaintext (apps/sirloin/cmd/app/main.go:237,263, insecure.NewCredentials()).
  • brain → round is gRPC, plaintext (apps/brain/src/modules/application/round/providers/round-grpc-client.provider.ts:19, credentials.createInsecure()).
  • sirloin ↔ flank is bidirectional gRPC: sirloin triggers FlankExecutionService on flank’s gRPC server (apps/flank/server/grpc-server.ts); flank reads workflows/adapters/secrets from FlankStorageService on sirloin via Connect-es (apps/flank/app/lib/grpc-client.ts).
  • fennec is a SPA — its gRPC needs are funneled through sirloin’s REST gateway. Confirmed: no @bufbuild/@connectrpc import or grpc dial in apps/fennec/src/.
  • chuck (Strapi) is HTTP only; not part of the mesh.

3. Services + RPCs index

Per .proto file and owning service:

| Proto file | Owner service (server) | Common callers (client) |
| --- | --- | --- |
| proto/round/v1/round.proto | round | sirloin, brain |
| proto/sirloin/v5/sirloin.proto | sirloin | brisket, flank, strip |
| proto/sirloin/v5/billing.proto | sirloin | brisket |
| proto/sirloin/v5/strip.proto | sirloin | strip |
| proto/sirloin/v5/flank.proto | sirloin (FlankStorageService) | flank |
| proto/flank/v1/flank.proto | flank (FlankExecutionService) | sirloin |

The full RPC list (105 RPCs, request/response shapes, field-level docs) is auto-generated from these protos; see /generated/proto-api, regenerated via bash scripts/gen-proto-docs.sh.

4. Transport security

Today: none on the wire. All inter-service gRPC uses h2c (HTTP/2 plaintext). Evidence:

| Site | Code | Evidence |
| --- | --- | --- |
| sirloin gRPC server | apps/sirloin/cmd/app/main.go:539 | grpc.NewServer(...) with no grpc.Creds(...) option |
| sirloin → round client | apps/sirloin/cmd/app/main.go:237,263 | grpc.WithTransportCredentials(insecure.NewCredentials()) |
| brain → round client | apps/brain/.../round-grpc-client.provider.ts:19 | credentials.createInsecure() |
| round gRPC server | apps/round/internal/app/server/server.go:63 | grpc.NewServer(...) with no creds |
| strip → sirloin client | apps/strip/cmd/app/main.go:165 | grpc.NewClient(...) (insecure) |
| flank gRPC server | apps/flank/server/grpc-server.ts | connectNodeAdapter(...) over http2.createServer (h2c) |
| sirloin → flank client | apps/sirloin/internal/app/flankmcp/grpc.go:56,98 | insecure.NewCredentials() |

Confidentiality / integrity therefore relies on the network boundary:

  • Local dev: shared Docker network created by docker-compose.dev.yml.
  • Production: Railway private network. TODO(@law): confirm the network guarantees (private VPC, no cross-tenant egress) against Railway docs and current platform config; cross-link to /standards/security-model/#in-transit.
flowchart LR
subgraph Internet
User[User]
end
subgraph Railway[Railway private network — plaintext within]
direction LR
SI[sirloin] -.h2c.-> RD[round]
SI -.h2c.-> FL[flank]
BN[brain] -.h2c.-> RD
BR[brisket] -.h2c.-> SI
end
User ==HTTPS==> BR
User ==HTTPS==> SI
classDef plain fill:#fee,stroke:#c33;
class SI,RD,FL,BN,BR plain;

Finding: flank’s gRPC FlankExecutionService (apps/flank/server/grpc-server.ts) does not apply the Clerk auth gate that protects flank’s HTTP server functions (apps/flank/app/lib/auth.ts:requireAuth). The gRPC entry point trusts that sirloin is the only caller. Mitigation today is network-level only — no per-call token check. TODO(@law) tracked in /standards/auth-model/#service-to-service-auth.

5. Service-to-service auth

Cross-link: see /standards/auth-model/ for the full table.

| Caller → Callee | Mechanism | Evidence |
| --- | --- | --- |
| sirloin → brain (HTTP) | API key header (SIRLOIN_BRAIN_API_KEY) | .env.example; apps/brain/.../api-key.strategy.ts |
| sirloin → round (gRPC) | None (network-trust) | apps/sirloin/cmd/app/main.go:237,263 |
| brain → round (gRPC) | None (network-trust) | round-grpc-client.provider.ts:19 |
| sirloin → flank (gRPC) | None; Clerk gate not applied to gRPC | apps/flank/server/grpc-server.ts |
| flank → sirloin (gRPC) | None at gRPC layer; service token in env (FLANK_SERVICE_TOKEN) | apps/flank/app/lib/grpc-client.ts |
| brisket → sirloin (gRPC) | Clerk JWT forwarded in headers via grpc-headers.ts; sirloin verifies user context | apps/brisket/server/api/sirloin-api.ts |
| strip → sirloin (gRPC) | user_id passed inside proto request body; verified by sirloin against Clerk session | apps/strip/cmd/app/main.go:165 |

flowchart TB
subgraph Auth[Auth boundaries]
direction TB
User -->|Clerk JWT| BR[brisket]
User -->|Clerk JWT| ST[strip]
BR -->|Clerk JWT in metadata| SI[sirloin]
ST -->|user_id in proto| SI
SI -->|API key header| BN[brain]
SI -->|no auth| RD[round]
SI -->|no auth| FL[flank]
FL -->|no auth| SI
BN -->|no auth| RD
end

Direction: there is no mTLS, no SPIFFE, no per-call signed identity. The backstop assumption is the private network. Anything that erodes that boundary (a public listener, a misconfigured ingress) becomes a critical finding immediately. TODO(@law): plan for mTLS or signed service tokens.
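
As a starting point for the signed-service-token option, the check a server-side interceptor would perform can be sketched with the standard library alone. This is an illustration under stated assumptions, not the planned design: the x-service-token metadata key and the CheckServiceToken function are hypothetical, and the map argument only mirrors the shape of gRPC metadata so the sketch stays free of external dependencies.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// ErrUnauthenticated is returned when the peer does not present the expected token.
var ErrUnauthenticated = errors.New("unauthenticated: missing or invalid service token")

// tokenHeader is a hypothetical metadata key; the playbook does not define one.
const tokenHeader = "x-service-token"

// CheckServiceToken validates a shared secret carried in request metadata.
// md mirrors the shape of gRPC metadata: lower-cased key to list of values.
func CheckServiceToken(md map[string][]string, want string) error {
	if want == "" {
		// Fail closed: an unset secret must never authenticate anyone.
		return ErrUnauthenticated
	}
	for _, got := range md[tokenHeader] {
		// Constant-time comparison avoids leaking the secret through timing.
		if subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1 {
			return nil
		}
	}
	return ErrUnauthenticated
}

func main() {
	ok := map[string][]string{tokenHeader: {"s3cret"}}
	fmt.Println(CheckServiceToken(ok, "s3cret"))                    // accepted
	fmt.Println(CheckServiceToken(map[string][]string{}, "s3cret")) // rejected
}
```

In a gRPC server this function would sit inside a unary interceptor that reads the incoming metadata and maps the error to an UNAUTHENTICATED status. A static shared secret is weaker than mTLS or signed tokens, so it is best read as an interim gate.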

6. Versioning

Proto versions in use today:

| Proto path | Version | Notes |
| --- | --- | --- |
| proto/round/v1/ | v1 | Stable; embedding/face-detection RPCs |
| proto/sirloin/v5/ | v5 | Current; previous v1–v4 not retained in repo |
| proto/flank/v1/ | v1 | Workflow execution |

Compatibility rules (buf.yaml enforces):

breaking:
  use:
    - FILE

FILE-level breaking-change detection means each .proto is checked against its previous state for backwards-incompatible changes (renamed fields, deleted services, type changes). buf breaking is not wired into CI today (no buf breaking invocation in .github/workflows/). TODO(@law): add buf breaking --against '.git#branch=main' to the proto pipeline.

What is compatible (free to add):

  • New RPCs on existing services.
  • New fields with new tag numbers.
  • New optional messages.

What requires a major bump (e.g. v5 → v6):

  • Removing or renaming an RPC.
  • Renumbering or repurposing a field tag.
  • Changing field type, cardinality (singular ↔ repeated), or wrapper.
  • Renaming an enum value visible on the wire.
flowchart LR
P1[proto/sirloin/v5] -->|add field| P1
P1 -->|add RPC| P1
P1 -->|remove field| P2[proto/sirloin/v6]
P1 -.compat.-> P1
P2 -.parallel until v5 cleanup.-> P1
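
The compatibility rules above can be made concrete with a small classifier over field tags. This is a simplified sketch, not a substitute for buf breaking: the Field and Schema types and the BreakingChanges function are hypothetical, and the model covers only one message's fields (removal, type change, cardinality change), not RPCs, enums, or wrappers.

```go
package main

import (
	"fmt"
	"sort"
)

// Field is a simplified view of a proto field: its type and whether it is repeated.
type Field struct {
	Type     string
	Repeated bool
}

// Schema maps field tag numbers to their definition for one message.
type Schema map[int]Field

// BreakingChanges lists the changes from prev to next that would need a major
// bump under the rules above. Newly added tags are compatible and produce no entry.
func BreakingChanges(prev, next Schema) []string {
	tags := make([]int, 0, len(prev))
	for tag := range prev {
		tags = append(tags, tag)
	}
	sort.Ints(tags) // deterministic report order

	var out []string
	for _, tag := range tags {
		p := prev[tag]
		n, ok := next[tag]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("tag %d removed", tag))
		case n.Type != p.Type:
			out = append(out, fmt.Sprintf("tag %d type changed %s -> %s", tag, p.Type, n.Type))
		case n.Repeated != p.Repeated:
			out = append(out, fmt.Sprintf("tag %d cardinality changed", tag))
		}
	}
	return out
}

func main() {
	v5 := Schema{1: {Type: "string"}, 2: {Type: "int32"}}

	// Adding tag 3 is compatible: no findings.
	fmt.Println(BreakingChanges(v5, Schema{1: {Type: "string"}, 2: {Type: "int32"}, 3: {Type: "bool"}}))

	// Changing cardinality of tag 1 and dropping tag 2 both call for v6.
	fmt.Println(BreakingChanges(v5, Schema{1: {Type: "string", Repeated: true}}))
}
```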

Coexistence: introducing a new major means generating both versions side-by-side until callers migrate. There is no shadowing/aliasing helper today; each service must serve both interfaces during the cutover. TODO(@law): write a migration runbook — none exists yet.

7. Codegen workflow

Source of truth: buf.yaml at repo root plus the scoped generation templates under proto/buf.gen.*.yaml, driven by make generate-proto. The root buf.gen.yaml is docs-only so plain buf generate does not write app stubs outside their intended consumers.

# Generate stubs for every service from /proto
make generate-proto
# Lint proto definitions
make lint-proto

The generation templates (proto/buf.gen.*.yaml) invoke local plugins (not BSR-hosted) to avoid rate limits. Output mapping:

| Plugin | Output dir | Consumer |
| --- | --- | --- |
| protoc-gen-go + protoc-gen-go-grpc | apps/round/internal/pkg/pb/ | round |
| protoc-gen-go + protoc-gen-go-grpc | apps/sirloin/internal/pkg/pb/ | sirloin |
| protoc-gen-go + protoc-gen-go-grpc | apps/strip/internal/pkg/pb/ | strip |
| protoc-gen-ts_proto (outputServices=grpc-js) | apps/brain/src/generated/ | brain |
| @bufbuild/protoc-gen-es | apps/brisket/pkg/ | brisket |
| @bufbuild/protoc-gen-es | apps/flank/pkg/ | flank |

The Makefile scopes each buf generate invocation with --path, so cross-service outputs are not generated into unrelated app directories.

flowchart LR
P[/proto/*.proto/]
P --> BUF[buf generate]
BUF --> GO_R[round/internal/pkg/pb]
BUF --> GO_S[sirloin/internal/pkg/pb]
BUF --> GO_ST[strip/internal/pkg/pb]
BUF --> TS_B[brain/src/generated]
BUF --> ES_BR[brisket/pkg]
BUF --> ES_FL[flank/pkg]
BUF --> CLEAN[Makefile cleanup]

How services consume the output:

  • Go services import the package directly (import pb "github.com/.../internal/pkg/pb/sirloin/v5").
  • brain (NestJS) imports from src/generated/...; gRPC clients are wired via providers (e.g. round-grpc-client.provider.ts).
  • brisket constructs Connect transports against apps/brisket/pkg/sirloin/v5/... types.

CI drift guard: confirmed missing — no generate-proto invocation in .github/workflows/. TODO(@law): add a job that runs make generate-proto and fails on git diff.
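
One possible shape for that guard, sketched in Go: after make generate-proto has run, feed the output of git status --porcelain to a filter that keeps only paths under the generated-stub directories, and fail the job if any remain. The DriftedFiles function is hypothetical, the file names in the example are invented for illustration, and the parsing is deliberately simplified (it ignores rename lines and quoted paths).

```go
package main

import (
	"fmt"
	"strings"
)

// DriftedFiles returns the paths from `git status --porcelain` output that sit
// under one of the generated-stub directories. Simplified: rename entries
// ("old -> new") and quoted paths are not handled.
func DriftedFiles(porcelain string, generatedDirs []string) []string {
	var out []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) < 4 {
			continue // blank or malformed line
		}
		path := line[3:] // skip the two status columns and the separating space
		for _, dir := range generatedDirs {
			if strings.HasPrefix(path, dir) {
				out = append(out, path)
				break
			}
		}
	}
	return out
}

func main() {
	// Directories taken from the output mapping table; file names are made up.
	dirs := []string{
		"apps/round/internal/pkg/pb/",
		"apps/sirloin/internal/pkg/pb/",
		"apps/strip/internal/pkg/pb/",
		"apps/brain/src/generated/",
		"apps/brisket/pkg/",
		"apps/flank/pkg/",
	}
	status := " M apps/sirloin/internal/pkg/pb/sirloin/v5/example.pb.go\n M README.md\n"
	for _, p := range DriftedFiles(status, dirs) {
		fmt.Println("generated stub out of date:", p)
	}
}
```

A CI job built on this would exit non-zero whenever the returned slice is non-empty.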

8. Failure modes

| Failure | Detection | Mitigation |
| --- | --- | --- |
| Proto drift: generator not run after a .proto edit | Compile error in dependent service; type mismatch on field rename | Run make generate-proto and commit; CI guard (TODO) to fail on diff |
| Schema-breaking change merged: field renumber, type swap | Old client deserialises garbage; production incidents | buf breaking in CI; bump major version (v5 → v6) and parallel-run |
| Unauthenticated cross-service call: caller skips API-key header | brain returns 401; flank gRPC accepts the call (no gate), a silent privilege escalation | Network-level isolation; add interceptor (TODO) on every server enforcing peer identity |
| Missing deadline: client calls made without a per-call context.WithTimeout | Slow/stuck upstream pins client goroutine; cascading queue depth | Mandate per-call context.WithTimeout; add deadline-required interceptor (TODO) |
| Retry storm: client retries transient errors without backoff | Spike in grpc.code=UNAVAILABLE followed by sirloin saturation | Use bounded retry policy (none today, TODO(@law)); circuit breaker on hot RPCs |
| Stale H2 connection: long-lived stream goes half-closed | INTERNAL errors from brisket → sirloin after ~hours | brisket already mitigates with 30-min transport refresh (apps/brisket/server/api/sirloin-api.ts); replicate elsewhere |
| Insecure listener publicly exposed: gRPC port reachable from internet | Network scan; unexpected requests on :8080/:50051 | Audit Railway service expose settings; never expose a gRPC service’s PORT publicly |

9. Observability

| Service | gRPC instrumentation | Evidence |
| --- | --- | --- |
| sirloin (server) | OpenTelemetry stats handler + chained unary interceptors (rate-limit, auth, recovery) | apps/sirloin/cmd/app/main.go:539-542 |
| sirloin (client → round/flank) | otelgrpc.NewClientHandler() | apps/sirloin/cmd/app/main.go:238,264 |
| round (server) | grpc.ChainUnaryInterceptor(recovery, logging, monitoring, validation, errorMapping); no OTel handler | apps/round/internal/app/server/server.go:63-90 |
| brain (client → round) | NestJS @opentelemetry/sdk-node auto-instruments gRPC | brain bootstrap |
| flank (server) | Connect handlers log via the engine logger; tracing uses @opentelemetry/api only, without an OTel SDK exporter, so spans terminate locally and never reach Axiom | apps/flank/server/engine/logger.ts; apps/flank/server/engine/tracing.ts:13 |
| brisket (client → sirloin) | OTel via @opentelemetry/sdk-node; W3C traceparent propagated | apps/brisket/server/api/grpc-headers.ts |

Trace propagation: traceparent and tracestate headers cross every gRPC boundary that has an interceptor. The chain currently breaks at round (no OTel imports observed) and flank (Connect adapter, no auto-instrumentation wired). Cross-link: /standards/observability/#trace-propagation (TODO(@law): confirm anchor exists in the rendered page).

Logging: structured logs at every service. Correlation ID falls back to x-correlation-id if traceparent is absent.
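
The fallback rule can be sketched as a small helper. It assumes header keys are already lower-cased and that traceparent follows the W3C version-00 layout (version, trace-id, parent-id, flags, separated by hyphens). The CorrelationID and traceID functions are hypothetical names, not taken from any service.

```go
package main

import (
	"fmt"
	"strings"
)

// CorrelationID prefers the trace-id from a W3C traceparent header and falls
// back to x-correlation-id. Header keys are assumed to be lower-cased already.
func CorrelationID(headers map[string]string) string {
	if id, ok := traceID(headers["traceparent"]); ok {
		return id
	}
	return headers["x-correlation-id"]
}

// traceID extracts the 32-hex-digit trace-id from "version-traceid-parentid-flags".
func traceID(traceparent string) (string, bool) {
	parts := strings.Split(traceparent, "-")
	if len(parts) != 4 || len(parts[1]) != 32 {
		return "", false
	}
	// An all-zero trace-id is invalid per the W3C spec.
	if !isLowerHex(parts[1]) || parts[1] == strings.Repeat("0", 32) {
		return "", false
	}
	return parts[1], true
}

func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

func main() {
	withTrace := map[string]string{
		"traceparent":      "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
		"x-correlation-id": "req-123",
	}
	fmt.Println(CorrelationID(withTrace)) // trace-id wins when traceparent is valid

	fmt.Println(CorrelationID(map[string]string{"x-correlation-id": "req-123"})) // fallback
}
```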

10. Local dev

  • Run a service against a local upstream: every gRPC client reads a *_GRPC_URL env var (SIRLOIN_GRPC_URL, ROUND_GRPC_URL, etc.). Point at localhost:<port> for a locally-running upstream, or use the docker-compose service name from inside the network (make dev-up-d).
  • Mock a peer: round is the most commonly mocked. Use the generated Go interface (apps/sirloin/internal/pkg/pb/round/v1/round_grpc.pb.go) with mockery (config in apps/sirloin/.mockery.yml) to generate a mock, or stand up apps/round itself — model loading dominates start-up time but its gRPC surface is small.
  • Bypass auth in flank dev: set FLANK_AUTH_BYPASS_UUID to a fixed UUID; this is honoured by requireAuth() in apps/flank/app/lib/auth.ts. Do not ship this var to prod.
  • Inspect traffic: grpcurl -plaintext localhost:<port> list against any service that exposes reflection. Confirmed enabled on sirloin (apps/sirloin/cmd/app/main.go:572) and round (apps/round/cmd/app/main.go:80); flank uses Connect handlers which expose schema via the Connect protocol rather than gRPC reflection.
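
The *_GRPC_URL convention from the first bullet can be sketched as a single lookup helper. UpstreamAddr is a hypothetical function, and the default address in the example is a placeholder, not a port confirmed for any service. The lookup is passed in as a function so the behaviour can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// UpstreamAddr resolves the address for a named upstream from <NAME>_GRPC_URL,
// falling back to a caller-supplied default when the variable is unset or blank.
func UpstreamAddr(lookup func(string) (string, bool), service, fallback string) string {
	key := strings.ToUpper(service) + "_GRPC_URL"
	if v, ok := lookup(key); ok && strings.TrimSpace(v) != "" {
		return strings.TrimSpace(v)
	}
	return fallback
}

func main() {
	// localhost:50051 is a placeholder default, not a value taken from the repo.
	fmt.Println(UpstreamAddr(os.LookupEnv, "round", "localhost:50051"))
}
```

Running it with ROUND_GRPC_URL set to a docker-compose service name prints that value; with the variable unset it prints the placeholder default.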

11. Runbook hooks

Per-service deep dives:

When a gRPC incident fires, check in this order: (1) is the upstream healthy? (/services/<name>/runbook); (2) has a proto change shipped recently? (git log --since=24h -- proto/); (3) is the network path intact? (Railway private network status); (4) has the codegen pipeline drifted? (make generate-proto && git diff).