CS Tooling Hardening Runbook
Operator steps for the FOXY-70 “CS Tooling: Faster & Safer” track. Covers the strip↔sirloin service-token rollout (FOXY-72), the new audit tables, pprof gating, and the ops-script confirmation guard. Read the rollout section in full before touching production env vars — the service-token steps are order-sensitive and an out-of-order change locks strip out of sirloin.
Strip service-token rollout (FOXY-72)
StripService RPCs on sirloin are gated by a single server interceptor that constant-time
compares an x-service-token gRPC metadata value against the configured token
(apps/sirloin/internal/app/services/services.go:1168). Strip attaches the same token on every
outgoing sirloin call (apps/strip/internal/app/config/config.go:19).
Two environment variables drive the rollout. They must hold the same secret value:
| Variable | Service | Role |
|---|---|---|
| STRIP_SIRLOIN_SERVICE_TOKEN | strip | Token strip sends as x-service-token (apps/strip/internal/pkg/env/variables.go:14). |
| SIRLOIN_STRIP_SERVICE_TOKEN | sirloin | Token sirloin enforces inbound (apps/sirloin/internal/pkg/env/variables.go:76). |
Rollout ordering (do NOT reorder)
When sirloin’s token is empty the interceptor warns and allows the call, so strip keeps
working while it is not yet sending the header (apps/sirloin/internal/app/services/services.go:1180).
Once sirloin’s token is set, any StripService call without a matching token is rejected with
Unauthenticated (apps/sirloin/internal/app/services/services.go:1194).
1. Deploy sirloin with an EMPTY SIRLOIN_STRIP_SERVICE_TOKEN: the guard is in warn-and-allow mode, and strip continues to work without sending the header.
2. Set STRIP_SIRLOIN_SERVICE_TOKEN in strip: strip now attaches the header on every call.
3. Set the SAME value in SIRLOIN_STRIP_SERVICE_TOKEN: the guard now enforces, and only strip can reach StripService RPCs.
Lockout warning: never set sirloin’s SIRLOIN_STRIP_SERVICE_TOKEN before strip is sending STRIP_SIRLOIN_SERVICE_TOKEN (step 2). Doing so makes sirloin enforce a token that strip is not yet attaching, and every StripService RPC fails with Unauthenticated until strip is redeployed with the matching token. The in-code rollout note is at apps/sirloin/internal/app/services/services.go:1152.
Rollback: clear SIRLOIN_STRIP_SERVICE_TOKEN on sirloin to drop back to warn-and-allow without
touching strip.
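The guard behaviour across the rollout, the lockout case, and the rollback can be sketched as a small decision function. This is a minimal stdlib-only model, not the interceptor itself: checkServiceToken, Decision, and the placeholder secret are hypothetical names introduced for illustration, and the real code works on gRPC metadata rather than plain strings.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// Decision is the outcome of the service-token check for one StripService call.
type Decision string

const (
	AllowWithWarning Decision = "allow (warn: no token configured)"
	Allow            Decision = "allow"
	Reject           Decision = "reject (Unauthenticated)"
)

// checkServiceToken models the guard described in this section (hypothetical
// helper). configured stands in for SIRLOIN_STRIP_SERVICE_TOKEN; presented
// stands in for the x-service-token value, empty when strip sends no header.
func checkServiceToken(configured, presented string) Decision {
	if configured == "" {
		// Warn-and-allow mode: nothing is enforced yet.
		return AllowWithWarning
	}
	// ConstantTimeCompare returns 1 only when both byte slices are equal.
	if subtle.ConstantTimeCompare([]byte(configured), []byte(presented)) == 1 {
		return Allow
	}
	return Reject
}

func main() {
	const secret = "example-secret" // placeholder, not a real token

	states := []struct{ name, sirloin, strip string }{
		{"step 1: sirloin empty, strip not sending", "", ""},
		{"step 2: strip sending, sirloin still empty", "", secret},
		{"step 3: both set to the same value", secret, secret},
		{"out of order: sirloin set before strip", secret, ""},
		{"rollback: sirloin cleared, strip untouched", "", secret},
	}
	for _, s := range states {
		fmt.Printf("%-45s -> %s\n", s.name, checkServiceToken(s.sirloin, s.strip))
	}
}
```

The out-of-order row is the lockout described in the warning above: sirloin enforces a token that strip is not attaching. The rollback row shows why clearing sirloin's variable alone is enough to restore service.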
Why it matters: enforcing the token gates StripService to authenticated-strip callers only, so
the admin_user_id strip forwards in the request body can be trusted (a non-strip caller can no
longer spoof an arbitrary actor). The token authenticates the service; it does not replace the
per-admin admin_user_id the request still carries.
New audit tables
All three tables live in the audits schema and are append-only. Migrations are applied by the
standard sirloin migrate flow.
| Table | Migration | Records |
|---|---|---|
| audits.strip_filter_presets | apps/sirloin/internal/app/migrate/schema/118_strip_filter_presets.sql:4 | Saved Strip list-view filter presets (per admin_user_id + list_type); shared presets are team-visible, private presets owner-only. |
| audits.foxy360_query_audit | apps/sirloin/internal/app/migrate/schema/119_foxy360_query_audit.sql:5 | One row per foxy360 MCP tool call: actor + role, tool name, a redacted/length-capped args summary (never raw PII), and the allow/deny outcome. Written on both allow and deny paths (apps/sirloin/internal/app/foxy360/audit.go:54). |
| audits.ops_script_audit | apps/sirloin/internal/app/migrate/schema/120_ops_script_audit.sql:7 | One row per one-off ops-script run: script name, operator, redacted args summary, whether the run is destructive, and whether it was confirmed. |
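To give a feel for what a "redacted/length-capped args summary" might look like, here is a rough sketch. Everything in it is an assumption made for illustration: summarizeArgs, the secretHints list, the [REDACTED] marker, and the key=value layout are hypothetical and are not taken from the sirloin code. Matching on argument names is also only one possible redaction strategy.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// secretHints lists substrings that mark an argument name as secret-like.
// The list is an assumption for this sketch.
var secretHints = []string{"token", "secret", "password", "key"}

// summarizeArgs renders args as "k=v" pairs in sorted key order, replaces the
// values of secret-like keys, and caps the result at maxLen runes.
// Hypothetical helper; maxLen is assumed to be non-negative.
func summarizeArgs(args map[string]string, maxLen int) string {
	keys := make([]string, 0, len(args))
	for k := range args {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output regardless of map iteration order

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		v := args[k]
		lower := strings.ToLower(k)
		for _, hint := range secretHints {
			if strings.Contains(lower, hint) {
				v = "[REDACTED]"
				break
			}
		}
		parts = append(parts, k+"="+v)
	}

	summary := strings.Join(parts, " ")
	// Truncate by runes so a multi-byte character is never split.
	if runes := []rune(summary); len(runes) > maxLen {
		summary = string(runes[:maxLen]) + "..."
	}
	return summary
}

func main() {
	args := map[string]string{"user_id": "42", "api_token": "abc123"}
	fmt.Println(summarizeArgs(args, 100))
	fmt.Println(summarizeArgs(args, 12))
}
```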
pprof gating
The pprof debug server is only started when SIRLOIN_PPROF_PORT is set
(apps/sirloin/internal/pkg/env/variables.go:143). In production, requests must present an
X-Debug-Token header that constant-time matches SIRLOIN_PPROF_TOKEN
(apps/sirloin/internal/pkg/env/variables.go:144); otherwise the handler returns 404 rather
than 403, so it does not reveal that pprof exists (apps/sirloin/internal/app/debug/pprof.go:51).
- An empty SIRLOIN_PPROF_TOKEN in production means pprof is never reachable: every request 404s. Config logs a warning if the port is set but the token is empty (apps/sirloin/internal/app/config/config.go:444).
- Non-production stages pass through with no token required (apps/sirloin/internal/app/debug/pprof.go:42).
To profile prod: set both SIRLOIN_PPROF_PORT and SIRLOIN_PPROF_TOKEN, then send the token in
the X-Debug-Token header.
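The gating rules above can be approximated with a small net/http middleware. This is a sketch under stated assumptions, not the code in pprof.go: gatePprof, statusFor, and okHandler are hypothetical names, the production flag and token are passed in explicitly rather than read from config, and the /debug/pprof/ path is the net/http/pprof default, assumed here.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// okHandler stands in for the real pprof handlers.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// gatePprof wraps next with the token check described in this section
// (hypothetical helper). token stands in for SIRLOIN_PPROF_TOKEN.
func gatePprof(production bool, token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if production {
			got := r.Header.Get("X-Debug-Token")
			// An empty configured token never matches, so pprof stays unreachable.
			if token == "" || subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
				http.NotFound(w, r) // 404 rather than 403: do not reveal that pprof exists
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one GET through h and returns the response status code.
func statusFor(h http.Handler, headerToken string) int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	if headerToken != "" {
		req.Header.Set("X-Debug-Token", headerToken)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	prod := gatePprof(true, "example-token", okHandler)
	fmt.Println(statusFor(prod, ""))              // 404: no header
	fmt.Println(statusFor(prod, "wrong"))         // 404: mismatch
	fmt.Println(statusFor(prod, "example-token")) // 200: match

	prodNoToken := gatePprof(true, "", okHandler)
	fmt.Println(statusFor(prodNoToken, "anything")) // 404: token unset in production

	dev := gatePprof(false, "", okHandler)
	fmt.Println(statusFor(dev, "")) // 200: non-production passes through
}
```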
Ops-script confirmation guard
Destructive one-off scripts under apps/sirloin/cmd/scripts/* route through
opsaudit.Guard (apps/sirloin/internal/pkg/opsaudit/opsaudit.go:85). Behaviour:
- A destructive run fails closed without explicit confirmation: Guard returns ErrConfirmationRequired unless the --confirm flag is passed or OPS_CONFIRM=1 is set in the environment (apps/sirloin/internal/pkg/opsaudit/opsaudit.go:94). The script surfaces the error and exits non-zero.
- Each run is recorded best-effort to audits.ops_script_audit (operator, redacted args, destructive flag, confirmed flag). Audit-sink errors do not block the run: auditing is best-effort and a nil sink is a no-op (apps/sirloin/internal/pkg/opsaudit/opsaudit.go:122).
- The operator is resolved from OPS_OPERATOR (falling back to the OS user); secret-like args are redacted before they are written.
Example wiring lives in apps/sirloin/cmd/scripts/chargebee-auto-collection-off/main.go:54.
Note on fail-closed scope: it is the confirmation that fails closed (destructive + not confirmed => abort), not the audit write. If you need the audit row to be a hard prerequisite for a destructive run, that is not the current behaviour. TODO(@law): decide whether audit-sink failures should block destructive ops-script runs.
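The split between a fail-closed confirmation and a best-effort audit write can be sketched as follows. This is an illustrative model, not opsaudit.Guard: the Run struct, AuditSink type, guard function, and failingSink are hypothetical, and only the ErrConfirmationRequired name comes from this runbook. The sketch records every attempt before checking confirmation; whether the real Guard also records aborted runs is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConfirmationRequired mirrors the error name used in this runbook; the
// message text is illustrative.
var ErrConfirmationRequired = errors.New("destructive run requires --confirm or OPS_CONFIRM=1")

// Run describes one ops-script invocation (hypothetical shape).
type Run struct {
	Script      string
	Destructive bool
	ConfirmFlag bool   // true when --confirm was passed
	EnvConfirm  string // value of OPS_CONFIRM
}

// AuditSink is a hypothetical stand-in for the writer behind audits.ops_script_audit.
type AuditSink func(r Run, confirmed bool) error

// failingSink simulates an unavailable audit store.
var failingSink AuditSink = func(Run, bool) error {
	return errors.New("audit store unavailable")
}

// guard aborts destructive runs that lack confirmation. The audit write is
// best-effort: a nil sink is skipped and a sink error is ignored.
func guard(r Run, sink AuditSink) error {
	confirmed := r.ConfirmFlag || r.EnvConfirm == "1"
	if sink != nil {
		_ = sink(r, confirmed) // an audit failure does not block the run
	}
	if r.Destructive && !confirmed {
		return ErrConfirmationRequired // fail closed
	}
	return nil
}

func main() {
	runs := []Run{
		{Script: "example-script", Destructive: true},
		{Script: "example-script", Destructive: true, ConfirmFlag: true},
		{Script: "example-script", Destructive: true, EnvConfirm: "1"},
		{Script: "example-script", Destructive: false},
	}
	for _, r := range runs {
		err := guard(r, failingSink)
		fmt.Printf("destructive=%t flag=%t env=%q -> err=%v\n",
			r.Destructive, r.ConfirmFlag, r.EnvConfirm, err)
	}
}
```

Every call in main uses the failing sink, yet only the unconfirmed destructive run returns an error. Making the audit row a hard prerequisite would mean returning the sink error instead of discarding it.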
CS productivity features
Brief operator-facing notes on the productivity changes shipped in the same track:
- Saved filter presets: Strip list views (users / characters / media / audit logs) can save named filters, private or shared, persisted in audits.strip_filter_presets.
- foxy360 export: read-only foxy360 query results can be exported as CSV or JSON artifacts (apps/sirloin/internal/app/foxy360/export.go:16).
- workflow_name label: media views show a humanized workflow label (e.g. foxy_image_to_video rendered as readable text) via format.HumanizeWorkflowName (apps/strip/internal/app/templates/format/workflowname.go:10).