CS Tooling Hardening Runbook

Operator steps for the FOXY-70 “CS Tooling: Faster & Safer” track. Covers the strip↔sirloin service-token rollout (FOXY-72), the new audit tables, pprof gating, and the ops-script confirmation guard. Read the rollout section in full before touching production env vars — the service-token steps are order-sensitive and an out-of-order change locks strip out of sirloin.

Strip service-token rollout (FOXY-72)

StripService RPCs on sirloin are gated by a single server interceptor that constant-time compares an x-service-token gRPC metadata value against the configured token (apps/sirloin/internal/app/services/services.go:1168). Strip attaches the same token on every outgoing sirloin call (apps/strip/internal/app/config/config.go:19).

Two environment variables drive the rollout. They must hold the same secret value:

Variable | Service | Role
STRIP_SIRLOIN_SERVICE_TOKEN | strip | Token strip sends as x-service-token (apps/strip/internal/pkg/env/variables.go:14).
SIRLOIN_STRIP_SERVICE_TOKEN | sirloin | Token sirloin enforces inbound (apps/sirloin/internal/pkg/env/variables.go:76).

Rollout ordering (do NOT reorder)

When sirloin’s token is empty the interceptor warns and allows the call, so strip keeps working while it is not yet sending the header (apps/sirloin/internal/app/services/services.go:1180). Once sirloin’s token is set, any StripService call without a matching token is rejected with Unauthenticated (apps/sirloin/internal/app/services/services.go:1194).

  1. Deploy sirloin with an EMPTY SIRLOIN_STRIP_SERVICE_TOKEN — guard is in warn-and-allow mode; strip continues to work without sending the header.
  2. Set STRIP_SIRLOIN_SERVICE_TOKEN in strip — strip now attaches the header on every call.
  3. Set the SAME value in SIRLOIN_STRIP_SERVICE_TOKEN — guard now enforces; only strip can reach StripService RPCs.

Lockout warning: never set sirloin’s SIRLOIN_STRIP_SERVICE_TOKEN before strip is sending STRIP_SIRLOIN_SERVICE_TOKEN (step 2). Doing so makes sirloin enforce a token that strip is not yet attaching, and every StripService RPC fails with Unauthenticated until strip is redeployed with the matching token. The in-code rollout note is at apps/sirloin/internal/app/services/services.go:1152.
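
The guard behaviour and the ordering constraint above can be sketched in plain Go. This is a minimal illustration, not the sirloin interceptor itself: checkServiceToken, errUnauthenticated, and the placeholder secret are hypothetical names, and the real check runs inside a gRPC server interceptor that returns an Unauthenticated status.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// errUnauthenticated stands in for the gRPC Unauthenticated status (hypothetical).
var errUnauthenticated = errors.New("unauthenticated")

// checkServiceToken mirrors the documented guard behaviour:
//   - empty configured token: warn-and-allow (warned == true, err == nil)
//   - configured token set:   allow only on a constant-time match
func checkServiceToken(configured, presented string) (warned bool, err error) {
	if configured == "" {
		return true, nil
	}
	if subtle.ConstantTimeCompare([]byte(configured), []byte(presented)) != 1 {
		return false, errUnauthenticated
	}
	return false, nil
}

func main() {
	const secret = "example-secret" // placeholder value, not a real token

	states := []struct {
		name         string
		sirloinToken string // SIRLOIN_STRIP_SERVICE_TOKEN
		stripToken   string // STRIP_SIRLOIN_SERVICE_TOKEN
	}{
		{"step 1: sirloin empty, strip empty", "", ""},
		{"step 2: sirloin empty, strip set", "", secret},
		{"step 3: both set to the same value", secret, secret},
		{"out of order: sirloin set, strip empty", secret, ""},
	}
	for _, s := range states {
		warned, err := checkServiceToken(s.sirloinToken, s.stripToken)
		fmt.Printf("%-40s warned=%v err=%v\n", s.name, warned, err)
	}
}
```

Only the out-of-order state returns an error, which is the lockout described above; the three in-order states all allow the call.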

Rollback: clear SIRLOIN_STRIP_SERVICE_TOKEN on sirloin to drop back to warn-and-allow without touching strip.

Why it matters: once the token is enforced, only callers holding strip's token can reach StripService, so the admin_user_id that strip forwards in the request body can be trusted (a non-strip caller can no longer spoof an arbitrary actor). The token authenticates the calling service; it does not replace the per-admin admin_user_id that each request still carries.

New audit tables

All three tables live in the audits schema and are append-only. Migrations are applied by the standard sirloin migrate flow.

Table | Migration | Records
audits.strip_filter_presets | apps/sirloin/internal/app/migrate/schema/118_strip_filter_presets.sql:4 | Saved Strip list-view filter presets (per admin_user_id + list_type); shared presets are team-visible, private presets owner-only.
audits.foxy360_query_audit | apps/sirloin/internal/app/migrate/schema/119_foxy360_query_audit.sql:5 | One row per foxy360 MCP tool call: actor + role, tool name, a redacted/length-capped args summary (never raw PII), and the allow/deny outcome. Written on both allow and deny paths (apps/sirloin/internal/app/foxy360/audit.go:54).
audits.ops_script_audit | apps/sirloin/internal/app/migrate/schema/120_ops_script_audit.sql:7 | One row per one-off ops-script run: script name, operator, redacted args summary, whether the run is destructive, and whether it was confirmed.
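
To make "redacted/length-capped args summary" concrete, here is one way such a summary could be built. This is a sketch under assumptions, not the code behind either audit table: summarizeArgs, looksSecret, the secretHints list, and the [REDACTED] marker are all hypothetical, and the real redaction rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// secretHints is a hypothetical list of substrings that mark a key as sensitive.
var secretHints = []string{"token", "secret", "password", "key"}

// looksSecret reports whether a flag name contains one of the hints.
func looksSecret(key string) bool {
	k := strings.ToLower(key)
	for _, h := range secretHints {
		if strings.Contains(k, h) {
			return true
		}
	}
	return false
}

// summarizeArgs joins key=value arguments with secret-like values masked,
// then caps the result at maxLen bytes (a byte cap, so it can cut a
// multi-byte character; a real implementation may cap on runes instead).
func summarizeArgs(args []string, maxLen int) string {
	parts := make([]string, 0, len(args))
	for _, a := range args {
		k, _, hasValue := strings.Cut(a, "=")
		if hasValue && looksSecret(k) {
			a = k + "=[REDACTED]"
		}
		parts = append(parts, a)
	}
	s := strings.Join(parts, " ")
	if maxLen >= 0 && len(s) > maxLen {
		s = s[:maxLen]
	}
	return s
}

func main() {
	args := []string{"--user=42", "--api-token=abc123"}
	fmt.Println(summarizeArgs(args, 200)) // --user=42 --api-token=[REDACTED]
	fmt.Println(summarizeArgs(args, 5))   // --use
}
```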

pprof gating

The pprof debug server is only started when SIRLOIN_PPROF_PORT is set (apps/sirloin/internal/pkg/env/variables.go:143). In production, requests must present an X-Debug-Token header that constant-time matches SIRLOIN_PPROF_TOKEN (apps/sirloin/internal/pkg/env/variables.go:144); otherwise the handler returns 404 rather than 403, so it does not reveal that pprof exists (apps/sirloin/internal/app/debug/pprof.go:51).

  • An empty SIRLOIN_PPROF_TOKEN in production means pprof is never reachable — every request 404s. Config logs a warning if the port is set but the token is empty (apps/sirloin/internal/app/config/config.go:444).
  • Non-production stages pass through with no token required (apps/sirloin/internal/app/debug/pprof.go:42).

To profile prod: set both SIRLOIN_PPROF_PORT and SIRLOIN_PPROF_TOKEN, then send the token in the X-Debug-Token header.

Ops-script confirmation guard

Destructive one-off scripts under apps/sirloin/cmd/scripts/* route through opsaudit.Guard (apps/sirloin/internal/pkg/opsaudit/opsaudit.go:85). Behaviour:

  • A destructive run fails closed without explicit confirmation: Guard returns ErrConfirmationRequired unless the --confirm flag is passed or OPS_CONFIRM=1 is set in the environment (apps/sirloin/internal/pkg/opsaudit/opsaudit.go:94). The script surfaces the error and exits non-zero.
  • Each run is recorded best-effort to audits.ops_script_audit (operator, redacted args, destructive flag, confirmed flag). Audit-sink errors do not block the run — auditing is best-effort and a nil sink is a no-op (apps/sirloin/internal/pkg/opsaudit/opsaudit.go:122).
  • The operator is resolved from OPS_OPERATOR (falling back to the OS user); secret-like args are redacted before they are written.
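
The confirmation and audit behaviour in the list above can be sketched as follows. Only the ErrConfirmationRequired name comes from the runbook; Run, Sink, guard, and failingSink are hypothetical stand-ins for the opsaudit package. The sketch also assumes that the audit row is attempted before the confirmation check and that non-destructive runs need no confirmation, neither of which is stated here.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConfirmationRequired is returned for an unconfirmed destructive run.
var ErrConfirmationRequired = errors.New("confirmation required")

// Run describes one ops-script invocation (hypothetical shape).
type Run struct {
	Script      string
	Destructive bool
	ConfirmFlag bool   // true when --confirm was passed
	OpsConfirm  string // value of OPS_CONFIRM from the environment
}

// Sink records one audit row; it may fail, and it may be nil.
type Sink func(r Run, confirmed bool) error

// guard fails closed on confirmation but treats auditing as best-effort.
func guard(r Run, sink Sink) error {
	confirmed := r.ConfirmFlag || r.OpsConfirm == "1"
	if sink != nil {
		_ = sink(r, confirmed) // sink errors are ignored and never block the run
	}
	if r.Destructive && !confirmed {
		return ErrConfirmationRequired
	}
	return nil
}

// failingSink simulates an unavailable audit table.
func failingSink(Run, bool) error { return errors.New("audit sink down") }

func main() {
	destructive := Run{Script: "example-script", Destructive: true}
	fmt.Println(guard(destructive, nil)) // confirmation required

	destructive.ConfirmFlag = true
	fmt.Println(guard(destructive, failingSink)) // <nil>
}
```

The second call shows the scope of fail-closed: a confirmed destructive run proceeds even though the audit write failed.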

Example wiring lives in apps/sirloin/cmd/scripts/chargebee-auto-collection-off/main.go:54.

Note on fail-closed scope: it is the confirmation that fails closed (destructive + not confirmed => abort), not the audit write. If you need the audit row to be a hard prerequisite for a destructive run, that is not the current behaviour. TODO(@law): decide whether audit-sink failures should block destructive ops-script runs.

CS productivity features

Brief operator-facing notes on the productivity changes shipped in the same track:

  • Saved filter presets — Strip list views (users / characters / media / audit logs) can save named filters, private or shared, persisted in audits.strip_filter_presets.
  • foxy360 export — read-only foxy360 query results can be exported as CSV or JSON artifacts (apps/sirloin/internal/app/foxy360/export.go:16).
  • workflow_name label — media views show a humanized workflow label (e.g. foxy_image_to_video rendered as readable text) via format.HumanizeWorkflowName (apps/strip/internal/app/templates/format/workflowname.go:10).