Chuck On-Call

This page is the on-call cheat sheet for Chuck. It complements chuck-runbook (procedures) by setting expectations on tier, alerts, and escalation.

Service Tier

Tier: secondary. Chuck is a marketing-content CMS. It serves brisket public landing pages, FAQ sections, blog/academy entries, and policy pages.

Dimension	Chuck	Compare to sirloin / brain
User-facing impact of outage	Marketing pages render with stale or empty content.	sirloin / brain outage breaks signup, billing, generation.
Revenue impact	Indirect (conversion of new visitors).	Direct.
Page time-to-recover (informal target)	< 4h business hours.	Minutes.
Wake on-call at 03:00?	No unless the marketing page is fully broken AND the on-call agrees the impact warrants it.	Yes for sirloin / brain.

There is no formal SLO doc for chuck in this repo (find docs/src/content/docs -name '*chuck*slo*' returns no matches). The billing SLO doc sets the model to follow — TODO(@marty): decide whether chuck warrants its own SLO doc.

Alerts and Signals

There is no alert routing checked into this repo for chuck (verified by grep -rn 'chuck\|strapi' .github/workflows, grep -nE 'chuck' docker-compose*.yml, no apps/chuck/railway.json). Recommended signals to wire (TODO(@marty): confirm which exist in the runtime — none of the items below are sourced from a checked-in alert config):

Top alerts to set up

chuck unreachable — https://<chuck-host>/_health returns non-2xx for > 5 minutes. Source: synthetic check.
brisket → chuck error rate — brisket logs STRAPI_* fetch failures spike. Source: brisket OTel trace (operations/observability).
chuck 5xx rate — Strapi process emits 500-class responses (DB / R2 / plugin failures). TODO(@marty): document where chuck logs ship — apps/chuck/meat/config/ has no log-shipper config and no OTel exporter.
R2 upload failures — provider returns non-2xx; surfaces as admin-side error. Source: R2 provider metrics, if exported.
Postgres unreachable — DB connection acquire times out.
Boot failures — process exits during startup. Source: deploy platform logs.

Symptom → triage map

Pager / report	First check	Doc
”Marketing FAQ is empty”	Hit `GET /api/checkout-faq-section?populate=*` with brisket’s `STRAPI_TOKEN`. Is the entry published?	chuck-api
”brisket 500s on landing”	Did `STRAPI_URL` or `STRAPI_TOKEN` change in brisket env? Did chuck rotate `API_TOKEN_SALT`?	chuck-env
”Editor can’t log in to /admin”	`ADMIN_JWT_SECRET` rotated, or `APP_KEYS` rotation invalidated session cookies	chuck-env, chuck-runbook
”Image upload fails”	R2 credentials, bucket, public URL	chuck-errors
”Process won’t start”	Env vars missing, Node version, Postgres reachable	chuck-errors
”Editor reports save failed”	Field-level validation; `ValidationError` payload	chuck-errors

Escalation

There is no @oncall-chuck rotation defined; escalation is best-effort.

Page primary on-call. Same rotation as the rest of beef. They judge whether to wake other people.
Loop in service owner. Chuck owner is @marty per chuck.md frontmatter.
Loop in brisket owner if the visible failure is on brisket landing pages.
Loop in marketing / content team for editor-side issues (publish-state, missing copy). TODO(@marty): record the marketing point of contact — no MARKETING.md or marketing handle is referenced in this repo.

If the failure mode is ambiguous (e.g. brisket renders empty hero — chuck bug or brisket bug?), reproduce the failing fetch with curl and STRAPI_TOKEN against chuck. If chuck returns the expected payload, the issue is on brisket.

What is not a chuck on-call concern

Sign-up / login — handled by Clerk via brisket / fennec / brain.
Checkout / billing — handled by sirloin (see billing runbook).
Media generation — handled by brain / round.
Email delivery for transactional flows — that is sirloin’s Nodemailer + shank templates, not chuck’s email plugin (which exists but is largely idle in this deployment).

Comms During Incidents

A chuck-only outage typically does not warrant a status-page notice. Use internal Slack instead.
A multi-service outage where chuck is implicated should follow the cross-service incident playbook. TODO(@marty): link the playbook — no cross-service incident doc is currently checked into docs/src/content/docs/operations/.
Do not paste API tokens, salts, or APP_KEYS into incident channels. See security-model.

Anchor Docs

API surface: chuck-api
Errors: chuck-errors
Runbook: chuck-runbook
Observability: operations/observability