Shank On-Call
Summary
Shank has no direct on-call rotation. It is a build-time React Email project with no runtime, no network surface, and no production deploy. There is nothing to page on for “shank itself”: it cannot be down.
Email-related incidents are routed to different responders depending on the failure mode:
| Failure mode | Owner | Notes |
|---|---|---|
| Emails not being delivered (SMTP errors, auth failures, recipient lookup failures, rate-limit blocks) | sirloin on-call | The send path lives in apps/sirloin/internal/pkg/emails/client.go. Errors surface in sirloin logs and Sentry. |
| Email content / rendering bugs (broken layout, wrong copy, missing image, placeholder appearing literally) | template author (typically frontend / design) | Fixed by editing TSX, re-exporting, and shipping a sirloin redeploy. See shank-runbook.md. |
| Mass deliverability degradation (recipients reporting spam folder, DKIM/SPF failures, blocklist hits) | sirloin on-call + ops | Likely DNS / SMTP-provider issue, not template content. Coordinate with whoever owns the SMTP credentials. |
| Suspected PII leak in an email | security on-call + sirloin on-call | Treat as incident. See docs/src/content/docs/standards/security-model.md. Pull the offending template, redeploy sirloin with the previous HTML, then triage. |
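
As a rough illustration, the routing table above can be written as a lookup from failure mode to responders. This is only a sketch: the type names, the string keys, and the `routeIncident` function are hypothetical and do not exist in shank or sirloin. Responder order simply mirrors the Owner column.

```typescript
// Hypothetical sketch: none of these names exist in the shank or sirloin code.
// The keys paraphrase the "Failure mode" column; the values copy the "Owner" column.
type FailureMode =
  | "not-delivered"
  | "rendering-bug"
  | "deliverability-degradation"
  | "pii-leak";

type Responder =
  | "sirloin on-call"
  | "template author"
  | "ops"
  | "security on-call";

const ROUTING: Record<FailureMode, readonly Responder[]> = {
  "not-delivered": ["sirloin on-call"],
  "rendering-bug": ["template author"],
  "deliverability-degradation": ["sirloin on-call", "ops"],
  "pii-leak": ["security on-call", "sirloin on-call"],
};

// Returns the responders for a failure mode, in the order the table lists them.
function routeIncident(mode: FailureMode): readonly Responder[] {
  return ROUTING[mode];
}
```

Using a `Record` keyed by the union type means the compiler rejects the table if a failure mode is added without an owner, which is the property an incident-routing table most needs.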
Why no rotation
- No process to crash. pnpm dev is a developer-laptop preview only.
- No deploy target. Output is committed HTML inside the sirloin tree.
- No external dependencies. No DB, queue, third-party API, secret store. The only “deps” are the npm packages used at build time.
- No SLO. A broken template surfaces only on the next email send, through sirloin telemetry. Sirloin already has the alerts and on-call for that path.
Practical decision tree for an incident
Is the issue "users are not getting emails"? → sirloin on-call (SMTP, Clerk lookup, worker triggers)
Is the issue "users got an email but it looks wrong"? → template author re-renders, re-exports, ships via sirloin redeploy (see shank-runbook.md)
Is the issue "an email contained data it should not have"? → security on-call leads, sirloin on-call assists with rollback
Contacts
- Default template owner: @zen (per docs/src/content/docs/services/shank.md frontmatter).
- Sirloin on-call: see docs/src/content/docs/services/sirloin.md.
- TODO(@zen): document the formal rotation tooling. No on-call config (PagerDuty / Opsgenie / Grafana OnCall) is checked into this repo, and no schedule link is recorded for shank.
Escalation outside business hours
If a template change is the suspected cause of a deliverability or PII incident outside business hours, the sirloin on-call is authorised to revert the offending sirloin deploy without waiting for the template author. The author can patch the TSX during business hours.
Do not attempt to “hotfix” a template by editing the exported HTML in
apps/sirloin/internal/pkg/emails/templates/ directly without also
updating the source TSX in apps/shank/react-templates/emails/. The next
pnpm export will silently overwrite the patch.
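
One way to catch this mistake early is a small check over the list of files changed in a commit or pull request. The sketch below is an assumption, not something that exists in the repo: the function name is hypothetical, and so is the idea of feeding it the output of a file-level diff. Only the two directory paths come from this document. The heuristic is deliberately coarse: any TSX change in the same change set is treated as backing the HTML edit.

```typescript
// Directory paths are taken from this document; everything else is hypothetical.
const EXPORTED_HTML_DIR = "apps/sirloin/internal/pkg/emails/templates/";
const SOURCE_TSX_DIR = "apps/shank/react-templates/emails/";

// Given the repo-relative paths changed in one change set, return the exported
// HTML files that were edited without any accompanying change under the TSX
// source directory. An empty result means nothing looks like a direct hotfix.
function findUnbackedHtmlEdits(changedFiles: readonly string[]): string[] {
  const touchesSource = changedFiles.some((f) => f.startsWith(SOURCE_TSX_DIR));
  if (touchesSource) {
    return [];
  }
  return changedFiles.filter((f) => f.startsWith(EXPORTED_HTML_DIR));
}
```

A non-empty result would be a reasonable trigger for a review comment or a failed check, since those edits are the ones the next export would overwrite.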