
Shank On-Call

Summary

Shank has no direct on-call rotation. It is a build-time React Email project with no runtime, no network surface, and no production deploy. There is nothing to page on for “shank itself” — it cannot be down.

Email-related incidents are routed to two different responders depending on the failure mode:

| Failure mode | Owner | Notes |
|---|---|---|
| Emails not being delivered (SMTP errors, auth failures, recipient lookup failures, rate-limit blocks) | sirloin on-call | The send path lives in apps/sirloin/internal/pkg/emails/client.go. Errors surface in sirloin logs and Sentry. |
| Email content / rendering bugs (broken layout, wrong copy, missing image, placeholder appearing literally) | template author (typically frontend / design) | Fixed by editing TSX, re-exporting, and shipping a sirloin redeploy. See shank-runbook.md. |
| Mass deliverability degradation (recipients reporting spam folder, DKIM/SPF failures, blocklist hits) | sirloin on-call + ops | Likely DNS / SMTP-provider issue, not template content. Coordinate with whoever owns the SMTP credentials. |
| Suspected PII leak in an email | security on-call + sirloin on-call | Treat as incident. See docs/src/content/docs/standards/security-model.md. Pull the offending template, redeploy sirloin with the previous HTML, then triage. |

Why no rotation

  • No process to crash. pnpm dev is a developer-laptop preview only.
  • No deploy target. Output is committed HTML inside the sirloin tree.
  • No external dependencies. No DB, queue, third-party API, or secret store. The only “deps” are the npm packages used at build time.
  • No SLO. A broken template surfaces only on the next email send, through sirloin telemetry. Sirloin already has the alerts and on-call for that path.

Practical decision tree for an incident

Is the issue "users are not getting emails"?
→ sirloin on-call (SMTP, Clerk lookup, worker triggers)
Is the issue "users got an email but it looks wrong"?
→ template author re-renders, re-exports, ships via sirloin redeploy
(see shank-runbook.md)
Is the issue "an email contained data it should not have"?
→ security on-call leads, sirloin on-call assists with rollback
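If an alert router or chat bot ever needs to apply these rules automatically, the routing from the table and decision tree could be kept as data. The sketch below is an assumption, not existing code: FailureMode, Routing, ROUTING and route() are hypothetical names, and the failure-mode keys are an illustrative classification of the four table rows.

```typescript
// Hypothetical encoding of this page's routing rules. FailureMode, Routing,
// ROUTING and route() are illustrative names, not existing shank or sirloin code.
type FailureMode =
  | "not-delivered"       // SMTP errors, auth failures, recipient lookup failures, rate-limit blocks
  | "rendering-bug"       // broken layout, wrong copy, missing image, literal placeholder
  | "mass-deliverability" // spam-folder reports, DKIM/SPF failures, blocklist hits
  | "pii-leak";           // an email contained data it should not have

interface Routing {
  // Responders in the order the table lists them. For PII leaks the decision
  // tree says security on-call leads and sirloin on-call assists with rollback.
  responders: string[];
  firstStep: string;
}

// Record<FailureMode, Routing> makes the compiler reject a missing failure mode.
const ROUTING: Record<FailureMode, Routing> = {
  "not-delivered": {
    responders: ["sirloin on-call"],
    firstStep: "Check sirloin logs and Sentry; the send path is apps/sirloin/internal/pkg/emails/client.go.",
  },
  "rendering-bug": {
    responders: ["template author"],
    firstStep: "Edit the TSX, re-export, and ship a sirloin redeploy (see shank-runbook.md).",
  },
  "mass-deliverability": {
    responders: ["sirloin on-call", "ops"],
    firstStep: "Likely a DNS / SMTP-provider issue; coordinate with whoever owns the SMTP credentials.",
  },
  "pii-leak": {
    responders: ["security on-call", "sirloin on-call"],
    firstStep: "Treat as incident: pull the template, redeploy sirloin with the previous HTML, then triage.",
  },
};

// Look up who to page and what to do first for a classified failure mode.
function route(mode: FailureMode): Routing {
  return ROUTING[mode];
}
```

Keeping the rules in one typed record means adding a new failure mode fails compilation until someone decides who owns it, which is the same question this page answers by hand.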

Contacts

  • Default template owner: @zen (per docs/src/content/docs/services/shank.md frontmatter).
  • Sirloin on-call: see docs/src/content/docs/services/sirloin.md.
  • TODO(@zen): document the formal rotation tooling — no on-call config (PagerDuty / Opsgenie / Grafana OnCall) is checked into this repo, and no schedule link is recorded for shank.

Escalation outside business hours

If a template change is the suspected cause of a deliverability or PII incident outside business hours, the sirloin on-call is authorised to revert the offending sirloin deploy without waiting for the template author. The author can patch the TSX during business hours.

Do not attempt to “hotfix” a template by editing the exported HTML in apps/sirloin/internal/pkg/emails/templates/ directly without also updating the source TSX in apps/shank/react-templates/emails/. The next pnpm export will silently overwrite the patch.
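A CI check could catch a hand-edited HTML file before the next pnpm export silently overwrites it. The sketch below assumes a workflow shank does not necessarily have today: templates are exported into a scratch directory, and every .html file there is compared by SHA-256 against the committed files of the same name. hashHtmlDir and findDrift are hypothetical helpers, the directory scan is non-recursive, and the whole approach only works if the export is byte-for-byte deterministic, which is worth verifying first.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hash file contents so large HTML files compare cheaply and findings stay short.
function sha256(data: string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Read every *.html file directly inside `dir` (non-recursive) into a name -> hash map.
function hashHtmlDir(dir: string): Map<string, string> {
  const hashes = new Map<string, string>();
  for (const name of readdirSync(dir)) {
    if (name.endsWith(".html")) {
      hashes.set(name, sha256(readFileSync(join(dir, name), "utf8")));
    }
  }
  return hashes;
}

// Compare committed HTML hashes with freshly exported ones and describe each mismatch.
function findDrift(committed: Map<string, string>, fresh: Map<string, string>): string[] {
  const problems: string[] = [];
  committed.forEach((hash, name) => {
    const freshHash = fresh.get(name);
    if (freshHash === undefined) {
      problems.push(`${name}: committed but not produced by export`);
    } else if (freshHash !== hash) {
      problems.push(`${name}: committed HTML differs from export (hand-edited?)`);
    }
  });
  fresh.forEach((_hash, name) => {
    if (!committed.has(name)) {
      problems.push(`${name}: exported but not committed`);
    }
  });
  return problems;
}
```

In CI this would be called as findDrift(hashHtmlDir("apps/sirloin/internal/pkg/emails/templates"), hashHtmlDir(scratchDir)), failing the job when the returned list is non-empty.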