Flank On-Call
This page is the on-call quick-reference for flank. For step-by-step procedures see flank-runbook. For error semantics see flank-errors.
What flank is and isn’t
- Is: the visual workflow editor, a React/ReactFlow app on port 3100 where admins author the workflow graphs that brain executes. Stateless; all data lives in brain.
- Isn’t: the workflow engine, or the source of truth for workflows, executions, adapters, or secrets; those live in brain. Flank does not run executions.
Blame propagation order: editor → flank server functions → brain. If brain is down, flank cannot meaningfully serve traffic.
Top alerts
| Alert | Likely cause | First action | Escalate to |
|---|---|---|---|
| beef-flank Railway healthcheck failing | Process crash or boot failure (port collision, build error) | Open Railway logs; look for server.started. Redeploy if last healthy commit is recent. | flank owner (@law) |
| Editor loads but workflow lists empty / saves fail | Flank can’t reach brain — BRAIN_API_URL wrong, brain down, or JWT rejected | Verify BRAIN_API_URL on beef-flank; check brain health; check Clerk session/ADMIN role. See runbook → “Editor can’t reach brain”. | brain on-call (brain down), platform/auth (Clerk) |
| Brain fetch errors flooding flank logs (brain … failed) | Brain unavailable or returning 5xx | Check brain health and BRAIN_API_URL. | brain on-call |
| Executions stuck or failing | Engine/storage problem in brain — flank only reads traces | Investigate in brain; brain re-enqueues RUNNING executions on boot. | brain on-call |
| 401/403 from brain on workflow operations | Expired Clerk session, missing ADMIN role, or Clerk keys rotated without redeploy | Verify operator has ADMIN in brain; check CLERK_* env on beef-flank. | platform/auth owner |
| 401/Unauthorized flood on UI server functions | Clerk outage or CLERK_SECRET_KEY rotated without redeploy | Check Clerk status; verify env on beef-flank. | platform/auth owner |
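
The two status-code rows for brain responses (401/403 and 5xx) can be read as a simple lookup. The sketch below is illustrative only: `classifyBrainStatus` and the `BrainStatusTriage` type are hypothetical names, not part of flank or brain, and the strings are copied from the table rather than from any real tooling.

```typescript
// Hypothetical helper: maps an HTTP status returned by brain to the matching
// row of the "Top alerts" table. Not an existing flank or brain function.
type BrainStatusTriage = {
  likelyCause: string;
  firstAction: string;
  escalateTo: string;
};

function classifyBrainStatus(status: number): BrainStatusTriage | null {
  // 401/403 from brain on workflow operations: auth problem.
  if (status === 401 || status === 403) {
    return {
      likelyCause:
        "Expired Clerk session, missing ADMIN role, or Clerk keys rotated without redeploy",
      firstAction:
        "Verify operator has ADMIN in brain; check CLERK_* env on beef-flank",
      escalateTo: "platform/auth owner",
    };
  }
  // Any 5xx: brain itself is unavailable or failing.
  if (status >= 500 && status <= 599) {
    return {
      likelyCause: "Brain unavailable or returning 5xx",
      firstAction: "Check brain health and BRAIN_API_URL",
      escalateTo: "brain on-call",
    };
  }
  // Other statuses are not covered by the table.
  return null;
}
```

A `null` result means the status is not one the table addresses, so the remaining rows and the runbook still apply.
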
Decision tree
- Is brain healthy? If no, defer flank investigation — flank can’t read or write any workflow data without brain.
- Is the flank pod up? Railway dashboard. If down, check last deploy and roll back if a recent change correlates.
- Can flank reach brain? Check BRAIN_API_URL and tail flank logs for “brain … failed” fetch errors.
- Is auth working? 401/403 from brain → Clerk session or ADMIN-role problem.
- Are executions the real issue? If executions are stuck or failing, that’s a brain incident; escalate to brain on-call.
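
The ordering of the decision tree matters: each question is only worth asking once the ones before it have passed. A minimal sketch of that ordering follows, assuming the operator has already gathered the five observations by hand. The `Observations` and `NextStep` types and the `triage` function are hypothetical and do not exist in flank.

```typescript
// Hypothetical observations collected manually by the operator; none of
// these field names come from flank or brain.
type Observations = {
  brainHealthy: boolean;
  flankPodUp: boolean;
  flankReachesBrain: boolean;
  brainAuthErrors: boolean; // 401/403 responses from brain
  executionsStuckOrFailing: boolean;
};

type NextStep = { stoppedAt: string; action: string };

// Walks the decision tree top to bottom and returns the first failing check.
function triage(o: Observations): NextStep {
  if (!o.brainHealthy) {
    return {
      stoppedAt: "brain health",
      action: "Defer flank investigation; flank cannot read or write workflow data without brain",
    };
  }
  if (!o.flankPodUp) {
    return {
      stoppedAt: "flank pod",
      action: "Check last deploy in Railway; roll back if a recent change correlates",
    };
  }
  if (!o.flankReachesBrain) {
    return {
      stoppedAt: "flank to brain connectivity",
      action: "Check BRAIN_API_URL and tail flank logs for fetch errors",
    };
  }
  if (o.brainAuthErrors) {
    return {
      stoppedAt: "auth",
      action: "Check Clerk session and ADMIN role",
    };
  }
  if (o.executionsStuckOrFailing) {
    return {
      stoppedAt: "executions",
      action: "Brain incident; escalate to brain on-call",
    };
  }
  return { stoppedAt: "none", action: "No branch matched; continue with flank-runbook" };
}
```

Because the checks return early, an unhealthy brain always wins over any flank-side symptom, which matches the first step of the tree.
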
Escalation
| Domain | Owner | When to page |
|---|---|---|
| Flank editor app, deploy, Railway service | @law | Default flank on-call |
| Brain workflow engine, executions, adapters, secrets | brain on-call | Executions stuck/failing, storage errors, adapter issues |
| Railway platform / networking | platform on-call | Healthcheck, internal DNS, deploy pipeline |
| Clerk / auth | platform/auth owner | UI auth flood, brain 401/403, key rotation |
When you’ve handled an incident, update the runbook with anything missing and bump last_reviewed on this page.