Saga Pattern For Distributed Payments
ADR 006: Saga Pattern for Distributed Payment Transactions
Date: 2026-04
Status: Accepted
Context: Payment processing spans multiple services (Primer, Chargebee, internal DB), so the flow must handle partial failures gracefully.
Problem
Payment recording involves multiple steps:
- Record the payment in Chargebee (invoice.record_payment)
- Activate subscription (subscription.change_term_end)
- Record purchase in DB (for idempotency)
- Invalidate cache
- Send analytics
If step 3 (recording the purchase in the DB) fails after step 2 (activating the subscription) succeeds, the subscription is active but no purchase record exists, so a retry cannot tell that the payment was already recorded.
How do we safely coordinate these steps without distributed transactions?
Decision
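Use the Saga pattern, with forward recovery instead of automatic compensating transactions:
- Steps execute in a fixed order, and the outcome of each step is logged
- If a step fails, the steps that already succeeded are logged, not rolled back
- Retries are made safe by an idempotency check performed under a distributed lock
- The DB purchase record is the “saga completion marker”: once it exists, the required steps have all completed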
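A minimal in-memory sketch of this decision, assuming each step can be expressed as a function that returns an error: required steps run in order, a failure is logged and returned without undoing earlier steps, the completion marker short-circuits later calls, and best-effort steps (cache, analytics) run only after the marker is written. `Step`, `MarkerStore`, and `RunSaga` are hypothetical names for illustration, not types from the payments package.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Step is one saga action (hypothetical type for this sketch).
type Step struct {
	Name string
	Run  func() error
}

// MarkerStore stands in for the DB purchase table: a completed saga ID
// is the completion marker.
type MarkerStore struct {
	done       map[string]bool
	failWrites bool // simulates the DB write failing (hypothetical test hook)
}

func NewMarkerStore() *MarkerStore { return &MarkerStore{done: map[string]bool{}} }

func (m *MarkerStore) IsComplete(id string) bool { return m.done[id] }

func (m *MarkerStore) MarkComplete(id string) error {
	if m.failWrites {
		return errors.New("db write failed")
	}
	m.done[id] = true
	return nil
}

// RunSaga runs required steps in order, then writes the completion marker,
// then runs best-effort steps whose failures are only logged. Nothing is
// rolled back: recovery means calling RunSaga again with the same id.
// The bool result reports whether the saga was already complete.
func RunSaga(store *MarkerStore, id string, required, bestEffort []Step) (bool, error) {
	if store.IsComplete(id) {
		return true, nil
	}
	for _, s := range required {
		if err := s.Run(); err != nil {
			log.Printf("saga %s: step %q failed: %v", id, s.Name, err)
			return false, fmt.Errorf("step %s: %w", s.Name, err)
		}
	}
	if err := store.MarkComplete(id); err != nil {
		log.Printf("saga %s: completion marker not written: %v", id, err)
		return false, err
	}
	for _, s := range bestEffort {
		if err := s.Run(); err != nil {
			log.Printf("saga %s: best-effort step %q failed: %v", id, s.Name, err)
		}
	}
	return false, nil
}
```

Because a retry after a failed marker write re-runs every required step, each required step (here, recording the Chargebee payment) has to tolerate having already succeeded.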
Implementation
    // payments/processor.go - RecordPayment flow
    func (p *Processor) RecordPayment(ctx context.Context, req *PaymentRequest) (*PaymentResult, error) {
        // LOCK: acquire the distributed lock (serializes concurrent attempts).
        // lockKey is the per-transaction lock key; its derivation is omitted here.
        lock, err := locker.TryLock(ctx, lockKey)
        if err != nil {
            return nil, err // lock not acquired: touch neither Chargebee nor the DB
        }
        defer lock.Release(ctx)

        // IDEMPOTENCY: check whether this transaction was already processed.
        // A lookup error must abort; treating it as "not a duplicate" could
        // record the payment twice.
        isDuplicate, err := p.IsDuplicate(ctx, req.TransactionID)
        if err != nil {
            return nil, err
        }
        if isDuplicate {
            return &PaymentResult{Success: true, AlreadyProcessed: true}, nil
        }

        // STEP 1: record the payment in Chargebee
        invoiceInfo, err := p.recordChargebeePayment(ctx, req)
        if err != nil {
            return nil, domain.NewPaymentError(req.InvoiceID, err)
        }

        // STEP 2: activate the subscription if needed
        activated := p.handleSubscriptionActivation(ctx, invoiceInfo)

        // STEP 3: record the purchase in the DB (saga completion marker)
        err = p.repo.PurchaseCredits(ctx, &models.Purchase{
            UserID:        req.UserID,
            TransactionID: req.TransactionID,
            ItemPriceID:   req.ItemPriceID,
            Amount:        req.Amount,
            CreatedAt:     time.Now(),
        })
        if err != nil {
            // Chargebee was updated but the DB was not. The next retry will:
            //   1. acquire the lock (serialized with any concurrent attempt)
            //   2. run IsDuplicate: if the DB record exists after all, return success
            //   3. otherwise start over from STEP 1
            // Starting over is safe only if STEP 1 and STEP 2 tolerate an invoice
            // that is already paid and a subscription that is already active;
            // the lock plus the idempotency check prevent a double DB record.
            return nil, err
        }

        // STEP 4: invalidate the cache
        p.cacheManager.InvalidateUser(customerID)

        return &PaymentResult{...}, nil
    }
    // Idempotency check: the DB record is the source of truth.
    // Assumes GetPurchaseByTransactionID returns (nil, nil) when no record exists.
    func (p *Processor) IsDuplicate(ctx context.Context, transactionID string) (bool, error) {
        purchase, err := p.repo.GetPurchaseByTransactionID(ctx, transactionID)
        return purchase != nil, err
    }
Rationale
Why Saga, Not Distributed Transactions?
- No ACID transaction across services: Chargebee is an external API, so two-phase commit is not an option
- Simpler semantics: the saga makes eventual consistency explicit instead of hiding it
- Explicit recovery: recovery actions (retries, manual fixes) are logged and visible
- Operational visibility: failed steps appear in the logs, which makes them easy to debug
Why Not Rollback?
Compensation (rollback) would require:
- Voiding the Chargebee payment (forfeiting revenue from a payment that succeeded)
- Cancelling the subscription (a complex state transition)
- Deciding which user state is correct, which is often unclear
Instead, we preserve the Chargebee mutation and retry the DB step. If the DB step fails persistently, human intervention is required.
Saga Boundaries
Unrecoverable Failures (immediate error):
└─ Chargebee API error (invalid invoice, no subscription, etc.)
   → Fail fast; later steps are not attempted
Retryable Failures (saga incomplete):
└─ DB write fails
   → Log the error; the customer can retry; the next attempt completes the saga
Best-Effort Failures (after the completion marker):
├─ Cache invalidation fails
└─ Analytics notification fails
   → Log the error; a retry returns at the idempotency check and does not re-run these steps
Idempotency Check (saga already complete):
└─ transactionID already in DB
   → Return success immediately (no re-execution)
Consequences
Positive
- No distributed transactions: Works across service boundaries
- Idempotency safe: Replay of failed requests is harmless
- Explicit state: Each step’s success is observable in logs
- Operational: Stuck sagas can be manually completed/compensated
Negative
- Eventual consistency: Chargebee and the DB may disagree until the saga completes or is resolved manually
- Human intervention: Stuck sagas require operational tooling
- Complexity: More states to consider (partially executed saga)
- Monitoring required: Must alert on incomplete sagas
Testing
    // Test: happy path, all saga steps complete
    result, err := processor.RecordPayment(ctx, request)
    require.NoError(t, err)
    require.True(t, result.Success)

    // Verify every saga step succeeded
    // Step 1: Chargebee payment recorded
    invoice, _ := cb.RetrieveInvoice(invoiceID)
    require.Equal(t, "paid", invoice.Status)

    // Step 2: subscription activated
    sub, _ := cb.RetrieveSubscription(subID)
    require.Equal(t, "active", sub.Status)

    // Step 3: DB purchase recorded
    purchase, _ := repo.GetPurchaseByTransactionID(ctx, transactionID)
    require.NotNil(t, purchase)

    // Test: idempotency on retry (the same request again)
    result2, _ := processor.RecordPayment(ctx, request)
    require.True(t, result2.AlreadyProcessed)
    require.True(t, result2.Success)

    // Test (separate test case): partial failure recovery
    // Simulate a DB write failure
    repo.SetWriteError(errors.New("network error"))
    _, err := processor.RecordPayment(ctx, request)
    require.Error(t, err)

    // Verify the Chargebee side succeeded
    invoice, _ := cb.RetrieveInvoice(invoiceID)
    require.Equal(t, "paid", invoice.Status)

    // Recover: fix the DB, then retry
    repo.SetWriteError(nil)
    _, err = processor.RecordPayment(ctx, request)
    require.NoError(t, err)
Related ADRs
- 005: Distributed Locks — Ensures saga steps don’t interleave
- 002: Polling Over Webhooks — Saga is retried if webhook/polling re-triggers