Saga Pattern For Distributed Payments

ADR 006: Saga Pattern for Distributed Payment Transactions

Date: 2026-04
Status: Accepted
Context: Payment processing spans multiple services (Primer, Chargebee, internal DB); must handle partial failures gracefully.

Problem

Payment recording involves multiple steps:

  1. Record payment in Chargebee (invoice.record_payment)
  2. Activate subscription (subscription.change_term_end)
  3. Record purchase in DB (for idempotency)
  4. Invalidate cache
  5. Send analytics

If step 3 fails after step 2 succeeds, the subscription is already active in Chargebee but the DB holds no purchase record, so a retry cannot recognize the payment as already processed.

How do we safely coordinate these steps without distributed transactions?

Decision

Use the Saga pattern with forward recovery (retry) rather than compensating rollbacks:

  • Steps execute in a fixed order, and each step’s outcome is recorded
  • If a step fails, the steps already completed are logged and left in place (not rolled back)
  • Retry-safety through idempotency checks + distributed locks
  • DB purchase record is the “saga completion marker”

Implementation

// payments/processor.go - RecordPayment flow
// Note: lockKey and customerID are derived elsewhere; that code is not shown in this excerpt.
func (p *Processor) RecordPayment(ctx context.Context, req *PaymentRequest) (*PaymentResult, error) {
	// LOCK: Acquire distributed lock (serializes concurrent attempts)
	lock, err := locker.TryLock(ctx, lockKey)
	if err != nil {
		// Without the lock we must not touch Chargebee or the DB
		return nil, err
	}
	defer lock.Release(ctx)

	// IDEMPOTENCY: Check if already processed
	isDuplicate, err := p.IsDuplicate(ctx, req.TransactionID)
	if err != nil {
		// We cannot tell whether the saga completed; let the caller retry
		return nil, err
	}
	if isDuplicate {
		return &PaymentResult{Success: true, AlreadyProcessed: true}, nil
	}

	// STEP 1: Record payment in Chargebee
	invoiceInfo, err := p.recordChargebeePayment(ctx, req)
	if err != nil {
		return nil, domain.NewPaymentError(req.InvoiceID, err)
	}

	// STEP 2: Activate subscription if needed
	activated := p.handleSubscriptionActivation(ctx, invoiceInfo)

	// STEP 3: Record in DB (saga completion marker)
	err = p.repo.PurchaseCredits(ctx, &models.Purchase{
		UserID:        req.UserID,
		TransactionID: req.TransactionID,
		ItemPriceID:   req.ItemPriceID,
		Amount:        req.Amount,
		CreatedAt:     time.Now(),
	})
	if err != nil {
		// If we get here, Chargebee was updated but the DB wasn't.
		// The next retry will:
		//   1. Acquire the lock (serialize)
		//   2. Check IsDuplicate; if a DB record exists, return success
		//   3. If there is no DB record, start over from step 1
		// In either case we don't double-activate: the subscription is
		// already active, and the lock prevents a double record.
		// Starting over assumes steps 1 and 2 are safe to repeat.
		return nil, err
	}

	// STEP 4: Invalidate cache
	p.cacheManager.InvalidateUser(customerID)
	return &PaymentResult{...}, nil
}

// Idempotency check: DB record is the source of truth
func (p *Processor) IsDuplicate(ctx context.Context, transactionID string) (bool, error) {
	purchase, err := p.repo.GetPurchaseByTransactionID(ctx, transactionID)
	if err != nil {
		return false, err
	}
	return purchase != nil, nil
}

Rationale

Why Saga, Not Distributed Transactions?

  1. No ACID transaction across services: Chargebee is external, so two-phase commit is not an option
  2. Simpler semantics: A saga accepts eventual consistency explicitly
  3. Explicit recovery: Retries and any manual compensation are logged and visible
  4. Operational visibility: Failed steps are in logs; easy to debug

Why Not Rollback?

Compensation (rollback) would require us to:

  • Void the Chargebee payment (loses money)
  • Cancel the subscription (complex state transition)
  • Decide which user state is correct, which is unclear after a partial failure

Instead, we preserve the Chargebee mutation and retry the DB step. If the DB step fails persistently, human intervention is required.

Saga Boundaries

Unrecoverable Failures (immediate error):
  └─ Chargebee API error (invalid invoice, no subscription, etc.)
     → Fail fast; don't attempt saga

Retryable Failures (saga incomplete):
  ├─ DB write fails
  ├─ Cache invalidation fails
  └─ Analytics notification fails
     → Log error; customer can retry; next attempt completes saga

Idempotency Check (saga already complete):
  └─ transactionID already in DB
     → Return success immediately (no re-execution)

Consequences

Positive

  • No distributed transactions: Works across service boundaries
  • Idempotency safe: Replay of failed requests is harmless
  • Explicit state: Each step’s success is observable in logs
  • Operational: Stuck sagas can be manually completed/compensated

Negative

  • Eventual consistency: Chargebee and DB may briefly disagree
  • Human intervention: Stuck sagas require operational tooling
  • Complexity: More states to consider (partially executed saga)
  • Monitoring required: Must alert on incomplete sagas

Testing

// Fixtures (ctx, processor, cb, repo, request, and the IDs) come from test
// setup that is not shown. Each test function starts from fresh fixtures.

// Test: Saga runs every step to completion, then a retry is idempotent
func TestRecordPayment_CompletesAndIsIdempotent(t *testing.T) {
	result, err := processor.RecordPayment(ctx, request)
	require.NoError(t, err)
	require.True(t, result.Success)

	// Verify all saga steps succeeded
	// Step 1: Chargebee payment recorded
	invoice, err := cb.RetrieveInvoice(invoiceID)
	require.NoError(t, err)
	require.Equal(t, "paid", invoice.Status)
	// Step 2: Subscription activated
	sub, err := cb.RetrieveSubscription(subID)
	require.NoError(t, err)
	require.Equal(t, "active", sub.Status)
	// Step 3: DB purchase recorded
	purchase, err := repo.GetPurchaseByTransactionID(ctx, transactionID)
	require.NoError(t, err)
	require.NotNil(t, purchase)

	// Idempotency on retry: the same request returns success without re-execution
	result2, err := processor.RecordPayment(ctx, request)
	require.NoError(t, err)
	require.True(t, result2.AlreadyProcessed)
	require.True(t, result2.Success)
}

// Test: Partial failure recovery
func TestRecordPayment_RecoversFromDBFailure(t *testing.T) {
	// Simulate DB write failure
	repo.SetWriteError(errors.New("network error"))
	_, err := processor.RecordPayment(ctx, request)
	require.Error(t, err)

	// Verify Chargebee side succeeded
	invoice, err := cb.RetrieveInvoice(invoiceID)
	require.NoError(t, err)
	require.Equal(t, "paid", invoice.Status)

	// Recover: Fix the DB, retry
	repo.SetWriteError(nil)
	result, err := processor.RecordPayment(ctx, request)
	require.NoError(t, err)
	require.True(t, result.Success)
}