
9.11.k Design a Payment System (Stripe / PayPal)

Problem Statement

Design a payment processing system like Stripe or PayPal that handles online transactions between merchants and customers. The system must guarantee exactly-once payment processing, handle multiple payment methods, detect fraud, maintain PCI-DSS compliance, and support refunds, chargebacks, and reconciliation.


1. Requirements

Functional Requirements

  • Process payments (credit card, debit card, bank transfer, digital wallets)
  • Support authorize-then-capture flow and direct charge flow
  • Support merchant onboarding with KYC verification
  • Provide checkout APIs for merchant integration
  • Handle refunds (full and partial)
  • Handle chargebacks and disputes
  • Maintain transaction ledger with double-entry bookkeeping
  • Send payment notifications via webhooks to merchants
  • Support multiple currencies with exchange rate conversion
  • Generate settlement reports for merchants
  • Provide a merchant dashboard for transaction management

Non-Functional Requirements

  • Exactly-once payment processing (idempotency guarantee)
  • 99.999% availability for the payment path
  • Transaction latency < 2 seconds end-to-end
  • PCI-DSS Level 1 compliance for card data handling
  • Support 10,000 transactions per second at peak
  • Strong consistency for all financial data
  • Comprehensive audit trail for every state change
  • Disaster recovery with RPO = 0 (no data loss)

2. Capacity Estimation

Traffic

Daily transactions:      50 million
Transactions per second: 50M / 86,400 ~= 580 TPS
Peak TPS:                ~10,000 (Black Friday, flash sales)

Webhook deliveries:      50M * 3 events avg = 150M webhooks/day
Webhook rate:            ~1,750/sec

API calls per transaction: ~5 (auth, capture, status check, webhook, etc.)
Total API calls/sec:       580 * 5 = 2,900/sec average, 50K peak

Storage

Transaction record:       ~2 KB (amounts, status, metadata, audit fields)
Daily transaction data:   50M * 2 KB = 100 GB/day
Yearly transaction data:  ~36 TB
7-year retention:         ~252 TB

Ledger entries:           50M * 4 entries avg = 200M entries/day * 200 bytes = 40 GB/day
Yearly ledger:            ~14.6 TB

Card vault (tokenized):   500 million cards * 500 bytes = 250 GB
Merchant data:            2 million merchants * 5 KB = 10 GB
Audit log:                ~200 GB/day (every state change logged)

Bandwidth

Inbound (payment requests):  580/sec * 2 KB = 1.16 MB/s
Outbound (responses):        580/sec * 1 KB = 0.58 MB/s
Webhook outbound:            1,750/sec * 1 KB = 1.75 MB/s
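
The estimates above are plain arithmetic and can be re-derived in a few lines. A minimal sketch that uses only the figures stated in this section, assuming decimal units (1 KB = 1,000 bytes); the variable names are illustrative:

```python
SECONDS_PER_DAY = 86_400

daily_txns = 50_000_000

# Traffic
avg_tps = daily_txns / SECONDS_PER_DAY               # about 579, rounded to ~580 above
webhooks_per_sec = daily_txns * 3 / SECONDS_PER_DAY  # about 1,736, rounded to ~1,750 above

# Storage
txn_gb_per_day = daily_txns * 2_000 / 1e9            # 2 KB per transaction record
txn_tb_per_year = txn_gb_per_day * 365 / 1_000
ledger_gb_per_day = daily_txns * 4 * 200 / 1e9       # 4 entries of 200 bytes each
card_vault_gb = 500_000_000 * 500 / 1e9              # 500 bytes per tokenized card

print(f"average TPS:        {avg_tps:.0f}")
print(f"webhooks per sec:   {webhooks_per_sec:.0f}")
print(f"transactions/day:   {txn_gb_per_day:.0f} GB")
print(f"transactions/year:  {txn_tb_per_year:.1f} TB")
print(f"ledger/day:         {ledger_gb_per_day:.0f} GB")
print(f"card vault:         {card_vault_gb:.0f} GB")
```

Note how small the average load is compared with the 10,000 TPS peak: the design is sized for the peak, not the average.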

3. High-Level Architecture

+----------+        +-------------------+
| Merchant |------->| API Gateway       |
| Server   |        | + Rate Limiter    |
+----------+        | + Auth (API Keys) |
                    +--------+----------+
                             |
          +------------------+------------------+
          |                  |                  |
   +------v------+   +------v------+   +------v------+
   | Payment     |   | Merchant    |   | Webhook     |
   | Service     |   | Service     |   | Service     |
   +------+------+   +-------------+   +------+------+
          |                                    |
   +------v------+                     +-------v------+
   | Risk/Fraud  |                     | Webhook Queue|
   | Engine      |                     | (SQS/Kafka)  |
   +------+------+                     +--------------+
          |
   +------v------+
   | Payment     |           +------------------+
   | Router      |---------->| Card Vault       |
   +------+------+           | (PCI Isolated)   |
          |                  +------------------+
   +------+------+-------------+
   |             |             |
+--v---+   +-----v------+   +--v---+
| Visa |   | Mastercard |   | Bank |
| PSP  |   | PSP        |   | ACH  |
+------+   +------------+   +------+
          |
   +------v------+
   | Ledger      |    +------------------+
   | Service     |--->| Ledger DB        |
   +------+------+    | (Append-only)    |
          |           +------------------+
   +------v------+
   | Settlement  |    +------------------+
   | Service     |--->| Settlement DB    |
   +-------------+    +------------------+

+------------------+    +------------------+
| Reconciliation   |    | Reporting        |
| Service (batch)  |    | Service          |
+------------------+    +------------------+

Payment Router Detail

The Payment Router selects the optimal PSP for each transaction:

+------------------+
| Payment Router   |
+--------+---------+
         |
         |  Routing rules:
         |  1. Card BIN -> preferred PSP (Visa Direct for Visa cards)
         |  2. Geographic routing (EU cards -> Adyen, US cards -> Stripe)
         |  3. Cost optimization (cheapest PSP for transaction type)
         |  4. Failover (if primary PSP down, route to secondary)
         |  5. Load balancing (spread across PSPs when equivalent)
         |
    +----+----+---------+
    |         |         |
+---v---+ +---v---+ +---v--------+
| PSP-1 | | PSP-2 | | PSP-3      |
| (Visa | |(Adyen)| | (Checkout  |
|  Net) | |       | |  .com)     |
+-------+ +-------+ +------------+
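
The routing rules can be sketched as a small selection function. This is an illustrative sketch, not a production router: the `Psp` type, the cost figures, and the PSP names are hypothetical, and rule 5 (load balancing across equivalent PSPs) is left out to keep the result deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Psp:
    name: str
    regions: frozenset     # card regions this PSP can serve
    cost_bps: int          # illustrative per-transaction cost in basis points
    healthy: bool = True


def choose_psp(psps, card_region, preferred=None):
    """Return the PSP to use, or None if no healthy PSP serves the region."""
    # Rule 4 (failover): never consider a PSP that is currently down.
    candidates = [p for p in psps if p.healthy and card_region in p.regions]
    if not candidates:
        return None
    # Rules 1-2 (BIN or geographic preference): use the preferred PSP if it is usable.
    for p in candidates:
        if p.name == preferred:
            return p
    # Rule 3 (cost): otherwise pick the cheapest; ties are broken by name.
    return min(candidates, key=lambda p: (p.cost_bps, p.name))


psps = [
    Psp("psp-1", frozenset({"US"}), cost_bps=25),
    Psp("psp-2", frozenset({"EU"}), cost_bps=20),
    Psp("psp-3", frozenset({"US", "EU"}), cost_bps=30),
]

print(choose_psp(psps, "EU", preferred="psp-2").name)
```

Filtering on health before applying preferences is the design choice that makes failover automatic: a preferred PSP that is down simply never becomes a candidate.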

4. API Design

POST /api/v1/payments
  Headers:
    Authorization: Bearer <merchant_api_key>
    Idempotency-Key: "unique-request-id-abc123"
  Body: {
    "amount": 4999,                 // in smallest currency unit (cents)
    "currency": "USD",
    "payment_method_id": "pm_card_visa_4242",
    "customer_id": "cust_abc",
    "description": "Order #12345",
    "metadata": { "order_id": "12345" },
    "capture": true                 // false for auth-only
  }
  Response 201: {
    "payment_id": "pay_7xKq9mP3",
    "status": "succeeded",         // pending|succeeded|failed|cancelled
    "amount": 4999,
    "currency": "USD",
    "payment_method": { "type": "card", "last4": "4242", "brand": "visa" },
    "created_at": "2026-04-11T10:00:00Z",
    "idempotency_key": "unique-request-id-abc123"
  }

POST /api/v1/payments/{payment_id}/capture
  Headers:
    Authorization: Bearer <merchant_api_key>
    Idempotency-Key: "capture-unique-id"
  Body: { "amount": 4999 }   // can capture less than authorized
  Response 200: {
    "payment_id": "pay_7xKq9mP3",
    "status": "captured",
    "captured_amount": 4999
  }

POST /api/v1/payments/{payment_id}/refund
  Headers:
    Authorization: Bearer <merchant_api_key>
    Idempotency-Key: "refund-unique-id-xyz"
  Body: {
    "amount": 2000,                // partial refund; omit for full refund
    "reason": "customer_request"
  }
  Response 201: {
    "refund_id": "ref_abc123",
    "payment_id": "pay_7xKq9mP3",
    "amount": 2000,
    "status": "succeeded",
    "created_at": "2026-04-11T11:00:00Z"
  }

POST /api/v1/payment_methods
  Headers: Authorization: Bearer <merchant_api_key>
  Body: {
    "type": "card",
    "card": {
      "number": "4242424242424242",   // sent by the checkout SDK straight to the tokenizer; never passes through merchant servers
      "exp_month": 12,
      "exp_year": 2028,
      "cvc": "123"
    },
    "customer_id": "cust_abc"
  }
  Response 201: {
    "payment_method_id": "pm_card_visa_4242",
    "type": "card",
    "card": { "last4": "4242", "brand": "visa", "exp_month": 12, "exp_year": 2028 }
  }

GET /api/v1/payments/{payment_id}
  Response 200: { ... full payment object with event timeline ... }

GET /api/v1/payments?customer_id=cust_abc&limit=20&starting_after=pay_xyz
  Response 200: { "data": [...], "has_more": true }

POST /api/v1/webhooks
  Body: {
    "url": "https://merchant.com/webhooks/payments",
    "events": ["payment.succeeded", "payment.failed", "refund.created",
               "chargeback.created", "chargeback.resolved"]
  }
  Response 201: {
    "webhook_id": "wh_abc",
    "secret": "whsec_xxx..."    // for signature verification
  }
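
The webhook secret is returned so the merchant can verify that a delivery really came from us. The source does not specify the signing scheme; the sketch below assumes HMAC-SHA256 over `"<timestamp>.<payload>"`, a common convention, with a timestamp tolerance to limit replay. The function names and the 300-second tolerance are hypothetical.

```python
import hashlib
import hmac
import time


def sign_payload(secret: str, payload: bytes, timestamp: int) -> str:
    """Sender side: sign the timestamp and the raw body with the shared secret."""
    message = str(timestamp).encode() + b"." + payload
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def verify_signature(secret, payload, timestamp, signature, tolerance_s=300, now=None):
    """Merchant side: reject stale deliveries, then compare in constant time."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False
    expected = sign_payload(secret, payload, timestamp)
    return hmac.compare_digest(expected, signature)


body = b'{"event": "payment.succeeded", "payment_id": "pay_7xKq9mP3"}'
sig = sign_payload("whsec_test", body, 1_000)
print(verify_signature("whsec_test", body, 1_000, sig, now=1_010))
```

The signature must be computed over the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace or key order.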

5. Database Schema

Payments Table (PostgreSQL -- ACID required)

CREATE TABLE payments (
    payment_id        VARCHAR(20) PRIMARY KEY,
    merchant_id       VARCHAR(20) NOT NULL REFERENCES merchants(merchant_id),
    customer_id       VARCHAR(20),
    amount            BIGINT NOT NULL,          -- in smallest currency unit
    currency          VARCHAR(3) NOT NULL,
    status            VARCHAR(20) NOT NULL DEFAULT 'pending',
    payment_method_id VARCHAR(50),
    payment_method_type VARCHAR(20),
    description       VARCHAR(500),
    metadata          JSONB,
    idempotency_key   VARCHAR(255) UNIQUE,
    risk_score        DECIMAL(5,4),
    risk_action       VARCHAR(20),             -- 'approve','review','decline'
    failure_code      VARCHAR(50),
    failure_message   VARCHAR(500),
    authorized_amount BIGINT DEFAULT 0,
    captured_amount   BIGINT DEFAULT 0,
    refunded_amount   BIGINT DEFAULT 0,
    fee_amount        BIGINT DEFAULT 0,
    net_amount        BIGINT DEFAULT 0,
    psp_name          VARCHAR(50),
    psp_reference     VARCHAR(255),            -- external PSP transaction ID
    settlement_batch  VARCHAR(50),
    created_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    version           INTEGER DEFAULT 0        -- optimistic locking
);

CREATE INDEX idx_payments_merchant ON payments(merchant_id, created_at DESC);
CREATE INDEX idx_payments_customer ON payments(customer_id, created_at DESC);
-- No separate index on idempotency_key: its UNIQUE constraint already creates one.
CREATE INDEX idx_payments_status ON payments(status) WHERE status IN ('pending', 'authorized');
CREATE INDEX idx_payments_settlement ON payments(settlement_batch) WHERE settlement_batch IS NOT NULL;

Ledger Entries Table (Append-Only)

CREATE TABLE ledger_entries (
    entry_id          BIGSERIAL PRIMARY KEY,
    payment_id        VARCHAR(20) NOT NULL,
    entry_type        VARCHAR(10) NOT NULL,    -- 'debit' or 'credit'
    account_id        VARCHAR(50) NOT NULL,    -- merchant, platform, customer
    account_type      VARCHAR(20) NOT NULL,    -- 'merchant', 'platform_fee', 'psp_fee'
    amount            BIGINT NOT NULL,
    currency          VARCHAR(3) NOT NULL,
    balance_after     BIGINT NOT NULL,
    description       VARCHAR(500),
    created_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Append-only: no UPDATE or DELETE allowed (enforced by DB triggers)
CREATE INDEX idx_ledger_account ON ledger_entries(account_id, created_at DESC);
CREATE INDEX idx_ledger_payment ON ledger_entries(payment_id);
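
The append-only rule can be demonstrated end to end with triggers that reject UPDATE and DELETE. To keep the sketch runnable without a database server it uses SQLite through Python's standard library and a trimmed-down table; PostgreSQL trigger syntax differs (a trigger function that raises an exception), so treat this as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ledger_entries (
    entry_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    payment_id TEXT NOT NULL,
    entry_type TEXT NOT NULL,
    account_id TEXT NOT NULL,
    amount     INTEGER NOT NULL
);
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger_entries
BEGIN
    SELECT RAISE(ABORT, 'ledger_entries is append-only');
END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger_entries
BEGIN
    SELECT RAISE(ABORT, 'ledger_entries is append-only');
END;
""")

# Inserts are still allowed.
conn.execute(
    "INSERT INTO ledger_entries (payment_id, entry_type, account_id, amount) "
    "VALUES (?, ?, ?, ?)",
    ("pay_1", "credit", "merchant_1", 4800),
)


def is_rejected(sql: str) -> bool:
    """Return True if the database refuses to run the statement."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


print(is_rejected("UPDATE ledger_entries SET amount = 0"))
print(is_rejected("DELETE FROM ledger_entries"))
```

Corrections are therefore made by inserting a reversing entry, never by editing history.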

Refunds Table

CREATE TABLE refunds (
    refund_id         VARCHAR(20) PRIMARY KEY,
    payment_id        VARCHAR(20) NOT NULL REFERENCES payments(payment_id),
    amount            BIGINT NOT NULL,
    currency          VARCHAR(3) NOT NULL,
    status            VARCHAR(20) NOT NULL DEFAULT 'pending',
    reason            VARCHAR(100),
    idempotency_key   VARCHAR(255) UNIQUE,
    psp_reference     VARCHAR(255),
    initiated_by      VARCHAR(20),            -- 'merchant', 'system', 'admin'
    created_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_refunds_payment ON refunds(payment_id);

Chargebacks Table

CREATE TABLE chargebacks (
    chargeback_id     VARCHAR(20) PRIMARY KEY,
    payment_id        VARCHAR(20) NOT NULL REFERENCES payments(payment_id),
    amount            BIGINT NOT NULL,
    currency          VARCHAR(3) NOT NULL,
    status            VARCHAR(20) NOT NULL DEFAULT 'open',
                      -- open|evidence_submitted|won|lost|expired
    reason_code       VARCHAR(10),             -- card network reason code
    reason_description VARCHAR(500),
    evidence_due_by   TIMESTAMP,
    evidence_submitted JSONB,                  -- documents, emails, logs
    psp_reference     VARCHAR(255),
    opened_at         TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    resolved_at       TIMESTAMP,
    outcome           VARCHAR(10)              -- 'won', 'lost'
);

CREATE INDEX idx_chargebacks_payment ON chargebacks(payment_id);
CREATE INDEX idx_chargebacks_status ON chargebacks(status) WHERE status = 'open';
CREATE INDEX idx_chargebacks_due ON chargebacks(evidence_due_by) WHERE status = 'open';

Idempotency Keys Table

CREATE TABLE idempotency_keys (
    key               VARCHAR(255) NOT NULL,
    merchant_id       VARCHAR(20) NOT NULL,
    request_hash      VARCHAR(64) NOT NULL,     -- SHA-256 (hex) of request body
    response_code     INTEGER,                  -- NULL while the request is in progress
    response_body     JSONB,
    created_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at        TIMESTAMP DEFAULT (CURRENT_TIMESTAMP + INTERVAL '24 hours'),
    PRIMARY KEY (merchant_id, key)              -- keys are scoped per merchant
);

Merchants Table

CREATE TABLE merchants (
    merchant_id       VARCHAR(20) PRIMARY KEY,
    business_name     VARCHAR(200) NOT NULL,
    email             VARCHAR(255) NOT NULL,
    country           VARCHAR(2) NOT NULL,
    default_currency  VARCHAR(3) NOT NULL,
    kyc_status        VARCHAR(20) DEFAULT 'pending',  -- pending|verified|rejected
    risk_level        VARCHAR(10) DEFAULT 'standard',
    settlement_schedule VARCHAR(20) DEFAULT 'T+2',
    fee_rate_percent  DECIMAL(5,4) DEFAULT 0.0290,    -- 2.9%
    fee_fixed_cents   INTEGER DEFAULT 30,              -- $0.30
    api_key_hash      VARCHAR(64) NOT NULL,
    webhook_url       VARCHAR(2048),
    webhook_secret    VARCHAR(64),
    payout_account    JSONB,                           -- bank account details
    created_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Payment Audit Log (Append-Only, Immutable)

CREATE TABLE payment_audit_log (
    log_id            BIGSERIAL PRIMARY KEY,
    payment_id        VARCHAR(20) NOT NULL,
    event_type        VARCHAR(50) NOT NULL,
    old_status        VARCHAR(20),
    new_status        VARCHAR(20),
    event_data        JSONB,
    actor             VARCHAR(100),          -- 'system', 'merchant', 'admin'
    ip_address        VARCHAR(45),
    created_at        TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Partitioned by month for efficient archival
-- NEVER UPDATE or DELETE rows
CREATE INDEX idx_audit_payment ON payment_audit_log(payment_id, created_at);

6. Deep Dive: Payment Processing Flow

Happy Path (Direct Charge)

Merchant            API Gateway       Payment Service    Fraud Engine    PSP (Visa)
  |                     |                  |                 |              |
  |-- POST /payments -->|                  |                 |              |
  |  (Idempotency-Key)  |                  |                 |              |
  |                     |-- auth + rate -->|                 |              |
  |                     |   limit check    |                 |              |
  |                     |                  |                 |              |
  |                     |                  |-- check key in  |              |
  |                     |                  |   idempotency   |              |
  |                     |                  |   table         |              |
  |                     |                  |                 |              |
  |                     |                  |-- risk check -->|              |
  |                     |                  |                 |-- score=0.1  |
  |                     |                  |                 |  (low risk)  |
  |                     |                  |                 |              |
  |                     |                  |-- save payment  |              |
  |                     |                  |   status=pending|              |
  |                     |                  |                 |              |
  |                     |                  |-- charge card --|------------->|
  |                     |                  |                 |              |
  |                     |                  |<-- approved ----|--------------|
  |                     |                  |                 |              |
  |                     |                  |-- update payment|              |
  |                     |                  |   status=success|              |
  |                     |                  |                 |              |
  |                     |                  |-- write ledger  |              |
  |                     |                  |   entries       |              |
  |                     |                  |                 |              |
  |                     |                  |-- save idempotency response    |
  |                     |                  |                 |              |
  |                     |                  |-- enqueue webhook              |
  |                     |                  |                 |              |
  |<-- 201 Created -----|------------------|                 |              |

Auth-Then-Capture Flow

Day 1: Authorization (reserve funds)
  POST /api/v1/payments  (capture: false)
  -> Issuing bank places a hold on the customer's card
  -> Payment status: "authorized"
  -> No money moves yet

Day 3: Capture (after merchant ships the item)
  POST /api/v1/payments/{id}/capture
  -> Card network settles the held amount
  -> Payment status: "captured"
  -> Ledger entries created, money moves

This two-step flow prevents charging customers before goods are shipped.
Hotels, car rentals, and marketplaces commonly use this pattern.
An authorization hold typically expires after about 7 days; the exact window depends on the card network and merchant category.

State Machine

                    +----------+
                    | CREATED  |
                    +----+-----+
                         |
                    +----v-----+
            +------>| PENDING  |<------+
            |       +----+-----+       |
            |            |             |
       (retry)     +-----+-----+   (timeout)
            |      |           |       |
       +----+---+  |      +---+----+  |
       | FAILED |  |      |DECLINED|  |
       +--------+  |      +--------+  |
                   v
            +------+------+
            | AUTHORIZED  |----> (auth-only flow)
            +------+------+
                   |
            +------v------+       +------------+
            | CAPTURED /  |       | VOIDED     |
            | SUCCEEDED   |       | (auth      |
            +------+------+       | cancelled) |
                   |              +------------+
            +------v-------+
            | PARTIALLY    |
            | REFUNDED     |
            +------+-------+
                   |
            +------v------+
            | FULLY       |
            | REFUNDED    |
            +-------------+

Separate path (chargebacks):
  SUCCEEDED --> DISPUTED (chargeback initiated by cardholder)
  DISPUTED  --> EVIDENCE_SUBMITTED (merchant provides evidence)
  EVIDENCE_SUBMITTED --> DISPUTE_WON | DISPUTE_LOST

State Transition Rules (enforced in code):
  VALID_TRANSITIONS = {
      "created":            ["pending"],
      "pending":            ["authorized", "succeeded", "failed", "declined"],
      "failed":             ["pending"],   # retry, as shown in the diagram
      "authorized":         ["captured", "voided"],
      "captured":           ["succeeded"],
      "succeeded":          ["partially_refunded", "fully_refunded", "disputed"],
      "partially_refunded": ["fully_refunded", "disputed"],
      "disputed":           ["evidence_submitted"],
      "evidence_submitted": ["dispute_won", "dispute_lost"],
  }
  # States without an entry have no outgoing transitions.
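
A transition table only helps if every status change goes through one guard function. A minimal sketch of that guard, using an abbreviated copy of the table (the `InvalidTransition` exception and `transition` function are hypothetical names):

```python
# Abbreviated copy of the transition table, using sets for fast membership tests.
VALID_TRANSITIONS = {
    "created":    {"pending"},
    "pending":    {"authorized", "succeeded", "failed", "declined"},
    "authorized": {"captured", "voided"},
    "captured":   {"succeeded"},
    "succeeded":  {"partially_refunded", "disputed"},
}


class InvalidTransition(Exception):
    """Raised when a status change is not allowed by the state machine."""


def transition(current: str, new: str) -> str:
    """Return the new status if current -> new is allowed, else raise."""
    allowed = VALID_TRANSITIONS.get(current, set())
    if new not in allowed:
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new


status = "created"
for step in ("pending", "authorized", "captured", "succeeded"):
    status = transition(status, step)
print(status)
```

In a real service this check would run in the same database transaction as the status update, together with the `version` column, so two concurrent requests cannot both apply a transition from the same starting state.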

7. Deep Dive: Idempotency (Exactly-Once Processing)

Why Idempotency is Critical

Problem scenario without idempotency:
  1. Merchant sends POST /payments (charge $50)
  2. Our system processes the charge, Visa approves
  3. Network timeout before merchant receives response
  4. Merchant retries POST /payments (charge $50 again!)
  5. Customer is charged $100 instead of $50

With idempotency:
  1. Merchant sends POST /payments with Idempotency-Key: "order-123"
  2. System processes the charge, Visa approves
  3. Network timeout before merchant receives response
  4. Merchant retries with same Idempotency-Key: "order-123"
  5. System finds existing result, returns saved response
  6. Customer is charged only $50

Implementation

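from hashlib import sha256

# db, Error, Response, canonicalize() and do_payment_processing() are
# application-level helpers that are not shown here.

def process_payment(request):
    key = request.headers["Idempotency-Key"]
    merchant_id = request.auth.merchant_id
    # canonicalize() must return the same bytes for logically equal bodies
    request_hash = sha256(canonicalize(request.body)).hexdigest()

    # Step 1: Check if we have seen this key before (keys are scoped per merchant)
    existing = db.query(
        "SELECT * FROM idempotency_keys WHERE key = %s AND merchant_id = %s",
        key, merchant_id
    )

    if existing:
        # Verify request body matches (prevent key reuse with different params)
        if existing.request_hash != request_hash:
            raise Error(422, "Idempotency key reused with different request body")

        # A row without a saved response means the first request is still running
        if existing.response_code is None:
            raise Error(409, "Request with this idempotency key is in progress")

        # Return cached response
        return Response(existing.response_code, existing.response_body)

    # Step 2: Acquire lock on the key (prevent concurrent duplicates).
    # The INSERT returns no row if a concurrent request inserted the key first.
    lock_acquired = db.execute(
        "INSERT INTO idempotency_keys (key, merchant_id, request_hash) "
        "VALUES (%s, %s, %s) ON CONFLICT DO NOTHING RETURNING key",
        key, merchant_id, request_hash
    )

    if not lock_acquired:
        # Another request with same key is in progress
        raise Error(409, "Request with this idempotency key is in progress")

    # Step 3: Process payment normally
    try:
        result = do_payment_processing(request)
    except Exception:
        # Remove the key so the merchant can retry. This is only safe when the
        # failure is known to have happened before the PSP was charged; after an
        # ambiguous PSP timeout, keep the key and resolve the outcome first.
        db.execute(
            "DELETE FROM idempotency_keys WHERE key = %s AND merchant_id = %s",
            key, merchant_id
        )
        raise

    # Step 4: Save response for future idempotent lookups
    db.execute(
        "UPDATE idempotency_keys SET response_code=%s, response_body=%s "
        "WHERE key = %s AND merchant_id = %s",
        result.status_code, result.body, key, merchant_id
    )
    return result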
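
The request hash only works if two logically identical bodies produce identical bytes. The source does not define `canonicalize`; one possible sketch, assuming the body is already parsed JSON, is to serialize it with sorted keys and fixed separators before hashing. The `request_hash` helper is a hypothetical name.

```python
import hashlib
import json


def canonicalize(body: dict) -> bytes:
    """Serialize a parsed JSON body so key order and whitespace do not matter."""
    return json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def request_hash(body: dict) -> str:
    """Hex SHA-256 of the canonical body; 64 characters long."""
    return hashlib.sha256(canonicalize(body)).hexdigest()


first = {"amount": 4999, "currency": "USD", "metadata": {"order_id": "12345"}}
retry = {"currency": "USD", "metadata": {"order_id": "12345"}, "amount": 4999}
print(request_hash(first) == request_hash(retry))
```

A retry that reorders fields hashes to the same value, while a retry that changes the amount does not and is rejected with the 422 response.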

End-to-End Idempotency

Idempotency must be enforced at every boundary:

Merchant --> Our API    : Idempotency-Key header
Our API --> PSP (Visa)  : PSP-specific idempotency key (payment_id + "_charge")
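Our API --> Ledger      : Unique key per ledger posting, e.g. (payment_id, operation, entry number);
                          (payment_id, entry_type) alone is too coarse, since one payment posts several debits and credits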
Our API --> Webhook     : Event ID deduplication at merchant

If any layer retries, the next layer rejects the duplicate.

8. Deep Dive: Double-Entry Bookkeeping

Principle

Every financial transaction creates at least TWO ledger entries:
  1. A DEBIT from one account
  2. A CREDIT to another account

Sum of all debits == Sum of all credits (always balanced)
This invariant is checked continuously by the reconciliation system.

Payment Example ($50.00 payment)

Payment: Customer pays $50.00 to Merchant
Platform fee: 2.9% + $0.30 = $1.75
PSP fee: $0.25

Ledger entries:
+-------+-------------------+--------+--------+---------+
| Entry | Account           | Debit  | Credit | Balance |
+-------+-------------------+--------+--------+---------+
| 1     | Customer (source) | $50.00 |        |         |
| 2     | Platform Holding  |        | $50.00 | +$50.00 |
| 3     | Platform Holding  | $48.00 |        | +$2.00  |
| 4     | Merchant Balance  |        | $48.00 | +$48.00 |
| 5     | Platform Holding  | $1.75  |        | +$0.25  |
| 6     | Platform Revenue  |        | $1.75  | +$1.75  |
| 7     | Platform Holding  | $0.25  |        | $0.00   |
| 8     | PSP Payable       |        | $0.25  | +$0.25  |
+-------+-------------------+--------+--------+---------+

Verification: Total debits = $50 + $48 + $1.75 + $0.25 = $100
              Total credits = $50 + $48 + $1.75 + $0.25 = $100  (balanced)
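
The same example can be checked in code. The sketch below works in integer cents, computes the 2.9% + $0.30 platform fee with integer arithmetic, builds the eight entries from the table, and verifies that debits equal credits. Function and account names are illustrative.

```python
def platform_fee_cents(amount_cents: int, rate_bps: int = 290, fixed_cents: int = 30) -> int:
    """2.9% + $0.30, rounding the percentage part half up, without floats."""
    return (amount_cents * rate_bps + 5_000) // 10_000 + fixed_cents


def payment_entries(amount_cents: int, psp_fee_cents: int):
    """Return (account, debit, credit) tuples mirroring the table above."""
    fee = platform_fee_cents(amount_cents)
    merchant_net = amount_cents - fee - psp_fee_cents
    return [
        ("customer",         amount_cents,  0),
        ("platform_holding", 0,             amount_cents),
        ("platform_holding", merchant_net,  0),
        ("merchant_balance", 0,             merchant_net),
        ("platform_holding", fee,           0),
        ("platform_revenue", 0,             fee),
        ("platform_holding", psp_fee_cents, 0),
        ("psp_payable",      0,             psp_fee_cents),
    ]


def is_balanced(entries) -> bool:
    """Double-entry invariant: total debits equal total credits."""
    return sum(d for _, d, _ in entries) == sum(c for _, _, c in entries)


entries = payment_entries(5_000, psp_fee_cents=25)
holding = sum(c - d for account, d, c in entries if account == "platform_holding")
print(is_balanced(entries), holding)
```

The holding account nets to zero once the payment is fully distributed, which is a second invariant worth checking alongside the debit and credit totals.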

Refund Example ($50 full refund)

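Ledger entries (reverse the original, except the PSP fee):
+-------+-------------------+--------+--------+
| Entry | Account           | Debit  | Credit |
+-------+-------------------+--------+--------+
| 9     | Merchant Balance  | $48.25 |        |
| 10    | Platform Holding  |        | $48.25 |
| 11    | Platform Revenue  | $1.75  |        |
| 12    | Platform Holding  |        | $1.75  |
| 13    | Platform Holding  | $50.00 |        |
| 14    | Customer (refund) |        | $50.00 |
+-------+-------------------+--------+--------+

Verification: Total debits  = $48.25 + $1.75 + $50 = $100
              Total credits = $48.25 + $1.75 + $50 = $100  (balanced)

Note: PSP fee ($0.25) is typically NOT refunded by the PSP.
The merchant absorbs it: the merchant is debited $48.25 (the $48.00 received
plus the $0.25 PSP fee), which together with the returned $1.75 platform fee
funds the full $50.00 back to the customer.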

9. Deep Dive: Fraud Detection

Multi-Layer Fraud Detection

Layer 1: Rule-Based (synchronous, < 5ms)
  - Velocity checks: > 5 transactions in 1 minute from same card
  - Amount thresholds: single transaction > $10,000
  - Geographic anomaly: transaction from a country different from the card issuer's
  - BIN checks: known high-risk card BINs
  - Blocked lists: known fraudulent cards, IPs, emails

Layer 2: ML Model (synchronous, < 100ms)
  - Features:
    - Transaction amount relative to customer average
    - Time since last transaction
    - Device fingerprint match
    - IP geolocation vs billing address
    - Merchant category risk score
    - Historical chargeback rate for this card
    - Behavioral signals (typing speed, mouse patterns on checkout)
  - Output: risk score 0.0 - 1.0
  
Layer 3: Manual Review Queue (asynchronous)
  - Score < 0.3: auto-approved
  - Score 0.3 to < 0.7: approved with enhanced monitoring
  - Score 0.7 to 0.9: queued for human review
  - Score > 0.9: auto-declined

Fraud Detection Pipeline:
+--------+     +----------+     +----------+     +---------+
| Request| --> | Rule     | --> | ML       | --> | Decision|
|        |     | Engine   |     | Scoring  |     | Engine  |
+--------+     +----------+     +----------+     +---------+
                    |                                  |
                (< 5ms)                    +-----------+-----------+
                                           |           |           |
                                        Approve    Review      Decline
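
The decision engine at the end of the pipeline is a threshold mapping from score to action. A minimal sketch using the thresholds listed under Layer 3; how the exact boundary values 0.3, 0.7 and 0.9 are assigned is an assumption, since the source gives only ranges.

```python
def decide(score: float) -> str:
    """Map a risk score in [0.0, 1.0] to an action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must be between 0.0 and 1.0")
    if score > 0.9:
        return "decline"
    if score >= 0.7:
        return "review"                    # queued for a human analyst
    if score >= 0.3:
        return "approve_with_monitoring"
    return "approve"


for s in (0.1, 0.5, 0.8, 0.95):
    print(s, decide(s))
```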

Risk Scoring Implementation

def compute_risk_score(transaction, customer_profile):
    score = 0.0
    
    # Velocity check
    recent_txns = get_transactions(transaction.card_hash, last_minutes=5)
    if len(recent_txns) > 3:
        score += 0.3
    
    # Amount anomaly
    avg_amount = customer_profile.avg_transaction_amount
    if avg_amount > 0 and transaction.amount > avg_amount * 5:
        score += 0.2
    
    # Geolocation mismatch
    if transaction.ip_country != customer_profile.billing_country:
        score += 0.2
    
    # Device fingerprint
    if transaction.device_id not in customer_profile.known_devices:
        score += 0.15
    
    # ML model adjustment (trained on historical fraud data)
    ml_score = ml_model.predict(extract_features(transaction, customer_profile))
    
    # Blend rule-based and ML scores
    final_score = 0.4 * score + 0.6 * ml_score
    
    return min(final_score, 1.0)

10. Deep Dive: PCI-DSS Compliance

Architecture for PCI Isolation

+----------------------------------------------------------+
|                    Public Zone                            |
|  +------------+  +--------------+  +------------------+  |
|  | API Gateway|  | Merchant     |  | Merchant Portal  |  |
|  |            |  | Checkout SDK |  | (dashboard)      |  |
|  +------+-----+  +------+-------+  +------------------+  |
|         |               |                                |
+---------+---------------+--------------------------------+
          |               |
+=========+===============+============================+
|         PCI-DSS Scope (isolated network segment)     |
|                                                      |
|  +------------+     +------------------+             |
|  | Tokenizer  |     | Card Vault       |             |
|  | Service    |<--->| (encrypted at    |             |
|  |            |     |  rest, HSM keys) |             |
|  +------+-----+     +------------------+             |
|         |                                            |
|  +------v-----------+    +---------------------+     |
|  | Payment Processor|    | Key Management      |     |
|  | (PSP connector)  |    | (HSM - Hardware     |     |
|  +------------------+    |  Security Module)   |     |
|                          +---------------------+     |
+======================================================+

Tokenization Flow

1. Client-side: Card number entered in merchant's checkout form
   (within an iframe served from OUR domain, not the merchant's)
2. JavaScript SDK sends card directly to our Tokenizer over TLS 1.3
   (card number NEVER touches the merchant's servers)
3. Tokenizer encrypts card with HSM-managed key, stores in Card Vault
4. Returns token: "pm_card_visa_4242"
5. Merchant uses token for all subsequent API calls
6. Merchant never sees or stores full card number

Key outcome: the merchant's PCI-DSS scope shrinks to the lightest
             self-assessment (SAQ A), because card data never touches
             its servers; WE carry the PCI-DSS Level 1 burden

Data Classification

+---------------------+------------------+------------------------+
| Data Type           | Classification   | Storage                |
+---------------------+------------------+------------------------+
| Full card number    | PCI Restricted   | Card Vault (encrypted) |
| CVV/CVC             | PCI Restricted   | NEVER stored           |
| Card expiry         | PCI Sensitive    | Card Vault (encrypted) |
| Cardholder name     | PCI Sensitive    | Card Vault (encrypted) |
| Payment token       | Non-PCI          | Payment DB             |
| Transaction amount  | Non-PCI          | Payment DB             |
| Last 4 digits       | Non-PCI          | Payment DB             |
+---------------------+------------------+------------------------+

11. Deep Dive: Refunds and Chargebacks

Refund Flow

Merchant                Payment Service          PSP (Visa)          Ledger
  |                          |                      |                  |
  |-- POST /refund --------->|                      |                  |
  |   (Idempotency-Key)      |                      |                  |
  |                          |-- validate:          |                  |
  |                          |   refund <= captured |                  |
  |                          |   refund <= remaining|                  |
  |                          |                      |                  |
  |                          |-- create refund      |                  |
  |                          |   status=pending     |                  |
  |                          |                      |                  |
  |                          |-- reverse charge --->|                  |
  |                          |                      |                  |
  |                          |<-- approved ---------|                  |
  |                          |                      |                  |
  |                          |-- update refund      |                  |
  |                          |   status=succeeded   |                  |
  |                          |                      |                  |
  |                          |-- reverse ledger ----|----------------->|
  |                          |   entries            |                  |
  |                          |                      |                  |
  |                          |-- update payment     |                  |
  |                          |   refunded_amount    |                  |
  |                          |                      |                  |
  |                          |-- enqueue webhook    |                  |
  |                          |   "refund.succeeded" |                  |
  |                          |                      |                  |
  |<-- 201 refund created ---|                      |                  |

Timing: refund to customer's card takes 5-10 business days
        (funds move: us -> card network -> issuing bank -> customer)
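
The validation step at the top of the flow reduces to one comparison against what is still refundable. A minimal sketch in integer cents, assuming "remaining" means the captured amount minus what has already been refunded; `RefundError` and `validate_refund` are hypothetical names.

```python
class RefundError(Exception):
    """Raised when a refund request cannot be honored."""


def validate_refund(captured_amount: int, refunded_amount: int, requested=None) -> int:
    """Return the amount to refund; omitting `requested` means a full refund."""
    remaining = captured_amount - refunded_amount
    amount = remaining if requested is None else requested
    if amount <= 0:
        raise RefundError("nothing to refund or non-positive amount")
    if amount > remaining:
        raise RefundError(f"requested {amount} exceeds refundable {remaining}")
    return amount


print(validate_refund(4_999, 0))             # full refund
print(validate_refund(4_999, 2_000, 2_000))  # second partial refund
```

This check and the update of `refunded_amount` need to happen in one transaction with row locking or the `version` column, otherwise two concurrent partial refunds could each pass validation and together exceed the captured amount.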

Chargeback Flow

A chargeback occurs when a cardholder disputes a charge with their bank.

Timeline:
  Day 0:  Customer calls bank: "I did not make this purchase"
  Day 1:  Issuing bank files chargeback with card network
  Day 2:  Card network notifies us via PSP
  Day 3:  We notify merchant via webhook: "chargeback.created"
  Day 3-21: Merchant submits evidence (receipt, shipping proof, logs)
  Day 21: Evidence deadline
  Day 30-45: Card network reviews evidence
  Day 45-75: Decision: merchant wins or loses

Our system:
  1. Receive chargeback notification from PSP
  2. Create chargeback record in database
  3. Immediately debit merchant balance (provisional hold)
  4. Notify merchant via webhook + email
  5. Provide evidence submission API
  6. Track deadline and send reminders
  7. On resolution:
     - Merchant wins: credit merchant balance back
     - Merchant loses: debit becomes permanent; fee charged ($15-25)

Chargeback rate monitoring:
  If merchant chargeback rate > 1%: flag for review
  If merchant chargeback rate > 2%: restrict merchant account
  Card networks penalize platforms with high chargeback rates
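
The two thresholds above can be expressed as a small classifier. This is a sketch only: the function name and the returned labels are assumptions, and the measurement window (for example, a trailing month) is left to the caller.

```python
def chargeback_action(chargebacks: int, transactions: int) -> str:
    """Map a merchant's chargeback rate to 'ok', 'review', or 'restrict'."""
    if transactions <= 0:
        return "ok"  # no volume in the window, nothing to evaluate
    # Integer comparison avoids floating-point edge cases at the thresholds.
    if chargebacks * 100 > 2 * transactions:   # rate > 2%
        return "restrict"
    if chargebacks * 100 > 1 * transactions:   # rate > 1%
        return "review"
    return "ok"
```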

12. Deep Dive: Reconciliation

Why Reconciliation

Three sources of truth that can diverge:
  1. Our internal ledger (what we think happened)
  2. PSP records (what the PSP and card networks think happened)
  3. Bank settlement files (what the bank actually moved)

Reconciliation ensures all three agree.

Reconciliation Pipeline

Daily batch process (runs at 2:00 AM UTC):

Step 1: Fetch PSP settlement files
  - Download settlement files (e.g., CSV over SFTP) from each PSP
  - Parse into standardized format

Step 2: Match internal records
  For each PSP record:
    - Find matching payment by psp_reference
    - Compare: amount, currency, status, timestamp
    
Step 3: Identify discrepancies
  - MATCHED: Internal record matches PSP record
  - UNMATCHED_INTERNAL: We have a record, PSP does not
  - UNMATCHED_EXTERNAL: PSP has a record, we do not
  - AMOUNT_MISMATCH: Amounts differ
  - STATUS_MISMATCH: Status differs

Step 4: Generate reconciliation report
  +-------------------------------------------+
  | Daily Reconciliation Report               |
  | Date: 2026-04-11                          |
  +-------------------------------------------+
  | Total transactions:    50,234,891         |
  | Matched:               50,234,650 (99.99%)|
  | Unmatched (internal):  142                |
  | Unmatched (external):  87                 |
  | Amount mismatches:     12                 |
  +-------------------------------------------+

Step 5: Auto-resolve known patterns
  - Timing differences (processed at midnight boundary)
  - Currency rounding (< $0.01 difference)
  - Delayed PSP processing (appears next day)

Step 6: Escalate unresolved to finance team
  - Auto-create JIRA ticket for each unresolved discrepancy
  - SLA: resolve within 48 hours
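
Steps 2 and 3 of the pipeline can be sketched as a join on `psp_reference` followed by a field-by-field comparison. The `Record` shape below is an assumption, timestamp comparison and the Step 5 auto-resolution rules are left out, and a production job would stream from files rather than hold both sides in memory:

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    MATCHED = "MATCHED"
    UNMATCHED_INTERNAL = "UNMATCHED_INTERNAL"   # we have it, PSP does not
    UNMATCHED_EXTERNAL = "UNMATCHED_EXTERNAL"   # PSP has it, we do not
    AMOUNT_MISMATCH = "AMOUNT_MISMATCH"
    STATUS_MISMATCH = "STATUS_MISMATCH"


@dataclass(frozen=True)
class Record:
    psp_reference: str
    amount: int      # minor units
    currency: str
    status: str


def reconcile(internal: list[Record], external: list[Record]) -> dict[str, Outcome]:
    """Classify every psp_reference seen on either side."""
    ours = {r.psp_reference: r for r in internal}
    theirs = {r.psp_reference: r for r in external}
    results: dict[str, Outcome] = {}
    for ref in ours.keys() | theirs.keys():
        a, b = ours.get(ref), theirs.get(ref)
        if b is None:
            results[ref] = Outcome.UNMATCHED_INTERNAL
        elif a is None:
            results[ref] = Outcome.UNMATCHED_EXTERNAL
        elif (a.amount, a.currency) != (b.amount, b.currency):
            results[ref] = Outcome.AMOUNT_MISMATCH
        elif a.status != b.status:
            results[ref] = Outcome.STATUS_MISMATCH
        else:
            results[ref] = Outcome.MATCHED
    return results
```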

Balance Reconciliation

Continuous verification (every hour):

For each merchant account:
  Calculated balance = SUM(credits) - SUM(debits) in ledger
  Stored balance    = merchants.current_balance

  If calculated != stored:
    ALERT: balance drift detected
    Action: freeze payouts, investigate

This catches bugs where a ledger write succeeded but the
balance update did not (or vice versa).
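
A sketch of the hourly drift check, assuming ledger entries arrive as `(merchant_id, direction, amount)` tuples in minor units and stored balances as a plain dict. Both shapes are illustrative; in practice each side would come from a SQL aggregate.

```python
def find_balance_drift(ledger_entries, stored_balances):
    """Return {merchant_id: (calculated, stored)} for every mismatch."""
    calculated = {}
    for merchant_id, direction, amount in ledger_entries:
        if direction not in ("credit", "debit"):
            raise ValueError(f"unknown ledger direction: {direction!r}")
        sign = 1 if direction == "credit" else -1
        calculated[merchant_id] = calculated.get(merchant_id, 0) + sign * amount

    drift = {}
    # Check merchants present on either side, so a missing row also shows up.
    for merchant_id in calculated.keys() | stored_balances.keys():
        calc = calculated.get(merchant_id, 0)
        stored = stored_balances.get(merchant_id, 0)
        if calc != stored:
            drift[merchant_id] = (calc, stored)
    return drift
```

Any non-empty result would trigger the alert and payout freeze described above.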

13. Webhook Delivery System

Reliable Webhook Delivery

Payment Event --> Outbox Table --> Outbox Worker --> Kafka --> Webhook Worker
                                                               |
                                                    Merchant Endpoint

Using the Outbox Pattern:
  BEGIN TRANSACTION
    UPDATE payments SET status = 'succeeded' ...
    INSERT INTO outbox (event_type, payload) VALUES ('payment.succeeded', ...)
  COMMIT

  Outbox worker reads outbox table, publishes to Kafka, marks as sent.
  This ensures the payment update and event are atomic.

Webhook Worker logic:
1. Consume event from Kafka
2. Construct webhook payload
3. Sign payload with merchant's webhook secret
   signature = HMAC-SHA256(webhook_secret, timestamp + "." + payload)
4. POST to merchant's webhook URL
   Headers:
     X-Webhook-Signature: sha256=abc123...
     X-Webhook-Timestamp: 1681200000
     X-Webhook-Id: evt_abc123
5. If merchant responds 2xx: mark delivered, commit offset
6. If merchant responds 4xx/5xx or timeout: schedule retry
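
Step 3 and the matching merchant-side check can be sketched with the standard `hmac` module. The signed message is `timestamp + "." + payload`, as given above; the 5-minute replay tolerance and both function names are assumptions rather than part of the design:

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, timestamp: int, payload: bytes) -> str:
    """Build the X-Webhook-Signature header value."""
    message = str(timestamp).encode("ascii") + b"." + payload
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_webhook(secret: bytes, timestamp: int, payload: bytes,
                   signature_header: str, now: int,
                   tolerance_s: int = 300) -> bool:
    """Merchant-side check: reject stale timestamps, then compare signatures."""
    if abs(now - timestamp) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    expected = sign_webhook(secret, timestamp, payload)
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(expected, signature_header)
```

Signing the timestamp together with the body lets the merchant bound how long a captured request stays valid, which a body-only signature cannot do.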

Retry schedule (exponential backoff):
  Attempt 1: immediate
  Attempt 2: 1 minute
  Attempt 3: 5 minutes
  Attempt 4: 30 minutes
  Attempt 5: 2 hours
  Attempt 6: 12 hours
  Attempt 7: 24 hours (final attempt)
  
After 7 failures: mark as "failed", notify merchant via email.
Merchant can replay missed webhooks from their dashboard.
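
The schedule can be encoded as a lookup table. This sketch assumes each delay is measured from the previous failed attempt (the text does not say whether delays are relative to the previous attempt or to the first one), and `next_retry_delay` is a hypothetical helper name:

```python
from datetime import timedelta

# Delay before attempt N, indexed by the number of attempts that already failed.
RETRY_DELAYS = [
    timedelta(0),            # attempt 1: immediate
    timedelta(minutes=1),    # attempt 2
    timedelta(minutes=5),    # attempt 3
    timedelta(minutes=30),   # attempt 4
    timedelta(hours=2),      # attempt 5
    timedelta(hours=12),     # attempt 6
    timedelta(hours=24),     # attempt 7 (final)
]


def next_retry_delay(failed_attempts: int):
    """Return the wait before the next attempt, or None when retries are exhausted."""
    if failed_attempts < 0:
        raise ValueError("failed_attempts must be >= 0")
    if failed_attempts >= len(RETRY_DELAYS):
        return None  # 7 failures: mark as "failed" and notify the merchant
    return RETRY_DELAYS[failed_attempts]
```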

Webhook Payload Example

{
  "id": "evt_abc123",
  "type": "payment.succeeded",
  "created": 1681200000,
  "data": {
    "payment_id": "pay_7xKq9mP3",
    "amount": 4999,
    "currency": "USD",
    "status": "succeeded",
    "merchant_id": "merch_xyz"
  }
}

14. Payment Gateway Integration

Multi-PSP Strategy

Why multiple PSPs?
  1. Redundancy: if one PSP goes down, route to another
  2. Cost optimization: different PSPs charge different rates
  3. Authorization rates: some PSPs have higher approval rates for certain
     card types or regions
  4. Coverage: some PSPs support local payment methods others do not

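Routing logic:
  def select_psp(payment):
      # Keep only PSPs that are healthy and support this payment method
      candidates = [
          p for p in ALL_PSPS
          if health_check(p) and p.supports(payment.method)
      ]
      if not candidates:
          raise RuntimeError("no healthy PSP supports this payment method")

      # Normalize cost and latency against the best candidate so all three
      # terms fall in (0, 1]. Raw 1/cost and 1/latency have different units
      # and would make the weights meaningless.
      min_cost = min(p.cost_for(payment) for p in candidates)
      min_latency = min(p.avg_latency for p in candidates)

      # Score by: authorization rate, cost, latency (higher is better)
      def score(psp):
          return (
              0.4 * psp.auth_rate_for(payment.card_bin)
              + 0.3 * (min_cost / psp.cost_for(payment))
              + 0.3 * (min_latency / psp.avg_latency)
          )

      return max(candidates, key=score)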

Failover:
  If primary PSP returns error or times out (> 5 seconds):
    1. Retry once with primary PSP (network glitch)
    2. If still fails, route to secondary PSP
    3. Record the failover for monitoring
    4. If primary PSP fails > 10% of requests in 5 min: circuit breaker OPEN
       (all traffic goes to secondary until primary recovers)
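
Step 4 can be sketched as a rolling-window circuit breaker. The 10% threshold and 5-minute window come from the text; the cooldown before traffic returns to the primary, the minimum sample size, and the class itself are assumptions, and a production breaker would usually add a half-open probing state:

```python
from collections import deque


class CircuitBreaker:
    def __init__(self, threshold=0.10, window_s=300.0,
                 cooldown_s=30.0, min_requests=20):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s        # assumed: wait before retrying primary
        self.min_requests = min_requests    # assumed: avoid tripping on tiny samples
        self._events = deque()              # (timestamp, succeeded)
        self._opened_at = None

    def record(self, succeeded: bool, now: float) -> None:
        """Record one request outcome and open the breaker if the rate is too high."""
        self._events.append((now, succeeded))
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()
        total = len(self._events)
        failures = sum(1 for _, ok in self._events if not ok)
        if total >= self.min_requests and failures / total > self.threshold:
            self._opened_at = now

    def allow(self, now: float) -> bool:
        """True if the primary PSP may be used; False means route to secondary."""
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close and start from a clean window.
            self._opened_at = None
            self._events.clear()
            return True
        return False
```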

15. Scaling Considerations

Database Scaling

Strategy: Shard by merchant_id

Shard routing:
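  shard_id = stable_hash(merchant_id) % num_shards
  (plain modulo remaps most merchants when num_shards changes; resharding
   needs a consistent-hash ring or a merchant-to-shard lookup table)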

Shard layout (16 shards):
  Each shard: PostgreSQL primary + 2 synchronous replicas (zero data loss)

Read scaling: read replicas for dashboard/reporting queries
Write scaling: sharding by merchant_id spreads writes across shards
               (evenly only when no single merchant dominates traffic)
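
A minimal sketch of the shard routing function. It uses SHA-256 because Python's built-in `hash()` is randomized per process for strings and so cannot be used for routing; the function name and the choice of the first 8 digest bytes are assumptions:

```python
import hashlib

NUM_SHARDS = 16


def shard_for(merchant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a merchant to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(merchant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every service that routes queries must use the same function and the same `num_shards`, otherwise reads and writes for one merchant land on different shards.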

Hot Merchant Problem

Problem: A single large merchant (e.g., Amazon) generates 10% of all traffic.
         One shard becomes a bottleneck.

Solution: Sub-shard large merchants by customer_id within the merchant shard.
  sub_shard_id = hash(merchant_id + ":" + customer_id) % num_sub_shards
  
Alternatively: Dedicated shard cluster for top 10 merchants.
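
A sketch of the sub-sharding option: ordinary merchants map to one location, while merchants listed in a hot-merchant table fan out by customer. The `HOT_MERCHANTS` table, its fan-out values, and the `(shard, sub_shard)` return shape are all hypothetical:

```python
import hashlib

# merchant_id -> number of sub-shards (hypothetical configuration)
HOT_MERCHANTS = {"merch_big": 8}


def _stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def route(merchant_id: str, customer_id: str, num_shards: int = 16):
    """Return (shard_id, sub_shard_id) for a payment."""
    shard_id = _stable_hash(merchant_id) % num_shards
    fanout = HOT_MERCHANTS.get(merchant_id)
    if fanout is None:
        return shard_id, 0  # ordinary merchant: single location
    # The separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
    sub_shard_id = _stable_hash(f"{merchant_id}:{customer_id}") % fanout
    return shard_id, sub_shard_id
```

The cost of this layout is that merchant-wide queries (dashboards, settlement totals) for a hot merchant must now fan out across all of its sub-shards.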

Payment Processing Pipeline

For 10,000 TPS peak:

API Gateway:          20 instances (500 TPS each)
Payment Service:      40 instances (250 TPS each)
Fraud Engine:         20 instances (500 TPS each)
PSP Connectors:       10 per PSP (load balanced)
Database:             16 shards * 3 nodes (primary + 2 replicas) = 48 DB instances
Webhook Workers:      30 instances
Kafka:                12 brokers, 64 partitions for payment events

Total infrastructure: ~200 instances for payment path
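
The stateless tier counts above are peak TPS divided by per-instance throughput (for example, 10,000 / 500 = 20), which leaves no spare capacity at peak. A small sketch of that arithmetic with an optional headroom multiplier; the `headroom` parameter is an assumption, not part of the original estimate:

```python
import math

PEAK_TPS = 10_000
PER_INSTANCE_TPS = {
    "API Gateway": 500,
    "Payment Service": 250,
    "Fraud Engine": 500,
}


def instances_needed(peak_tps: int, per_instance_tps: int,
                     headroom: float = 1.0) -> int:
    """Instances required to serve peak_tps, scaled by a headroom factor."""
    return math.ceil(peak_tps * headroom / per_instance_tps)


for tier, tps in PER_INSTANCE_TPS.items():
    print(f"{tier}: {instances_needed(PEAK_TPS, tps)} instances")
```

With `headroom=1.5`, the same call yields 30, 60, and 30 instances, which is closer to what a fleet needs to survive instance failures during a peak.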

Multi-Currency Support

Payment in EUR, merchant settles in USD:

1. Customer pays 40.00 EUR
2. System converts: 40.00 EUR * 1.08 (rate) = 43.20 USD
3. Apply fee: 43.20 * 2.9% + 0.30 = $1.55 fee
4. Merchant receives: 43.20 - 1.55 = $41.65 USD

Exchange Rate Service:
  - Rates fetched from multiple providers every minute
  - Cached with 5-minute TTL
  - Locked at time of payment (quoted rate honored for 30 minutes)
  - Spread applied (0.5-1% markup on mid-market rate)

Schema addition:
  payment_amount:      4000       (in EUR cents)
  payment_currency:    "EUR"
  settlement_amount:   4320       (in USD cents)
  settlement_currency: "USD"
  exchange_rate:       1.08
  rate_locked_at:      "2026-04-11T10:00:00Z"
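
The worked example above can be reproduced with `decimal.Decimal`, which avoids binary floating-point error on money. The rounding mode (half-up to the cent) is an assumption; the rate and fee schedule are the ones from the example:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def settle(amount: Decimal, rate: Decimal,
           fee_pct: Decimal = Decimal("0.029"),
           fee_fixed: Decimal = Decimal("0.30")):
    """Return (converted, fee, net) in the settlement currency."""
    converted = (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)
    fee = (converted * fee_pct + fee_fixed).quantize(CENT, rounding=ROUND_HALF_UP)
    return converted, fee, converted - fee


converted, fee, net = settle(Decimal("40.00"), Decimal("1.08"))
print(converted, fee, net)  # 43.20 1.55 41.65
```

Storing the integer minor-unit amounts (4000 and 4320) alongside the locked rate, as in the schema above, lets reconciliation recompute the conversion later without depending on the current rate.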

Geographic Distribution

Active-Active across 3 regions:

+------------------+     +------------------+     +------------------+
|  US-East         |     |  EU-West         |     |  AP-Southeast    |
|  - Full stack    |     |  - Full stack    |     |  - Full stack    |
|  - DB primary    |     |  - DB primary    |     |  - DB primary    |
|    (US merchants)|     |    (EU merchants)|     |    (APAC merch.) |
+------------------+     +------------------+     +------------------+

Merchant assigned to region by country.
Within-region: synchronous replication (RPO = 0).
Cross-region: async replication for disaster recovery (RPO ~500 ms of
              replication lag; RPO = 0 holds only for failures contained
              within a region).

16. Key Tradeoffs

Decision             Option A                 Option B                   Our Choice
------------------------------------------------------------------------------------
Database             NoSQL (scale)            PostgreSQL (ACID)          PostgreSQL
Idempotency storage  Redis (fast, volatile)   DB (durable)               DB
Fraud check          Sync only (slow, safe)   Async only (fast, risky)   Both layers
Ledger model         Single-entry (simple)    Double-entry (auditable)   Double-entry
Webhook delivery     At-most-once             At-least-once              At-least-once
Card storage         Merchant-side            Centralized vault          Central vault
Settlement timing    Real-time                Batch (T+2)                Batch (T+2)
Consistency model    Eventual                 Strong                     Strong
Event publishing     Dual-write (simple)      Outbox pattern (safe)      Outbox
PSP strategy         Single PSP               Multi-PSP routing          Multi-PSP

17. Failure Scenarios and Mitigations

Scenario                           Mitigation
---------------------------------------------------------------------------
PSP timeout during charge          Store as "pending"; query PSP for status;
                                   idempotent retry
Double charge (network retry)      Idempotency key prevents duplicate processing
DB failure mid-transaction         Saga pattern: compensating transaction to PSP
Webhook endpoint down              Exponential retry for 24 hours; manual replay
Fraud model false positive         Manual review queue; merchant override option
Reconciliation mismatch            Auto-resolve known patterns; escalate unknowns
HSM failure                        Redundant HSM pair; failover in < 1s
Surge in chargebacks               Auto-pause merchant; escalate to risk team
Data center failure                Active-active in 3 regions; DNS failover
Exchange rate stale                Lock rate at quote time; refresh every minute
Outbox worker lag                  Monitor outbox table size; auto-scale workers

Key Takeaways

  1. Idempotency is non-negotiable in payment systems -- every write endpoint must accept an idempotency key and guarantee exactly-once semantics, including the calls to external PSPs.
  2. Double-entry bookkeeping ensures every dollar is accounted for and enables automated reconciliation with external PSP records.
  3. PCI-DSS compliance is achieved by isolating card data in a tokenization vault -- merchants never see raw card numbers, drastically reducing audit scope.
  4. The outbox pattern solves the dual-write problem -- database state changes and event publishing become atomic, which is critical for webhook reliability.
  5. Strong consistency is chosen over availability for financial data -- unlike most other systems, payment systems cannot tolerate eventual consistency for the transaction path.
  6. Multi-PSP routing with automatic failover ensures that a single PSP outage does not take down the entire payment platform.