UUIDs in Distributed Event Sourcing: Patterns and Pitfalls

    July 15, 2024
    10 min read
    Technical deep-dive
    Architecture
    uuid
    distributed-systems
    microservices

    Event sourcing is the beating heart of modern distributed systems — capturing every change as an immutable log of events.

    And UUIDs?

    They’re the identity layer of those events. The tags. The trail of breadcrumbs.

    But while UUIDs provide uniqueness, their role in event sourcing is more nuanced than just “generate and forget.”

    Let’s explore how UUIDs are used — and misused — in distributed event systems.


    🔁 What UUIDs Do in Event Sourcing

    In distributed event sourcing, UUIDs are commonly used for:

    • Event IDs: globally unique reference for each change
    • Idempotency keys: to detect and skip duplicate writes
    • Correlation IDs: tracing a request across services
    • Aggregate IDs: identifying the entity being updated

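    Used well, they help with:

    • Detecting duplicate deliveries and replays
    • Recognizing a retried write as the same write instead of a new one
    • Referring to the same event or entity consistently across nodes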
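To make the four roles concrete, here is a minimal sketch of an event envelope that carries each ID separately. The `EventEnvelope` class and its field names are illustrative assumptions, not a standard schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventEnvelope:
    """Hypothetical envelope: one field per ID role from the list above."""
    aggregate_id: uuid.UUID      # which entity the event belongs to
    correlation_id: uuid.UUID    # shared by everything one request triggers
    idempotency_key: uuid.UUID   # stays the same when a command is retried
    event_id: uuid.UUID = field(default_factory=uuid.uuid4)  # unique per event


# Two events caused by the same request against the same cart.
request_id = uuid.uuid4()
cart_id = uuid.uuid4()
first = EventEnvelope(cart_id, request_id, idempotency_key=uuid.uuid4())
second = EventEnvelope(cart_id, request_id, idempotency_key=uuid.uuid4())
```

Here `first` and `second` share a correlation ID and an aggregate ID, but each gets its own event ID.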

    📦 Common Patterns That Work Well

    1. **UUIDs as Event Identity**

    Each event gets a UUIDv4 or UUIDv7:

    json
    {
      "event_id": "550e8400-e29b-41d4-a716-446655440000",
      "type": "UserCreated",
      "payload": { "user_id": "abc123" },
      "timestamp": "2024-07-15T14:00:00Z"
    }

    Generate the event_id once, when the event is created, and persist it with the event. It then stays constant across retries and node restarts, which is what makes deduplication possible.
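A small sketch of the producer side, assuming a hypothetical `send` callable that may raise `ConnectionError`. The point is only that the ID is created before the retry loop, never inside it.

```python
import uuid


def publish_with_retry(send, payload, max_attempts=3):
    """Publish one event, reusing the same event_id on every attempt."""
    # The ID is generated exactly once, outside the retry loop.
    event = {
        "event_id": str(uuid.uuid4()),
        "type": "UserCreated",
        "payload": payload,
    }
    for attempt in range(1, max_attempts + 1):
        try:
            send(event)
            return event
        except ConnectionError:
            if attempt == max_attempts:
                raise


# Hypothetical flaky transport: records every attempt and fails the first one.
attempts = []


def flaky_send(event):
    attempts.append(event["event_id"])
    if len(attempts) == 1:
        raise ConnectionError("simulated timeout")


published = publish_with_retry(flaky_send, {"user_id": "abc123"})
```

Both delivery attempts carry the same `event_id`, so a consumer that saw the first attempt can recognize the second as a duplicate.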

    2. **Idempotent Handlers with UUID Checking**

    Store processed event_ids in a side table or Redis:

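    sql
    -- Claim the event_id atomically; a unique key on event_id rejects repeats.
    -- (PostgreSQL/SQLite syntax; other databases spell this differently.)
    INSERT INTO processed_events (event_id) VALUES (?)
    ON CONFLICT (event_id) DO NOTHING;
    -- If one row was inserted, handle the event in the same transaction.
    -- If zero rows were inserted, the event was already processed: skip it.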

    This makes consumers replay-safe.
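The same idea as a runnable sketch, using an in-memory SQLite database. The `balances` table and the event fields are made up for the example; what matters is that recording the event_id and applying the change share one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()


def handle_once(conn, event):
    """Apply the event at most once; return True if it was applied."""
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            return False  # already recorded, so this is a replay
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (event["amount"], event["account"]),
        )
        return True
```

Calling `handle_once` twice with the same event changes the balance once. If the handler's side effects live outside the database, this pattern alone does not make them exactly-once.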

    3. **Aggregate-Level UUIDs**

    Aggregate roots (e.g. users, accounts, carts) can use UUIDs to:

    • Partition event streams
    • Ensure cross-service correlation
    • Maintain uniqueness in global topics

    json
    { "user_id": "9f97b4af-8312-4fae-a3c5-76f7d819b2ab" }

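One way to partition event streams by aggregate ID is to map the UUID to a partition number, so every event for one aggregate lands in the same partition and keeps its relative order. The function below is a sketch under that assumption; real brokers usually apply their own hashing to the message key.

```python
import uuid


def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Map an aggregate UUID to a stable partition index."""
    # Parsing first normalizes case and formatting before the modulo.
    return uuid.UUID(aggregate_id).int % num_partitions


p = partition_for("9f97b4af-8312-4fae-a3c5-76f7d819b2ab", 12)
```

The result depends only on the ID and the partition count, so changing the partition count moves aggregates between partitions.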
    ❌ Pitfalls and Anti-Patterns

    1. **Duplicate UUID Generation**

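    You’d think UUIDs are always unique, but weak generation breaks that assumption. A non-cryptographic RNG such as Math.random() can repeat values when its seed or state repeats, and clock + MAC (v1-style) generation can collide when machines share a MAC address, as cloned VMs or containers sometimes do.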

    Fix: Always use a CSPRNG, and consider deterministic v5 UUIDs for idempotency across retries.
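A sketch of the deterministic option with Python's standard `uuid` module. The namespace name `orders.example.com` and the `command_id:event_type` naming scheme are assumptions for illustration; any stable, unique input works.

```python
import uuid

# Hypothetical application namespace, derived once and then kept fixed.
ORDERS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")


def event_id_for(command_id: str, event_type: str) -> uuid.UUID:
    """Derive the same UUIDv5 every time for the same command and event type."""
    return uuid.uuid5(ORDERS_NAMESPACE, f"{command_id}:{event_type}")


first_try = event_id_for("cmd-42", "OrderPlaced")
retry = event_id_for("cmd-42", "OrderPlaced")
other = event_id_for("cmd-43", "OrderPlaced")
```

A retry of the same command yields the same event ID without any shared state, while a different command yields a different one. For random IDs, `uuid.uuid4()` in CPython draws from the operating system's secure random source.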


    2. **Non-Sortable UUIDs in Ordered Logs**

    UUIDv4 is fully random — which means no natural sort order.

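    If anything downstream sorts or range-scans by ID (a database index, a replay that orders by event_id), UUIDv4 won’t help. Brokers such as Kafka and Pulsar order messages by their position within a partition, not by the ID in the payload.

    Fix: Use UUIDv7 or ULID. Both begin with a millisecond-precision timestamp, so IDs sort by creation time; ordering within the same millisecond depends on the generator.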
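To show why UUIDv7 sorts by time, here is a hand-rolled sketch of the bit layout: a 48-bit Unix millisecond timestamp, the version and variant bits, and random bits everywhere else. The name `uuid7_sketch` is made up, and the sketch adds no same-millisecond ordering guarantee; if your runtime or a library provides a UUIDv7 generator, prefer that.

```python
import secrets
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value: 48-bit ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                          # bits 79..76: version 7
    value |= secrets.randbits(12) << 64         # bits 75..64: random
    value |= 0b10 << 62                         # bits 63..62: variant
    value |= secrets.randbits(62)               # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, `earlier` sorts before `later` both as a UUID and as a string.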


    3. **Timestamp Confusion**

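    Event consumers may assume UUID timestamp == event timestamp. But:

    • UUIDv1 embeds time, as a count of 100-nanosecond intervals since 1582-10-15
    • UUIDv4 doesn’t embed any time at all
    • UUIDv7 embeds Unix time in milliseconds in its first 48 bits, and that is when the ID was generated, not necessarily when the event occurred

    Fix: Always store an explicit timestamp field (created_at or occurred_at) in ISO 8601, separate from the UUID.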
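A sketch of what "interpreting" a UUIDv7 means in practice: read the top 48 bits as Unix milliseconds. The helper name `embedded_time` is illustrative, and it deliberately returns nothing for versions other than 7.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def embedded_time(value: str):
    """Return the generation time inside a UUIDv7, or None for other versions."""
    u = uuid.UUID(value)
    if u.version != 7:
        return None
    unix_ms = u.int >> 80  # top 48 bits of the 128-bit value
    return EPOCH + timedelta(milliseconds=unix_ms)


v7_time = embedded_time("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
v4_time = embedded_time("550e8400-e29b-41d4-a716-446655440000")
```

The v7 value decodes to a time in February 2022, while the v4 value yields `None`. Even when a time is present, it tells you when the ID was minted, which is why the explicit timestamp field stays the source of truth.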


    4. **Truncated UUIDs in Cache Keys**

    Some teams shorten UUIDs for performance:

    python
    cache_key = "user:" + user_uuid[:8]  # keeps only 32 of the UUID's 128 bits

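    Which works... until you have collisions. Eight hex characters carry 32 bits, so the chance of at least one collision reaches about 50% at roughly 77,000 keys.

    Fix: Use full UUIDs. If space is a concern, store the 16 raw bytes or a compact encoding of them (Base64 needs 22 characters) rather than dropping bits. Never truncate blindly.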
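Two small sketches to back this up: a birthday-bound estimate of how quickly an 8-character prefix collides, and one way to save space without dropping bits. The function names are illustrative, and the estimate assumes the kept bits are uniformly random.

```python
import base64
import math
import uuid


def collision_probability(num_keys: int, bits: int) -> float:
    """Birthday-bound approximation for random identifiers of `bits` bits."""
    pairs = num_keys * (num_keys - 1) / 2
    return -math.expm1(-pairs / 2**bits)


def compact(u: uuid.UUID) -> str:
    """Encode all 128 bits as 22 URL-safe Base64 characters."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def expand(text: str) -> uuid.UUID:
    """Reverse compact(): restore the padding and rebuild the UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


truncated_risk = collision_probability(77_163, bits=32)   # 8 hex characters
full_risk = collision_probability(77_163, bits=128)       # whole UUID
short_form = compact(uuid.UUID("9a4cfe23-cd5d-4d20-a2e4-66efb4303a1a"))
```

At about 77,000 keys the truncated prefix has roughly even odds of a collision, while the full-width risk is negligible. The compact form is 22 characters instead of 36 and round-trips to the original UUID.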


    🛠️ Best Practices

    Principle            | Recommendation
    ---------------------+--------------------------------------------
    Event ID format      | UUIDv7 or ULID for sortability
    Aggregate identity   | UUIDv4 or namespaced UUIDv5
    Replay protection    | Store processed event_ids
    Message ordering     | Use timestamp-based UUIDs, not v4
    Collision prevention | CSPRNG or deterministic UUIDv5
    Logs and traces      | Correlate with UUID per request or workflow

    🧪 Sample UUID Strategy

    Here’s a JSON event model that uses UUIDs effectively:

    json
    {
      "event_id": "01H8TVF3YVRXN6BGC36FXCX8YT",
      "aggregate_id": "9a4cfe23-cd5d-4d20-a2e4-66efb4303a1a",
      "type": "OrderPlaced",
      "payload": {
        "order_id": "O-12345",
        "amount": 99.99
      },
      "occurred_at": "2024-07-15T13:45:00.000Z"
    }

    Notes:

    • event_id is ULID (sortable + compact)
    • aggregate_id is UUIDv4
    • occurred_at is explicit and canonical
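A sketch of how such an event could be assembled. The ULID encoder is hand-rolled for illustration (48-bit millisecond timestamp plus 80 random bits, written as 26 Crockford Base32 characters); in a real system you would more likely use a maintained ULID library. The names `ulid_sketch` and `order_placed` are assumptions.

```python
import json
import secrets
import uuid
from datetime import datetime, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, or U


def ulid_sketch(unix_ms: int) -> str:
    """Encode a 48-bit timestamp and 80 random bits as 26 characters."""
    value = (unix_ms << 80) | secrets.randbits(80)
    # 26 groups of 5 bits, most significant group first.
    return "".join(CROCKFORD[(value >> shift) & 0x1F] for shift in range(125, -1, -5))


def order_placed(aggregate_id: uuid.UUID, payload: dict) -> dict:
    """Build an event shaped like the JSON model above."""
    now = datetime.now(timezone.utc)
    return {
        "event_id": ulid_sketch(int(now.timestamp() * 1000)),
        "aggregate_id": str(aggregate_id),
        "type": "OrderPlaced",
        "payload": payload,
        # Explicit, canonical event time, independent of the ID.
        "occurred_at": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }


event = order_placed(uuid.uuid4(), {"order_id": "O-12345", "amount": 99.99})
serialized = json.dumps(event)
```

The event ID and `occurred_at` are derived from the same clock reading here, but consumers should still read the time from `occurred_at` rather than decoding the ID.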

    Final Thoughts

    UUIDs bring order to chaos in event-driven systems — but only if you use them with intention.

    • Don't rely on "random = safe"
    • Know your UUID version
    • Design your identifiers like they’re part of your architecture — because they are

    🎯 In event sourcing, identity is everything. And UUIDs? They're the passports your events use to move through time, space, and system boundaries.

    Summary

    This article explores how UUIDs power distributed event sourcing systems, helping ensure idempotency, event identity, and ordering. It unpacks common patterns, anti-patterns, and actionable strategies to avoid subtle bugs.

    TL;DR

    UUIDs are essential in event sourcing for ensuring uniqueness and replay integrity — but they come with traps.

    Key takeaways:

    • UUIDs help enforce idempotency and event identity across services
    • Sortability matters — UUIDv7 or ULID improves replay logic and batching
    • Pitfalls include weak ID generation, non-sortable IDs, timestamp confusion, and truncated IDs

    Use UUIDs with structure and care — your event log (and future self) will thank you.
