Collision Course: The Real Mathematics Behind UUID Uniqueness

    February 12, 2024

    Ever heard someone say, _“If you generate enough UUIDs, you’ll eventually get a collision”_?

    Technically, they’re right. Practically? Not even close.

    This article dives into the math and probability theory behind UUID collisions — especially UUIDv4 — and explains why your distributed system is (almost certainly) safe.


    🧬 A Quick Refresher: What Is a UUIDv4?

    A UUIDv4 is a 128-bit identifier in which 122 bits are random. The remaining 6 bits are fixed: 4 encode the version and 2 encode the variant.

    That means there are:

    2^122 ≈ 5.3 × 10^36

    possible UUIDv4s.

    For comparison:

    • The number of grains of sand on Earth: ~7.5 × 10^18
    • The number of stars in the observable universe: ~1 × 10^24
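    As a quick check of the bit layout described above, here is a minimal sketch using Python's standard uuid module. The names RANDOM_BITS and TOTAL_V4 are only illustrative.

```python
import uuid

u = uuid.uuid4()

# The version nibble is fixed to 4 and the variant bits to binary 10,
# which leaves 128 - 4 - 2 = 122 bits for randomness.
RANDOM_BITS = 128 - 4 - 2
TOTAL_V4 = 2 ** RANDOM_BITS

print(u)                     # a random UUID; the third group starts with "4"
print(u.version)             # 4
print(f"{TOTAL_V4:.3e}")     # 5.317e+36
```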

    🎂 The Birthday Problem: A Collision Analogy

    To understand UUID collisions, we use a classic example from probability theory: the birthday paradox.

    The Problem:

    > How many people need to be in a room before two of them are likely to share a birthday?

    Answer: Just 23 people for a 50% chance.

    This unintuitive result comes from comparing every pair in the group, not matching everyone against one fixed date: 23 people form 253 distinct pairs, and each pair is a chance for a match.

    The same logic applies to UUID collisions — every newly generated UUID is compared against all previously generated ones.
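    The 23-person figure is easy to verify. The sketch below computes the exact probability by multiplying the chances that each new person misses every birthday already taken. It assumes 365 equally likely birthdays, and the function name is just for illustration.

```python
def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(round(birthday_collision_probability(22), 4))  # 0.4757
print(round(birthday_collision_probability(23), 4))  # 0.5073
```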


    📐 The Formula: Collision Probability

    Let’s apply the birthday formula to UUIDs.

    Approximate Probability:

    p ≈ 1 - e^(-n² / (2 × N))

    Where:

    • n = number of UUIDs generated
    • N = total possible UUIDs (2^122)
    • p = probability of at least one collision
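    The approximation translates directly into a few lines of Python. This is a sketch, not a library API: collision_probability and its random_bits parameter are names made up here. It uses math.expm1 so that very small probabilities are not rounded away.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Birthday approximation p ≈ 1 - exp(-n² / (2N)), with N = 2**random_bits."""
    exponent = n * n / (2 * 2 ** random_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)

print(collision_probability(2_714_900_000_000_000_000))  # close to 0.5
```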

    Let’s reverse the formula to ask:

    How many UUIDs can we generate before the chance of a collision hits 50%?


    🔢 The Threshold: When Collisions Become "Likely"

    Solving the approximation for n:

    n ≈ sqrt(2 × N × ln(1 / (1 - p)))

    For p = 0.5 and N = 2^122, we get:

    n ≈ 2.71 × 10^18 UUIDs

    That’s 2.7 quintillion UUIDs before hitting just a 50% collision chance.
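    The inverted formula can be checked the same way. A minimal sketch, with a hypothetical helper name:

```python
import math

def uuids_for_probability(p: float, random_bits: int = 122) -> float:
    """Invert the birthday approximation: n ≈ sqrt(2N · ln(1 / (1 - p)))."""
    total = 2 ** random_bits
    return math.sqrt(2 * total * math.log(1 / (1 - p)))

print(f"{uuids_for_probability(0.5):.3e}")  # 2.715e+18
```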

    To generate that many:

    • At 1 million UUIDs/sec → would take ~86,000 years
    • At 1 billion UUIDs/sec → still takes ~86 years
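    The timing estimates are plain division. The sketch below assumes a 365.25-day year and uses the rounded 50% threshold of 2.71 × 10^18 from the text. The constant and function names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year
THRESHOLD = 2.71e18                    # 50% collision threshold from the text

def years_to_generate(count: float, rate_per_second: float) -> float:
    """Years needed to generate `count` IDs at a constant rate."""
    return count / rate_per_second / SECONDS_PER_YEAR

print(round(years_to_generate(THRESHOLD, 1e6)))  # roughly 85,900
print(round(years_to_generate(THRESHOLD, 1e9)))  # 86
```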

    > Bottom line: UUIDv4 is extremely safe at global scale.


    🧪 Real-World Collision Math

    Let’s break down some smaller-scale scenarios:

    Case 1: Generate 1 million UUIDs

    p ≈ 1 - e^(-(1e6)^2 / (2 × 2^122)) ≈ 9.4 × 10^-26

    That’s roughly a 1 in 10 septillion (10^25) chance. You’re more likely to be struck by lightning while winning the lottery.

    Case 2: Generate 1 billion UUIDs

    p ≈ 1 - e^(-(1e9)^2 / (2 × 2^122)) ≈ 9.4 × 10^-20

    Still absurdly low. Nothing to worry about in production.
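    One practical catch when evaluating these cases in code: the probabilities are so small that the naive expression 1 - exp(-x) rounds to exactly zero in double precision. The sketch below contrasts it with math.expm1, which keeps the tiny value. Both function names are illustrative.

```python
import math

N = 2 ** 122

def p_naive(n: int) -> float:
    # exp(-x) rounds to exactly 1.0 when x is far below machine epsilon.
    return 1 - math.exp(-(n * n) / (2 * N))

def p_stable(n: int) -> float:
    # expm1 computes exp(x) - 1 without that cancellation.
    return -math.expm1(-(n * n) / (2 * N))

for n in (10**6, 10**9):
    print(f"n={n:.0e}  naive={p_naive(n)}  stable={p_stable(n):.2e}")
# naive is 0.0 in both cases; stable gives 9.40e-26 and 9.40e-20
```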


    🔍 Where the Myth Comes From

    So why do developers still fear collisions?

    1. Misunderstood Math

    The birthday paradox is counterintuitive. People often underestimate how large 2^122 really is.

    2. Truncated UUIDs

    Some teams shorten UUIDs for display (e.g. first 8 characters). Eight hex characters carry only 32 bits of entropy, so by the same birthday formula a collision reaches 50% probability after only about 77,000 IDs.
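    To see how sharply truncation changes the picture, here is a sketch that reuses the birthday threshold for an arbitrary number of random bits, plus a small helper that counts repeated prefixes in a batch of IDs. Both helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

def threshold_for_bits(bits: int, p: float = 0.5) -> float:
    """IDs needed to reach collision probability p with `bits` random bits."""
    return math.sqrt(2 * 2 ** bits * math.log(1 / (1 - p)))

def count_prefix_duplicates(ids: Iterable[str], prefix_len: int = 8) -> int:
    """Count IDs whose first `prefix_len` characters repeat an earlier prefix."""
    counts = Counter(s[:prefix_len] for s in ids)
    return sum(c - 1 for c in counts.values())

print(round(threshold_for_bits(32)))   # roughly 77,000 IDs for a 50% chance
print(f"{threshold_for_bits(122):.2e}")  # about 2.71e+18 for a full UUIDv4
```

    Feeding count_prefix_duplicates a few hundred thousand str(uuid.uuid4()) values will typically report several duplicate 8-character prefixes, even though the full UUIDs are all distinct.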

    3. Bad PRNGs or Bugs

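    The collision math assumes every random bit is uniform and unpredictable. Poorly implemented UUID generators, especially ones built on a non-cryptographic or badly seeded PRNG, break that assumption and can raise the collision risk dramatically.

    Always use standard libraries and CSPRNGs (in Python, os.urandom, secrets, etc.).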
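    The following sketch shows the failure mode. weak_uuid4 is a deliberately bad, hypothetical helper that builds a version-4 UUID from Python's seedable random module. Any two processes that start from the same seed produce the same "random" ID.

```python
import random
import uuid

def weak_uuid4(seed: int) -> uuid.UUID:
    """Anti-pattern: a v4-shaped UUID drawn from a seeded, predictable PRNG."""
    rng = random.Random(seed)
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two workers that happen to share a seed emit the same ID.
print(weak_uuid4(42) == weak_uuid4(42))  # True

# uuid.uuid4() draws its bytes from os.urandom, so no seed is involved.
print(uuid.uuid4() == uuid.uuid4())      # expected False
```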


    🧠 The Bigger Picture: Collision Isn’t the Real Risk

    When choosing ID schemes, the real trade-offs are often:

    • Storage size (UUIDs = 16 bytes vs INT = 4 bytes)
    • Index performance (UUIDv4 = random inserts vs UUIDv7 = ordered)
    • Traceability (UUIDv1 = timestamp + MAC; UUIDv4 = anonymous)
    • Debuggability (ULID, KSUID, and UUIDv7 are more human-friendly)
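    The storage point is easy to make concrete. A small sketch using the standard uuid module shows why the column type matters: the text form is more than twice the size of the binary form.

```python
import uuid

u = uuid.uuid4()

print(len(u.bytes))  # 16 bytes in binary form
print(len(u.hex))    # 32 hex characters without hyphens
print(len(str(u)))   # 36 characters in canonical text form
```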

    Collision risk? Not even in the top five concerns.


    💡 Bonus: UUIDv7 Improves Sortability

    One reason some developers avoid UUIDv4 is the random ordering in indexes and logs.

    UUIDv7, introduced in RFC 9562 (2024), addresses this:

    • Includes a Unix timestamp (in milliseconds) in the first 48 bits
    • Still has up to 74 bits of randomness
    • Lexicographically sortable by creation time

    It’s great for databases, logs, and systems that benefit from time-ordered IDs — without sacrificing global uniqueness.
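    To illustrate the layout, here is a hand-rolled sketch of a UUIDv7-style value. uuid7_sketch is a hypothetical helper written for this article, not a standard library function, and it fills every non-timestamp bit with random data. IDs created within the same millisecond are therefore not ordered relative to each other.

```python
import secrets
import time
import uuid
from typing import Optional

def uuid7_sketch(ts_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit ms timestamp, then random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    value = (ts_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in ms
    value |= 0x7 << 76                       # version 7
    value |= secrets.randbits(12) << 64      # 12 random bits
    value |= 0b10 << 62                      # RFC variant bits
    value |= secrets.randbits(62)            # 62 random bits
    return uuid.UUID(int=value)

earlier = uuid7_sketch(ts_ms=1_700_000_000_000)
later = uuid7_sketch(ts_ms=1_700_000_000_001)
print(earlier.version)            # 7
print(str(earlier) < str(later))  # True: text order follows the timestamp
```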


    ✅ Takeaways for Developers

    • UUIDv4 collisions are mathematically possible but practically irrelevant
    • You can safely generate billions of UUIDv4s with zero collision concern
    • If you're seeing collisions, it's likely due to:

    - Shortened/truncated UUIDs

    - Faulty generation (e.g., random() instead of secrets)

    - Logic bugs (e.g., duplicate writes)

    > Bottom line: It’s not the math — it’s usually the implementation.


    Final Thoughts

    UUIDs are a brilliant engineering solution — elegant in theory, rock-solid in practice.

    Understanding the math demystifies them and lets us focus on real engineering trade-offs, not hypothetical ghosts.

    So next time someone warns about UUID collisions, send them this article… and get back to shipping.

    🧠 Probability theory: 1

    Myth: 0


    Summary

    This article explores the real mathematics behind UUID uniqueness using probability theory and the birthday problem. Learn how collision risks are calculated and why UUIDv4 remains safe for use even at massive scales.

    TL;DR

    UUID collisions are a theoretical possibility — but practically impossible under real-world conditions.

    Key takeaways:

    • UUIDv4 offers 122 bits of randomness, making collisions vanishingly rare
    • A 50% collision chance takes about 2.7 × 10^18 UUIDs, roughly 86 years at a billion per second
    • The birthday paradox helps model collision probabilities for large sets of IDs

    For nearly every application, UUIDv4 provides more than enough uniqueness — no central coordination required.
