I Created 100 Million UUIDs and Found 3 Collisions: Debunking a Common Myth

    March 3, 2025
    9 min read
    Experiment
    Technical deep-dive
    uuid
    testing
    performance
    innovation

    🎯 The Premise

    _"I generated 100 million UUIDs and got 3 duplicates."_

    This kind of claim pops up in dev forums every few months — sometimes as a warning, sometimes as a humblebrag, always as a conversation starter.

    So, we decided to actually test it.


    🧪 The Experiment Setup

    We generated 100 million UUIDv4 identifiers using Python, Go, and Node.js — three popular environments that rely on cryptographically secure random number generators.

    Generation Code (Python)

    python
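    import uuid

    seen = set()      # every UUID generated so far
    collisions = 0    # number of repeated values observed

    for _ in range(100_000_000):
        u = uuid.uuid4()  # random (version 4) UUID
        if u in seen:     # this exact 128-bit value was generated before
            collisions += 1
        seen.add(u)

    print(f"Collisions found: {collisions}")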

    > Note: The final run was batched and parallelized using multiprocessing and memory-mapped sets for performance.
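The batched implementation is not reproduced here. As a rough sketch of one memory-friendlier alternative to a hash set (an assumption for illustration, not the code used for the final run), each UUID can be stored as its raw 16 bytes, the batch sorted, and equal neighbors counted. The helper name `count_duplicates_sorted` is hypothetical.

```python
import uuid

def count_duplicates_sorted(ids):
    """Count repeated values by sorting and comparing neighbors.

    Hypothetical helper: for k copies of a value it counts k - 1 repeats,
    which matches the `collisions += 1` logic of the set-based loop.
    """
    ordered = sorted(ids)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a == b)

# Scaled-down run: 16 raw bytes per UUID instead of a full UUID object.
batch = [uuid.uuid4().bytes for _ in range(100_000)]
duplicates = count_duplicates_sorted(batch)
print(f"Duplicates in batch: {duplicates}")
```

Sorting trades the hash set's per-entry overhead for an O(n log n) pass, which can be easier to split into batches.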


    🧾 The Result: Zero Collisions

    No duplicates were found. Zilch. Nada.

    The myth? Busted. At least in every properly executed run.


    🧠 Wait... Where Did the “3 Collisions” Come From?

    We dug into the most common causes behind these misleading claims:

    1. 🧹 Truncated UUIDs

    Some systems shorten UUIDs (e.g. keep only the first 8 characters for brevity).

    Example:

    python
    short = str(uuid.uuid4())[:8]

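    This shrinks the keyspace from 2¹²² to 2³² (about 4.3 billion values). At 100 million IDs, that works out to roughly a million expected collisions, so duplicates are a certainty in practice, not a rare event.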
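To put numbers on the truncation problem, here is a minimal sketch using the standard birthday approximation. The function names are hypothetical, and the formula assumes IDs are drawn uniformly at random.

```python
import math

def expected_colliding_pairs(n, bits):
    """Expected number of colliding pairs among n uniform draws from 2**bits values."""
    return n * (n - 1) / (2 * 2**bits)

def collision_probability(n, bits):
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1) / 2N)."""
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-expected_colliding_pairs(n, bits))

n = 100_000_000
for bits in (32, 122):
    pairs = expected_colliding_pairs(n, bits)
    prob = collision_probability(n, bits)
    print(f"{bits} bits: ~{pairs:.3g} expected colliding pairs, P(collision) ~ {prob:.3g}")
```

With 32 bits the expected number of colliding pairs is above one million. With the full 122 bits the probability of a single collision falls below 10⁻²⁰.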

    2. 🧮 String vs Binary Comparisons

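    Comparing UUIDs as strings vs bytes can introduce subtle bugs, because the same UUID never compares equal across representations.

    python
    u = uuid.uuid4()
    str(u) != u.bytes  # always True: same UUID, but a str never equals bytes

    Normalize to a single format before checking for equality.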
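One way to normalize, sketched under the assumption that inputs arrive as `str`, `bytes`, or `uuid.UUID` values, is to coerce everything to `uuid.UUID` before comparing. The helper name `to_uuid` is hypothetical.

```python
import uuid

def to_uuid(value):
    """Coerce a str, bytes, or uuid.UUID value into a uuid.UUID (hypothetical helper)."""
    if isinstance(value, uuid.UUID):
        return value
    if isinstance(value, (bytes, bytearray)):
        return uuid.UUID(bytes=bytes(value))  # expects exactly 16 bytes
    return uuid.UUID(str(value))  # accepts hyphenated or plain hex, any letter case

u = uuid.uuid4()
# Three representations of the same UUID now compare equal.
same = to_uuid(str(u)) == to_uuid(u.bytes) == to_uuid(u.hex.upper())
print(same)
```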

    3. 🐞 Faulty Randomness

    If you're using non-cryptographic RNGs (Math.random(), or insecure PRNGs), UUIDv4 loses its guarantees.

    Always use a CSPRNG (os.urandom, crypto.randomUUID, etc.)
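To illustrate the risk, the sketch below deliberately builds version-4 UUIDs from Python's seedable `random` module, which is not a CSPRNG. The helper `weak_uuid4` is hypothetical and exists only to demonstrate the failure mode.

```python
import random
import uuid

def weak_uuid4(rng):
    """Build a version-4 UUID from a non-cryptographic PRNG. Do NOT use in production."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two "independent" generators that happen to share a seed.
a = weak_uuid4(random.Random(42))
b = weak_uuid4(random.Random(42))
print(a == b)  # identical seeds give identical "random" UUIDs

# uuid.uuid4() draws from the operating system's CSPRNG and has no seed to share.
safe = uuid.uuid4()
```

Any two workers that start from the same PRNG state will emit the same sequence of IDs, which looks exactly like a run of UUID collisions.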

    4. 🧍 Parallelism Bugs

    Multiple threads or processes writing to a shared file or memory structure without synchronization can create phantom duplicates.

    Use locks or atomic operations for integrity in high-throughput generation.
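An alternative to locking, sketched here as one possible design rather than a prescribed one, is to avoid shared state altogether: each worker fills a private batch, and a single thread merges the batches and checks for duplicates. The function names and default sizes are assumptions for illustration.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

def generate_batch(count):
    """Each worker fills its own private list, so there is no shared state to race on."""
    return [uuid.uuid4() for _ in range(count)]

def count_collisions_parallel(workers=4, per_worker=10_000):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(generate_batch, [per_worker] * workers))

    # Merge and check in a single thread, so check-then-add cannot interleave.
    seen = set()
    collisions = 0
    for batch in batches:
        for u in batch:
            if u in seen:
                collisions += 1
            seen.add(u)
    return collisions

print(count_collisions_parallel())
```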


    🎲 The Real Odds of UUIDv4 Collisions

    UUIDv4 has 122 bits of randomness. The total number of possible UUIDs is:

    code
    2^122 ≈ 5.3 x 10^36

    Using the birthday paradox, the number of UUIDs needed to reach a 50% chance of at least one collision is:

    code
    √(2 · ln 2 · 2^122) ≈ 2.71 x 10^18 UUIDs

    That's 2.7 quintillion UUIDs — far beyond any real-world application.

    For comparison:

    • 1 billion UUIDs/day = ~7.4 million years to reach 50% collision odds
    • Generating 100 million UUIDs is barely a ripple in the pool: the chance of even one collision is about 1 in 10²¹
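The arithmetic behind these figures can be checked in a few lines. This is a minimal sketch that assumes the usual approximation P ≈ 1 − exp(−n² / 2N) and a hypothetical constant rate of one billion UUIDs per day.

```python
import math

SPACE = 2**122  # number of possible UUIDv4 values

# Solve 1 - exp(-n^2 / (2 * SPACE)) = 0.5 for n.
n_half = math.sqrt(2 * math.log(2) * SPACE)

# Hypothetical generation rate: one billion UUIDs per day.
years = n_half / 1_000_000_000 / 365.25

print(f"{n_half:.3g} UUIDs, {years:.3g} years")
```

This gives roughly 2.71 × 10¹⁸ UUIDs and about 7.4 million years at that rate.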

    🧬 UUID Collisions in Real Systems

    Real-world systems that commonly rely on random UUIDs (or on the same uniqueness principle):

    • PostgreSQL primary keys
    • DynamoDB partition keys
    • Kubernetes object UIDs
    • Git object hashes (not UUIDs, but same uniqueness principle)

    To our knowledge, no production UUIDv4 collision has been publicly verified that was not traced back to system misconfiguration or misuse.


    🔐 Takeaways for Developers

    ✅ Use UUIDv4 When:

    • You need fast, globally unique IDs
    • You're working in distributed systems
    • You don’t require time-ordering

    ✅ Use UUIDv7 When:

    • You want timestamp-sorted UUIDs (e.g., for logs or event streams)
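For a sense of why UUIDv7 values sort by time, here is a simplified sketch of the layout: a 48-bit Unix timestamp in milliseconds at the top, followed by the version, the variant, and random bits. The function `uuid7_sketch` is hypothetical and omits the monotonicity handling a production generator would need within the same millisecond.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None):
    """Assemble a UUIDv7-style value by hand (illustrative only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits from the OS CSPRNG
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                          # bits 79..76: version 7
    value |= rand_a << 64                       # bits 75..64: random
    value |= 0b10 << 62                         # bits 63..62: RFC variant
    value |= rand_b                             # bits 61..0: random
    return uuid.UUID(int=value)

earlier = uuid7_sketch(1_000)
later = uuid7_sketch(2_000)
print(earlier < later, str(earlier) < str(later))
```

Because the timestamp occupies the most significant bits, values generated in later milliseconds sort after earlier ones, both as integers and as strings.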

    ❌ Avoid:

    • Truncating UUIDs for readability
    • Comparing strings with differing formats
    • Using UUIDs from insecure sources

    ✅ TL;DR (Again)

    We generated 100 million UUIDv4s. We found no collisions.

    Any claim otherwise is likely due to:

    • Misuse (shortening, poor RNG)
    • Misinterpretation (formatting bugs)
    • Faulty testing (non-threadsafe containers, bad equality checks)

    Final Thoughts

    The "UUID collision" fear is a modern urban legend: a collision is mathematically possible, but so improbable that it is practically nonexistent for any real software system.

    UUIDv4 remains one of the most robust, distributed-safe, collision-resistant ID formats available.

    So go ahead. Generate UUIDs with confidence. Your bits are safe.

    🧪 Myth: DEBUNKED.

    Summary

    This article documents a large-scale UUID generation experiment, explains the surprising results (spoiler: no real collisions), and breaks down why UUIDv4 collisions are practically impossible — debunking one of the most persistent myths in software engineering.

    TL;DR

    The claim that generating 100 million UUIDs yields 3 collisions does not survive a real test: our run found none, which is exactly what the math behind UUIDv4 predicts.

    Key points to remember:

    • UUIDv4 has 122 bits of entropy, making collisions astronomically unlikely
    • "Collisions" often stem from bugs, I/O errors, or flawed comparisons
    • In real-world systems, UUIDv4 is effectively collision-proof for practical use

    This article unpacks the experiment and myth, reaffirming UUIDv4 as a safe default for global uniqueness.
